Re: [devel] [PATCH 2/2] ckptnd: fix crash during checkpoint open timeout with large sections [#1510]

2017-10-23 Thread Vo Minh Hoang
Hi Alex,

ACK from me.
Tested with basic behavior.

Sincerely,
Hoang

-----Original Message-----
From: Alex Jones [mailto:alex.jo...@genband.com] 
Sent: Monday, October 23, 2017 10:13 PM
To: hoang.m...@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net; Alex Jones 
Subject: [PATCH 2/2] ckptnd: fix crash during checkpoint open timeout with
large sections [#1510]

v2
---
 src/ckpt/ckptnd/cpnd_evt.c | 18 +++---
 1 file changed, 7 insertions(+), 11 deletions(-)

diff --git a/src/ckpt/ckptnd/cpnd_evt.c b/src/ckpt/ckptnd/cpnd_evt.c
index a968f34..90b4e6c 100644
--- a/src/ckpt/ckptnd/cpnd_evt.c
+++ b/src/ckpt/ckptnd/cpnd_evt.c
@@ -702,7 +702,6 @@ static uint32_t cpnd_evt_proc_ckpt_open(CPND_CB *cb, CPND_EVT *evt,
CPSV_EVT send_evt, *out_evt = NULL;
SaConstStringT ckpt_name = NULL;
uint32_t rc = NCSCC_RC_SUCCESS;
-  bool node_added = false;
CPND_CPD_DEFERRED_REQ_NODE *node = NULL;
CPND_CKPT_CLIENT_NODE *cl_node = NULL;
CPND_CKPT_NODE *cp_node = NULL;
@@ -1027,8 +1026,6 @@ static uint32_t cpnd_evt_proc_ckpt_open(CPND_CB *cb, CPND_EVT *evt,
goto ckpt_shm_node_free_error;
}
 
-node_added = true;
-
if (out_evt->info.cpnd.info.ckpt_info.ckpt_rep_create ==
true &&
cp_node->create_attrib.maxSections == 1) {
 
@@ -1038,7 +1035,7 @@ static uint32_t cpnd_evt_proc_ckpt_open(CPND_CB *cb, CPND_EVT *evt,
TRACE_4(
"cpnd ckpt rep create failed with rc:%d",
rc);
-   goto ckpt_shm_node_free_error;
+   goto ckpt_node_del_error;
}
}
cpnd_evt_destroy(out_evt);
@@ -1101,7 +1098,7 @@ static uint32_t cpnd_evt_proc_ckpt_open(CPND_CB *cb, CPND_EVT *evt,
(out_evt->info.cpnd.error != SA_AIS_OK))
{
send_evt.info.cpa.info.openRsp.error
=
out_evt->info.cpnd.error;
-   goto ckpt_shm_node_free_error;
+   goto ckpt_node_del_error;
} else if ((out_evt) &&
   (out_evt->info.cpnd.error ==
SA_AIS_OK) &&
@@ -1173,6 +1170,11 @@ static uint32_t cpnd_evt_proc_ckpt_open(CPND_CB *cb, CPND_EVT *evt,
TRACE_4("cpnd ckpt open failure client_hdl:%llx", client_hdl);
goto agent_rsp;
 
+ckpt_node_del_error:
+   rc = cpnd_ckpt_node_del(cb, cp_node);
+   if (rc == NCSCC_RC_FAILURE)
+   LOG_ER("cpnd client tree del failed");
+
 ckpt_shm_node_free_error:
cpnd_ckpt_replica_destroy(cb, cp_node, );
 
@@ -1204,12 +1206,6 @@ ckpt_node_free_error:
cpnd_tmr_stop(&cp_node->ret_tmr);
cpnd_ckpt_sec_map_destroy(&cp_node->replica_info);
 
-  if (node_added) {
-rc = cpnd_ckpt_node_del(cb, cp_node);
-if (rc == NCSCC_RC_FAILURE)
-  LOG_ER("cpnd client tree del failed");
-  }
-
m_MMGR_FREE_CPND_CKPT_NODE(cp_node);
 
 agent_rsp:
--
2.9.5



--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


Re: [devel] [PATCH 1/1] ckptnd: fix crash during checkpoint open timeout with large sections [#1510]

2017-10-22 Thread Vo Minh Hoang
Hi Alex,

 

Sorry, I got confused, so my comment was unclear and maybe wrong.

Let me try to rewrite it.

 

- Do we really need a flag variable to track node_added? It might be clearer 
to call cpnd_ckpt_node_del() under a new goto label placed before 
ckpt_shm_node_free_error and after agent_rsp2. My concern is that adding 
flags makes the code harder to read.
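
For illustration only, the label-based cleanup suggested above could look roughly like the sketch below. It is not the cpnd code: demo_open(), struct demo_db, and the label names are hypothetical stand-ins, and the db is assumed to hold a single entry. Each failure jumps to the label matching how far the open got, and the labels fall through so that the later cleanup steps are shared without a flag.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins; none of these names come from cpnd_evt.c. */
struct demo_node {
	int id;
};

struct demo_db {
	struct demo_node *slot; /* single-entry "db", enough for the sketch */
};

enum demo_fail { DEMO_FAIL_NONE, DEMO_FAIL_BEFORE_ADD, DEMO_FAIL_AFTER_ADD };

/* Returns 0 on success, -1 on failure. On failure the db holds no node. */
static int demo_open(struct demo_db *db, enum demo_fail where)
{
	struct demo_node *node;

	assert(db != NULL);
	node = malloc(sizeof(*node));
	if (node == NULL)
		return -1;
	node->id = 1;

	if (where == DEMO_FAIL_BEFORE_ADD)
		goto node_free_error; /* node never reached the db */

	db->slot = node; /* from here on the db points at node */

	if (where == DEMO_FAIL_AFTER_ADD)
		goto node_del_error; /* must unlink before freeing */

	return 0;

node_del_error:
	db->slot = NULL; /* undo the add, then fall through */
node_free_error:
	free(node);
	return -1;
}
```

With this layout, a failure before the add skips the unlink step, and a failure after the add runs both steps in order.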

 

Another minor point: this file uses tab indentation but your code uses 
2-space indentation; please make it consistent.

 

Sincerely,

Hoang

 

From: Alex Jones [mailto:alex.jo...@genband.com] 
Sent: Friday, October 20, 2017 8:41 PM
To: Vo Minh Hoang <hoang.m...@dektech.com.au>
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1/1] ckptnd: fix crash during checkpoint open timeout with 
large sections [#1510]

 

Hi Hoang,

I'm not sure what you are asking. Are you saying that all of the code under 
the ckpt_node_free_error label should be moved to the agent_rsp2 label? I don't 
understand why. Can you explain this in more detail?

What we see is that out_evt->info.cpnd.error does return TRY_AGAIN, but 
there is a "goto ckpt_shm_node_free_error" right after that (line 1102 in the 
patch). Without changing this goto statement, then all the code that was moved 
to agent_rsp2 (per your suggestion) would get skipped, and nothing would be 
removed.

Alex

 

On 10/20/2017 02:28 AM, Vo Minh Hoang wrote:


Dear Alex,

I am confused about the following path:
>> sync from the active can timeout with errorcode SA_AIS_ERR_TRY_AGAIN
Does that mean out_evt->info.cpnd.error == SA_AIS_ERR_TRY_AGAIN?

Your cpnd_ckpt_node_del() call is currently added under ckpt_node_free_error.
If the above is true, please consider moving the ckpt_node_free_error code to
the agent_rsp2 part; the node_added flag might then not be needed.

Sincerely,
Hoang

-----Original Message-----
From: Alex Jones [mailto:alex.jo...@genband.com] 
Sent: Tuesday, October 17, 2017 9:20 PM
To: Hoang Vo <hoang.m...@dektech.com.au>
Cc: opensaf-devel@lists.sourceforge.net; Alex Jones <alex.jo...@genband.com>
Subject: [PATCH 1/1] ckptnd: fix crash during checkpoint open timeout with
large sections [#1510]

ckptnd crashes

When opening a collocated checkpoint replica where the active has large
numbers of sections (~200k), the sync from the active can timeout with
errorcode SA_AIS_ERR_TRY_AGAIN. In this case the code deletes the memory for
the node, but does not delete the node from the db. When the checkpoint
access is tried again, the freed memory for the node is still in the db, and
ckptnd crashes.

Delete the node from the db if the node is deleted during the open.
---
src/ckpt/ckptnd/cpnd_evt.c | 10 ++
1 file changed, 10 insertions(+)

diff --git a/src/ckpt/ckptnd/cpnd_evt.c b/src/ckpt/ckptnd/cpnd_evt.c
index 2070163..a968f34 100644
--- a/src/ckpt/ckptnd/cpnd_evt.c
+++ b/src/ckpt/ckptnd/cpnd_evt.c
@@ -702,6 +702,7 @@ static uint32_t cpnd_evt_proc_ckpt_open(CPND_CB *cb, CPND_EVT *evt,
CPSV_EVT send_evt, *out_evt = NULL;
SaConstStringT ckpt_name = NULL;
uint32_t rc = NCSCC_RC_SUCCESS;
+ bool node_added = false;
CPND_CPD_DEFERRED_REQ_NODE *node = NULL;
CPND_CKPT_CLIENT_NODE *cl_node = NULL;
CPND_CKPT_NODE *cp_node = NULL;
@@ -1026,6 +1027,8 @@ static uint32_t cpnd_evt_proc_ckpt_open(CPND_CB *cb, CPND_EVT *evt,
goto ckpt_shm_node_free_error;
}

+ node_added = true;
+
if (out_evt->info.cpnd.info.ckpt_info.ckpt_rep_create ==
true &&
cp_node->create_attrib.maxSections == 1) {

@@ -1200,6 +1203,13 @@ ckpt_node_free_error:
if (cp_node->ret_tmr.is_active)
cpnd_tmr_stop(&cp_node->ret_tmr);
cpnd_ckpt_sec_map_destroy(&cp_node->replica_info);
+
+ if (node_added) {
+ rc = cpnd_ckpt_node_del(cb, cp_node);
+ if (rc == NCSCC_RC_FAILURE)
+ LOG_ER("cpnd client tree del failed");
+ }
+
m_MMGR_FREE_CPND_CKPT_NODE(cp_node);

agent_rsp:
--
2.9.5
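
A minimal sketch of the failure mode described in the commit message above, assuming a tiny array-backed db. The real cpnd db is not shown in this thread, so every demo_* name here is hypothetical. If a failed open frees the node without first removing it from the db, a retry would find a pointer to freed memory; removing the entry first means the retry simply finds nothing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define DEMO_DB_SIZE 4

/* Hypothetical checkpoint node and db; not the cpnd data structures. */
struct demo_ckpt {
	unsigned long long id;
};

struct demo_db {
	struct demo_ckpt *entries[DEMO_DB_SIZE];
};

/* Returns the node with the given id, or NULL if the db has no such node. */
static struct demo_ckpt *demo_db_find(const struct demo_db *db,
				      unsigned long long id)
{
	for (size_t i = 0; i < DEMO_DB_SIZE; i++)
		if (db->entries[i] != NULL && db->entries[i]->id == id)
			return db->entries[i];
	return NULL;
}

/* Stores node in the first free slot. Returns 0, or -1 if the db is full. */
static int demo_db_add(struct demo_db *db, struct demo_ckpt *node)
{
	for (size_t i = 0; i < DEMO_DB_SIZE; i++) {
		if (db->entries[i] == NULL) {
			db->entries[i] = node;
			return 0;
		}
	}
	return -1;
}

/* Clears the slot holding node. Returns 0, or -1 if node is not in the db. */
static int demo_db_del(struct demo_db *db, const struct demo_ckpt *node)
{
	for (size_t i = 0; i < DEMO_DB_SIZE; i++) {
		if (db->entries[i] == node) {
			db->entries[i] = NULL;
			return 0;
		}
	}
	return -1;
}

/* Failed open: remove the node from the db first, then free its memory. */
static void demo_open_failed(struct demo_db *db, struct demo_ckpt *node)
{
	assert(db != NULL && node != NULL);
	(void)demo_db_del(db, node); /* skipping this leaves a dangling entry */
	free(node);
}
```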

 



Re: [devel] [PATCH 1/1] ckptnd: fix crash during checkpoint open timeout with large sections [#1510]

2017-10-20 Thread Vo Minh Hoang
Dear Alex,

I am confused about the following path:
>> sync from the active can timeout with errorcode SA_AIS_ERR_TRY_AGAIN
Does that mean out_evt->info.cpnd.error == SA_AIS_ERR_TRY_AGAIN?

Your cpnd_ckpt_node_del() call is currently added under ckpt_node_free_error.
If the above is true, please consider moving the ckpt_node_free_error code to
the agent_rsp2 part; the node_added flag might then not be needed.

Sincerely,
Hoang

-----Original Message-----
From: Alex Jones [mailto:alex.jo...@genband.com] 
Sent: Tuesday, October 17, 2017 9:20 PM
To: Hoang Vo 
Cc: opensaf-devel@lists.sourceforge.net; Alex Jones 
Subject: [PATCH 1/1] ckptnd: fix crash during checkpoint open timeout with
large sections [#1510]

ckptnd crashes

When opening a collocated checkpoint replica where the active has large
numbers of sections (~200k), the sync from the active can timeout with
errorcode SA_AIS_ERR_TRY_AGAIN. In this case the code deletes the memory for
the node, but does not delete the node from the db. When the checkpoint
access is tried again, the freed memory for the node is still in the db, and
ckptnd crashes.

Delete the node from the db if the node is deleted during the open.
---
 src/ckpt/ckptnd/cpnd_evt.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/src/ckpt/ckptnd/cpnd_evt.c b/src/ckpt/ckptnd/cpnd_evt.c
index 2070163..a968f34 100644
--- a/src/ckpt/ckptnd/cpnd_evt.c
+++ b/src/ckpt/ckptnd/cpnd_evt.c
@@ -702,6 +702,7 @@ static uint32_t cpnd_evt_proc_ckpt_open(CPND_CB *cb, CPND_EVT *evt,
CPSV_EVT send_evt, *out_evt = NULL;
SaConstStringT ckpt_name = NULL;
uint32_t rc = NCSCC_RC_SUCCESS;
+  bool node_added = false;
CPND_CPD_DEFERRED_REQ_NODE *node = NULL;
CPND_CKPT_CLIENT_NODE *cl_node = NULL;
CPND_CKPT_NODE *cp_node = NULL;
@@ -1026,6 +1027,8 @@ static uint32_t cpnd_evt_proc_ckpt_open(CPND_CB *cb, CPND_EVT *evt,
goto ckpt_shm_node_free_error;
}
 
+node_added = true;
+
if (out_evt->info.cpnd.info.ckpt_info.ckpt_rep_create ==
true &&
cp_node->create_attrib.maxSections == 1) {
 
@@ -1200,6 +1203,13 @@ ckpt_node_free_error:
if (cp_node->ret_tmr.is_active)
cpnd_tmr_stop(&cp_node->ret_tmr);
cpnd_ckpt_sec_map_destroy(&cp_node->replica_info);
+
+  if (node_added) {
+rc = cpnd_ckpt_node_del(cb, cp_node);
+if (rc == NCSCC_RC_FAILURE)
+  LOG_ER("cpnd client tree del failed");
+  }
+
m_MMGR_FREE_CPND_CKPT_NODE(cp_node);
 
 agent_rsp:
--
2.9.5





Re: [devel] [PATCH 0/1] Review Request for ckpt: add timeout handling for test_ckptOverwrite [#2624]

2017-10-16 Thread Vo Minh Hoang
Dear Alex,

If there are no further comments, would you please help me push the patch?
I do not have push privileges.

Thank you very much for your kindness.
Sincerely,
Hoang

-----Original Message-----
From: Vo Minh Hoang [mailto:hoang.m...@dektech.com.au] 
Sent: Thursday, October 12, 2017 6:26 PM
To: 'Alex Jones' <alex.jo...@genband.com>
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [devel] [PATCH 0/1] Review Request for ckpt: add timeout
handling for test_ckptOverwrite [#2624]

Dear Alex,

 

Thank you very much for your very fast response.

 

Even though this is only test code, I think we should push it to the release
branch as well: the test environment should be consistent even if this change
is not really important.

If not, tests on the release branch might fail from time to time, which costs
extra effort when verifying much more important code.

 

Sincerely,

Hoang

 

From: Alex Jones [mailto:alex.jo...@genband.com]
Sent: Thursday, October 12, 2017 6:01 PM
To: Hoang Vo <hoang.m...@dektech.com.au>
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 0/1] Review Request for ckpt: add timeout handling for
test_ckptOverwrite [#2624]

 

Hi Hoang,

 

  Ack from me. Is there a reason this needs to go on the release branch?
It's just test code.

 

Alex

  _  

From: Hoang Vo <hoang.m...@dektech.com.au>
Sent: Thursday, October 12, 2017 6:13:44 AM
To: Alex Jones
Cc: opensaf-devel@lists.sourceforge.net; Hoang Vo
Subject: [PATCH 0/1] Review Request for ckpt: add timeout handling for
test_ckptOverwrite [#2624] 

 



Summary: ckpt: add timeout handling for test_ckptOverwrite [#2624]
Review request for Ticket(s): 2624
Peer Reviewer(s): ajo...@genband.com
Pull request to: ajo...@genband.com
Affected branch(es): develop, release
Development branch: ticket-2624
Base revision: 4fbc3261d53914242bdbc5c300caecc53b88365a
Personal repository: git://git.code.sf.net/u/swgerai/review


Impacted area Impact y/n

Docs n
Build system n
RPM/packaging n
Configuration files n
Startup scripts n
SAF services n
OpenSAF services n
Core libraries n
Samples n
Tests y
Other n


Comments (indicate scope for each "y" above):
-
*** EXPLAIN/COMMENT THE PATCH SERIES HERE ***

revision 38ebcac386e4cfda5c0b17feabaa609c526651d9
Author: Hoang Vo <hoang.m...@dektech.com.au>
Date: Thu, 12 Oct 2017 16:53:24 +0700

ckpt: add timeout handling for test_ckptOverwrite [#2624]

test_ckptOverwrite verifies overwrite behavior and should handle
SA_AIS_ERR_TIMEOUT by retrying the operation.
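
As a rough sketch of what "retrying on timeout" means here, and not the actual change to test_cpa_util.c (which is not shown in this thread): the status codes below are hypothetical stand-ins for SaAisErrorT values, and demo_retry_on_timeout() and demo_flaky_op() are invented for the example. A real test would normally also sleep between attempts.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes standing in for SaAisErrorT values. */
enum demo_rc { DEMO_OK, DEMO_ERR_TIMEOUT, DEMO_ERR_OTHER };

typedef enum demo_rc (*demo_op_fn)(void *ctx);

/*
 * Calls op until it returns something other than DEMO_ERR_TIMEOUT or until
 * max_tries calls have been made. Returns the result of the last call.
 */
static enum demo_rc demo_retry_on_timeout(demo_op_fn op, void *ctx,
					  int max_tries)
{
	enum demo_rc rc = DEMO_ERR_TIMEOUT;

	assert(op != NULL && max_tries > 0);
	for (int i = 0; i < max_tries && rc == DEMO_ERR_TIMEOUT; i++)
		rc = op(ctx);
	return rc;
}

/* Test double: times out while *ctx is positive, then succeeds. */
static enum demo_rc demo_flaky_op(void *ctx)
{
	int *timeouts_left = ctx;

	if (*timeouts_left > 0) {
		(*timeouts_left)--;
		return DEMO_ERR_TIMEOUT;
	}
	return DEMO_OK;
}
```

Bounding the number of attempts keeps a permanently failing operation from hanging the test run.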



Complete diffstat:
--
src/ckpt/apitest/test_cpa_util.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)


Testing Commands:
-
ckpttest 20 1

Testing, Expected Results:
--
The test case passes even when the network is delayed.

Conditions of Submission:
-
ACK from maintainer

Arch Built Started Linux distro
---
mips n n
mips64 n n
x86 n n
x86_64 y y
powerpc n n
powerpc64 n n


Reviewer Checklist:
---
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files (i.e.
internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes like
trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other cosmetic
code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is too
much content into a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent; Instead
you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
commits, or plac

Re: [devel] [PATCH 0/1] Review Request for ckpt: add timeout handling for test_ckptOverwrite [#2624]

2017-10-12 Thread Vo Minh Hoang
Dear Alex,

 

Thank you very much for your very fast response.

 

Even though this is only test code, I think we should push it to the release
branch as well: the test environment should be consistent even if this change
is not really important.

If not, tests on the release branch might fail from time to time, which costs
extra effort when verifying much more important code.

 

Sincerely,

Hoang

 

From: Alex Jones [mailto:alex.jo...@genband.com] 
Sent: Thursday, October 12, 2017 6:01 PM
To: Hoang Vo 
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 0/1] Review Request for ckpt: add timeout handling for
test_ckptOverwrite [#2624]

 

Hi Hoang,

 

  Ack from me. Is there a reason this needs to go on the release branch?
It's just test code.

 

Alex

  _  

From: Hoang Vo
Sent: Thursday, October 12, 2017 6:13:44 AM
To: Alex Jones
Cc: opensaf-devel@lists.sourceforge.net; Hoang Vo
Subject: [PATCH 0/1] Review Request for ckpt: add timeout handling for
test_ckptOverwrite [#2624] 

 



Summary: ckpt: add timeout handling for test_ckptOverwrite [#2624]
Review request for Ticket(s): 2624
Peer Reviewer(s): ajo...@genband.com  
Pull request to: ajo...@genband.com  
Affected branch(es): develop, release
Development branch: ticket-2624
Base revision: 4fbc3261d53914242bdbc5c300caecc53b88365a
Personal repository: git://git.code.sf.net/u/swgerai/review


Impacted area Impact y/n

Docs n
Build system n
RPM/packaging n
Configuration files n
Startup scripts n
SAF services n
OpenSAF services n
Core libraries n
Samples n
Tests y
Other n


Comments (indicate scope for each "y" above):
-
*** EXPLAIN/COMMENT THE PATCH SERIES HERE ***

revision 38ebcac386e4cfda5c0b17feabaa609c526651d9
Author: Hoang Vo
Date: Thu, 12 Oct 2017 16:53:24 +0700

ckpt: add timeout handling for test_ckptOverwrite [#2624]

test_ckptOverwrite verifies overwrite behavior and should
handle SA_AIS_ERR_TIMEOUT by retrying the operation.



Complete diffstat:
--
src/ckpt/apitest/test_cpa_util.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)


Testing Commands:
-
ckpttest 20 1

Testing, Expected Results:
--
The test case passes even when the network is delayed.

Conditions of Submission:
-
ACK from maintainer

Arch Built Started Linux distro
---
mips n n
mips64 n n
x86 n n
x86_64 y y
powerpc n n
powerpc64 n n


Reviewer Checklist:
---
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
(i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
too much content into a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
comments and change requests that were proposed in the initial review.

___ You have a misconfigured ~/.gitconfig file (i.e. user.name, user.email
etc)

___ Your computer has a badly configured date and time, confusing the
threaded patch review.

___ Your changes affect IPC mechanism, and you don't present any results
for in-service 

Re: [devel] [PATCH 1 of 1] cpd: to correct failover behavior of cpsv [#1765] V5

2017-04-14 Thread Vo Minh Hoang
Dear Mahesh,

Thank you for your comments.
I added two of my ideas inline; please look for the [Hoang] tags.

Dear Zoran,

Do you have any extra comment about this patch?
If not, I will request pushing it at the start of next week.

Sincerely,
Hoang

-----Original Message-----
From: A V Mahesh [mailto:mahesh.va...@oracle.com] 
Sent: Thursday, April 13, 2017 5:47 PM
To: Vo Minh Hoang <hoang.m...@dektech.com.au>; zoran.milinko...@ericsson.com
Cc: opensaf-devel@lists.sourceforge.net; Ramesh Babu Betham
<ramesh.bet...@oracle.com>
Subject: Re: [PATCH 1 of 1] cpd: to correct failover behavior of cpsv
[#1765] V5

Hi Hoang,

ACK with the following (tested basic ND restarts):

- The errors below are not related to this patch; they are test-case related.

- It looks like there is an existing issue (not related to this patch): on
CPND down, the STANDBY CPD is also calling
`cpd_tmr_start(_info->cpnd_ret_timer,..);`. Please check that flow once
(after a CPND restart, add some sleep on the active CPD and do a switchover).
[Hoang]: I can also reproduce this behavior but could not find an error, so
I will continue checking it in a separate ticket.
It is a little weird that the standby cpd triggers anything. Honestly, I
think the standby should only do data sync. But it is too soon to discuss
this case.

- You introduced cpd_tmr_stop(_info->cpnd_ret_timer); in
cpnd_down_process(), but cpnd_up_process() already calls
`cpd_tmr_stop(_info->cpnd_ret_timer);`. Please check; it may be a
redundant call.
[Hoang]: I think we should keep this call even if it is redundant in this
case. We keep finding more unexpected error cases in the system and cannot
tell for sure whether it is redundant.
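
To illustrate why a possibly redundant stop call can be harmless, here is a small sketch of an idempotent timer stop. It assumes the timer carries an is_active flag that the stop routine checks; whether cpd_tmr_stop() itself behaves this way is not confirmed in this thread, and struct demo_tmr and the demo_* functions are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical timer; stop_count only exists to make the effect visible. */
struct demo_tmr {
	bool is_active;
	int stop_count;
};

/* Marks the timer as running. */
static void demo_tmr_start(struct demo_tmr *t)
{
	assert(t != NULL);
	t->is_active = true;
}

/* Stops the timer. Calling it on an already stopped timer does nothing. */
static void demo_tmr_stop(struct demo_tmr *t)
{
	assert(t != NULL);
	if (!t->is_active)
		return;
	t->is_active = false;
	t->stop_count++;
}
```

Under that assumption, calling stop from both the down path and the up path costs one extra flag check and cannot stop the timer twice.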

-AVM

On 4/12/2017 2:19 PM, A V Mahesh wrote:
> Hi Hoang,
>
> On 2/10/2017 3:09 PM, Vo Minh Hoang wrote:
>> If cpnd is temporary down only, we don't need clean up anything.
>> If cpnd is permanently down, the bad effect of this proposal is that 
>> replica is not clean up. But if cpnd permanently down, we have to 
>> reboot node for recovering so I think this cleanup is not really 
>> necessary.
>>
>> I also checked this implementation with possible test cases and have 
>> not seen any side effect.
>> Please consider it
> We are observing new node_user_info  databases mismatch Errors, while 
> testing multiple CPND restart with this patch,I will do more debugging 
> and update the root cause.
>
> ==
> =
>
>
> Apr 12 14:06:57 SC-1 osafckptd[27594]: NO cpnd_down_process:: Start CPND_RETENTION timer id = 0x7f86f0500cf0, arg=0x7f86f0501ef0
> *Apr 12 14:06:58 SC-1 osafckptd[27594]: ER cpd_proc_decrease_node_user_info failed - no user on node id 0x2020F*
> Apr 12 14:06:58 SC-1 osafckptd[27594]: NO cpnd_down_process:: Start CPND_RETENTION timer id = 0x7f86f0501750, arg=0x7f86f0501ef0
> *Apr 12 14:06:59 SC-1 osafckptd[27594]: ER cpd_proc_decrease_node_user_info failed - no user on node id 0x2020F*
> Apr 12 14:06:59 SC-1 osafckptd[27594]: NO cpnd_down_process:: Start CPND_RETENTION timer id = 0x7f86f0503ab0, arg=0x7f86f0501ef0
> Apr 12 14:07:00 SC-1 osafckptd[27594]: NO cpnd_down_process:: Start CPND_RETENTION timer id = 0x7f86f0500c70, arg=0x7f86f0501ef0
> Apr 12 14:07:01 SC-1 osafckptd[27594]: NO cpnd_down_process:: Start CPND_RETENTION timer id = 0x7f86f0500930, arg=0x7f86f0501ef0
> *Apr 12 14:07:03 SC-1 osafckptd[27594]: ER cpd_proc_decrease_node_user_info failed - no user on node id 0x2020F*
> Apr 12 14:07:03 SC-1 osafckptd[27594]: NO cpnd_down_process:: Start CPND_RETENTION timer id = 0x7f86f04fe3a0, arg=0x7f86f0501ef0
> Apr 12 14:07:04 SC-1 osafckptd[27594]: NO cpnd_down_process:: Start CPND_RETENTION timer id = 0x7f86f0500cf0, arg=0x7f86f0501ef0
>
> ==========
> =
>
>
> -AVM
>
>
> On 4/12/2017 11:08 AM, A V Mahesh wrote:
>> Hi Hoang,
>>
>> On 2/10/2017 3:09 PM, Vo Minh Hoang wrote:
>>> Dear Mahesh,
>>>
>>> Based on what I saw, in this case, retention time cannot detect CPND 
>>> temporarily down because its pid changed.
>> I will check that , I have some test cases based this retention time 
>> , not sure how were they working.
>>
>> Can you please provide reproducible steps, I did look at ticket , but 
>> looks complex , if you have any application that reproduces the case 
>> please share.
>>
>> -AVM
>>>
>>> If cpnd is temporary down only, we don't need clean up anything.
>>> If cpnd is permanently down, the bad effect of this proposal is that 
>>> replica is not clean up. But 

Re: [devel] [PATCH 1 of 1] cpd: to correct failover behavior of cpsv [#1765] V5

2017-04-12 Thread Vo Minh Hoang
Dear Mahesh,

Sorry that it took time to recall some long-lost information.

Below are the reproduction steps with the newest source code:
- Create some non-collocated checkpoints on SC-1.
- Trigger a failover with pkill -9 amfd.
- Do that again 4 times on the active SC.
- Check /run/shm: all replicas are gone.
- Create a checkpoint with the same name again: it fails with SA_AIS_ERR_LIBRARY.

Sincerely,
Hoang

-----Original Message-----
From: A V Mahesh [mailto:mahesh.va...@oracle.com] 
Sent: Wednesday, April 12, 2017 3:50 PM
To: Vo Minh Hoang <hoang.m...@dektech.com.au>; zoran.milinko...@ericsson.com
Cc: opensaf-devel@lists.sourceforge.net; Ramesh Babu Betham
<ramesh.bet...@oracle.com>
Subject: Re: [PATCH 1 of 1] cpd: to correct failover behavior of cpsv
[#1765] V5

Hi Hoang,

On 2/10/2017 3:09 PM, Vo Minh Hoang wrote:
> If cpnd is temporary down only, we don't need clean up anything.
> If cpnd is permanently down, the bad effect of this proposal is that 
> replica is not clean up. But if cpnd permanently down, we have to 
> reboot node for recovering so I think this cleanup is not really
> necessary.
>
> I also checked this implementation with possible test cases and have 
> not seen any side effect.
> Please consider it
We are observing new node_user_info database mismatch errors while
testing multiple CPND restarts with this patch. I will do more debugging and
report the root cause.


===

Apr 12 14:06:57 SC-1 osafckptd[27594]: NO cpnd_down_process:: Start CPND_RETENTION timer id = 0x7f86f0500cf0, arg=0x7f86f0501ef0
*Apr 12 14:06:58 SC-1 osafckptd[27594]: ER cpd_proc_decrease_node_user_info failed - no user on node id 0x2020F*
Apr 12 14:06:58 SC-1 osafckptd[27594]: NO cpnd_down_process:: Start CPND_RETENTION timer id = 0x7f86f0501750, arg=0x7f86f0501ef0
*Apr 12 14:06:59 SC-1 osafckptd[27594]: ER cpd_proc_decrease_node_user_info failed - no user on node id 0x2020F*
Apr 12 14:06:59 SC-1 osafckptd[27594]: NO cpnd_down_process:: Start CPND_RETENTION timer id = 0x7f86f0503ab0, arg=0x7f86f0501ef0
Apr 12 14:07:00 SC-1 osafckptd[27594]: NO cpnd_down_process:: Start CPND_RETENTION timer id = 0x7f86f0500c70, arg=0x7f86f0501ef0
Apr 12 14:07:01 SC-1 osafckptd[27594]: NO cpnd_down_process:: Start CPND_RETENTION timer id = 0x7f86f0500930, arg=0x7f86f0501ef0
*Apr 12 14:07:03 SC-1 osafckptd[27594]: ER cpd_proc_decrease_node_user_info failed - no user on node id 0x2020F*
Apr 12 14:07:03 SC-1 osafckptd[27594]: NO cpnd_down_process:: Start CPND_RETENTION timer id = 0x7f86f04fe3a0, arg=0x7f86f0501ef0
Apr 12 14:07:04 SC-1 osafckptd[27594]: NO cpnd_down_process:: Start CPND_RETENTION timer id = 0x7f86f0500cf0, arg=0x7f86f0501ef0


===

-AVM


On 4/12/2017 11:08 AM, A V Mahesh wrote:
> Hi Hoang,
>
> On 2/10/2017 3:09 PM, Vo Minh Hoang wrote:
>> Dear Mahesh,
>>
>> Based on what I saw, in this case, retention time cannot detect CPND 
>> temporarily down because its pid changed.
> I will check that , I have some test cases based this retention time , 
> not sure how were they working.
>
> Can you please provide reproducible steps, I did look at ticket , but 
> looks complex , if you have any application that reproduces the case 
> please share.
>
> -AVM
>>
>> If cpnd is temporary down only, we don't need clean up anything.
>> If cpnd is permanently down, the bad effect of this proposal is that 
>> replica is not clean up. But if cpnd permanently down, we have to 
>> reboot node for recovering so I think this cleanup is not really 
>> necessary.
>>
>> I also checked this implementation with possible test cases and have 
>> not seen any side effect.
>> Please consider it.
>>
>> Thank you and best regards,
>> Hoang
>>
>> -Original Message-
>> From: A V Mahesh [mailto:mahesh.va...@oracle.com]
>> Sent: Friday, February 10, 2017 10:40 AM
>> To: Hoang Vo <hoang.m...@dektech.com.au>; 
>> zoran.milinko...@ericsson.com
>> Cc: opensaf-devel@lists.sourceforge.net
>> Subject: Re: [PATCH 1 of 1] cpd: to correct failover behavior of cpsv 
>> [#1765] V5
>>
>> Hi Hoang,
>>
>> The CPD_CPND_DOWN_RETENTION  is to recognize, ether CPND temporarily 
>> down or permanently down, this is started a CPND is down and based on 
>> cpd_evt_proc_timer_expiry(), cpd recognize that the CPND is complete 
>> down and do cleanup, else  cpnd rejoined with in 
>> CPD_CPND_DOWN_RETENTION_TIME , the CPD_CPND_DOWN_RETENTION is stoped.
>>
>> If we stop CPD_CPND_DOWN_RETENTION timer in cpd_process_cpnd_dow(), 
>> do cpd recognize the CPD permanently do

Re: [devel] [PATCH 1 of 1] cpd: update missed out node_users_cnt on standby [#2337]

2017-03-09 Thread Vo Minh Hoang
Hi Mahesh,

ACK. Review only.

Sincerely,
Hoang

-----Original Message-----
From: A V Mahesh [mailto:mahesh.va...@oracle.com] 
Sent: Friday, March 10, 2017 12:47 PM
To: hoang.m...@dektech.com.au; ramesh.bet...@oracle.com
Cc: opensaf-devel@lists.sourceforge.net
Subject: [PATCH 1 of 1] cpd: update missed out node_users_cnt on standby
[#2337]

 src/ckpt/ckptd/cpd_sbevt.c |  1 +
 1 files changed, 1 insertions(+), 0 deletions(-)


Fixed the issue introduced by the ticket #1669 patch: the node_users_cnt
variable was missed out on the standby.

diff --git a/src/ckpt/ckptd/cpd_sbevt.c b/src/ckpt/ckptd/cpd_sbevt.c
--- a/src/ckpt/ckptd/cpd_sbevt.c
+++ b/src/ckpt/ckptd/cpd_sbevt.c
@@ -586,6 +586,7 @@ uint32_t cpd_sb_proc_ckpt_usrinfo(CPD_CB
ckpt_node->num_sections = msg->info.usr_info_2.num_sections;
ckpt_node->ckpt_on_scxb1 = msg->info.usr_info_2.ckpt_on_scxb1;
ckpt_node->ckpt_on_scxb2 = msg->info.usr_info_2.ckpt_on_scxb2;
+   ckpt_node->node_users_cnt = msg->info.usr_info_2.node_users_cnt;
 
/* Free the old node_users */
CPD_NODE_USER_INFO *node_user = ckpt_node->node_users;




Re: [devel] [PATCH 1 of 2] mds: handle memory leak [#1860]

2017-03-08 Thread Vo Minh Hoang
Dear Anders Widell,

 

Yes, as you mentioned, this patch changes that behavior.

 

The reasons are as follows:

*   Currently, when the API succeeds, the memory is freed in the
user-defined callback, not by MDS itself. When the API fails, MDS frees the
memory in some cases but not in others. So the source code does not strictly
match the PR document.
*   MDS has no way to check whether the user allocated the buffer with the
macro m_MDS_ALLOC_DIRECT_BUFF(x), so freeing it inside MDS is not defensive
coding.
*   Personally, I think memory should be allocated and freed at the same
level unless there is a specific reason not to, and I cannot find such a
reason in this case.
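
The ownership rule argued for in the list above can be sketched as follows. This is a simplified illustration, not the MDS API: demo_direct_send(), demo_send_len(), and DEMO_MAX_LEN are hypothetical, and the success path is modeled as the callee consuming the buffer directly rather than through a user callback. The key point is only that on an error return the buffer still belongs to whoever allocated it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define DEMO_MAX_LEN 8

/*
 * Hypothetical send. On success (0) it takes ownership of buf and releases
 * it. On failure (-1) it leaves buf untouched for the caller to release.
 */
static int demo_direct_send(unsigned char *buf, size_t len)
{
	if (buf == NULL || len > DEMO_MAX_LEN)
		return -1; /* error: buf still belongs to the caller */
	free(buf); /* success: stands in for the transport consuming buf */
	return 0;
}

/* The caller allocates, so the caller frees when the send is rejected. */
static int demo_send_len(size_t len)
{
	unsigned char *buf = calloc(len ? len : 1, 1);
	int rc;

	if (buf == NULL)
		return -1;
	rc = demo_direct_send(buf, len);
	if (rc != 0)
		free(buf); /* same level that allocated it releases it */
	return rc;
}
```

Because the error path never frees inside the send function, the caller can release the buffer unconditionally on failure without risking a double free.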

 

Thank you and best regards,

Hoang

 

From: Anders Widell [mailto:anders.wid...@ericsson.com] 
Sent: Wednesday, March 8, 2017 9:49 PM
To: Hoang Vo ; mahesh.va...@oracle.com
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1 of 2] mds: handle memory leak [#1860]

 

Hi!

I am trying to understand how these MDS APIs are intended to work. When I
read Section 3.2.9.1.7 in OpenSAF_MDSv_PR.odt in the opensaf-internal-docs
Mercurial repository, it looks like it is the responsibility of the MDS
library to free the provided buffer that was passed in to MDS_DIRECT_SEND,
no matter if the MDS API call succeeds or fails. Doesn't this patch change
this behaviour in MDS?

 

regards,

Anders Widell

 

On 03/06/2017 09:00 AM, Hoang Vo wrote:

 src/base/sysf_mem.c|   3 +++
 src/mds/mds_c_sndrcv.c |  34 +++---
 2 files changed, 18 insertions(+), 19 deletions(-)
 
 
Some error-handling paths do not clean up internal memory.
Error handling in the direct send case frees user memory inconsistently;
MDS should let the creator manage its memory in error cases.
 
action: implement as proposed.
 
diff --git a/src/base/sysf_mem.c b/src/base/sysf_mem.c
--- a/src/base/sysf_mem.c
+++ b/src/base/sysf_mem.c
@@ -1,6 +1,7 @@
 /*  -*- OpenSAF  -*-
  *
  * (C) Copyright 2008 The OpenSAF Foundation
+ * Copyright Ericsson AB 2017 - All Rights Reserved.
  *
  * This program is distributed in the hope that it will be useful, but
  * WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY
@@ -428,6 +429,8 @@ USRBUF *sysf_alloc_pkt(unsigned char poo
 if (pool_id >= UB_MAX_POOLS) {
 m_PMGR_UNLK(_ub_pool_mgr.lock);
 m_LEAP_DBG_SINK_VOID;
+m_NCS_MEM_FREE(ub, NCS_MEM_REGION_IO_DATA_HDR,
NCS_SERVICE_ID_OS_SVCS, 2);
+ub = (USRBUF *)0;
 return NULL;
 }
 ud = (USRDATA
*)gl_ub_pool_mgr.pools[pool_id].mem_alloc(sizeof(USRDATA), pool_id,
priority);
diff --git a/src/mds/mds_c_sndrcv.c b/src/mds/mds_c_sndrcv.c
--- a/src/mds/mds_c_sndrcv.c
+++ b/src/mds/mds_c_sndrcv.c
@@ -1,6 +1,7 @@
 /*  -*- OpenSAF  -*-
  *
  * (C) Copyright 2008 The OpenSAF Foundation
+ * Copyright Ericsson AB 2017 - All Rights Reserved.
  *
  * This program is distributed in the hope that it will be useful, but
  * WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY
@@ -420,10 +421,6 @@ static uint32_t mds_mcm_direct_send(NCSM
  memset(, 0, sizeof(req));
  if ((info->info.svc_direct_send.i_priority < MDS_SEND_PRIORITY_LOW) ||
  (info->info.svc_direct_send.i_priority > MDS_SEND_PRIORITY_VERY_HIGH))
{
-if (info->info.svc_direct_send.i_direct_buff != NULL) {
-
m_MDS_FREE_DIRECT_BUFF(info->info.svc_direct_send.i_direct_buff);
-info->info.svc_direct_send.i_direct_buff = NULL;
-}
 m_MDS_LOG_ERR("MDS_SND_RCV: Priority defined is not in range\n");
 return NCSCC_RC_FAILURE;
  }
@@ -432,13 +429,11 @@ static uint32_t mds_mcm_direct_send(NCSM
 m_MDS_LOG_ERR("MDS_SND_RCV: Send Message Direct Buff is NULL\n");
 return NCSCC_RC_FAILURE;
  } else if (info->info.svc_direct_send.i_direct_buff_len >
MDS_DIRECT_BUF_MAXSIZE) {
-mds_free_direct_buff(info->info.svc_direct_send.i_direct_buff);
 m_MDS_LOG_ERR("MDS_SND_RCV: Send Message Direct Buff Len is greater
than SEND SIZE\n");
 return NCSCC_RC_FAILURE;
  }
 
  if ((info->info.svc_direct_send.i_to_svc == 0) || (info->i_svc_id == 0)) {
-m_MDS_FREE_DIRECT_BUFF(info->info.svc_direct_send.i_direct_buff);
 m_MDS_LOG_ERR("MDS_SND_RCV: Source or Dest service provided is
Null, src svc_id = %s(%d), dest svc_id = %s(%d) \n",
   get_svc_names(info->i_svc_id), info->i_svc_id,
get_svc_names(info->info.svc_direct_send.i_to_svc),
info->info.svc_direct_send.i_to_svc);
 return NCSCC_RC_FAILURE;
@@ -633,11 +628,6 @@ static uint32_t mds_mcm_direct_send(NCSM
 status = NCSCC_RC_FAILURE;
 break;
  }
- if (status == MDS_INT_RC_DIRECT_SEND_FAIL) {
-/* Free the MDS Direct Buff */
-m_MDS_FREE_DIRECT_BUFF(info->info.svc_direct_send.i_direct_buff);
-status = NCSCC_RC_FAILURE;
- }
  return status;
 }
 
@@ -2014,6 +2004,7 @@ static uint32_t mcm_process_await_active
 

Re: [devel] [PATCH 0 of 2] Review Request for mdstest: handle memory leak [#1860]

2017-03-07 Thread Vo Minh Hoang
Dear Zoran,

Thank you very much for checking.

Could you please tell me which test case fails in your environment? On my
current PC all tests return OK and no memory leak is reported. The difference
might be caused by a threading problem.

Also, please note that this patch should be applied after #2174 (which has
already been pushed).

Sincerely,
Hoang

-Original Message-
From: Zoran Milinkovic [mailto:zoran.milinko...@ericsson.com] 
Sent: Tuesday, March 7, 2017 10:07 PM
To: Hoang Minh Vo ; mahesh.va...@oracle.com;
Anders Widell 
Cc: opensaf-devel@lists.sourceforge.net
Subject: RE: [devel] [PATCH 0 of 2] Review Request for mdstest: handle
memory leak [#1860]

Hi Hoang,

Reviewed and tested both patches.

There is still a memory leak when some tests fail.

==20325== 752 bytes in 2 blocks are definitely lost in loss record 9 of 10
==20325==    at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==20325==    by 0x4E7731D: mds_mcm_user_event_callback (mds_c_api.c:3301)
==20325==    by 0x4E78BB1: mds_mcm_svc_up (mds_c_api.c:1615)
==20325==    by 0x4E95C14: mdtm_process_discovery_events (mds_dt_tipc.c:1031)
==20325==    by 0x4E95C14: mdtm_process_recv_events (mds_dt_tipc.c:699)
==20325==    by 0x50C7183: start_thread (pthread_create.c:312)
==20325==    by 0x53D737C: clone (clone.S:111)
==20325==
==20325== LEAK SUMMARY:
==20325==    definitely lost: 752 bytes in 2 blocks
==20325==    indirectly lost: 0 bytes in 0 blocks
==20325==      possibly lost: 120 bytes in 4 blocks
==20325==    still reachable: 263,815 bytes in 5 blocks
==20325==         suppressed: 0 bytes in 0 blocks

Thanks,
Zoran


-Original Message-
From: Hoang Vo [mailto:hoang.m...@dektech.com.au] 
Sent: den 6 mars 2017 09:00
To: mahesh.va...@oracle.com; Anders Widell 
Cc: opensaf-devel@lists.sourceforge.net
Subject: [devel] [PATCH 0 of 2] Review Request for mdstest: handle memory
leak [#1860]

Summary: mdstest: handle memory leak [#1860]
Review request for Trac Ticket(s): #1870
Peer Reviewer(s): mahesh.va...@oracle.com; zoran.milinko...@ericsson.com
Pull request to: mahesh.va...@oracle.com
Affected branch(es): default
Development branch: default


Impacted area       Impact y/n

 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   y
 Other                   n


Comments (indicate scope for each "y" above):
-

changeset 9a1f61672dd538472bf0c1340011467a35f83a23
Author: Hoang Vo 
Date:   Mon, 06 Mar 2017 14:52:06 +0700

mds: handle memory leak [#1860]

Some error-handling paths do not free internal memory. Error handling
in the direct send case frees user memory inconsistently; MDS should
let the creator manage its memory in error cases.

action: implement as proposed.

changeset 1efa643eb496a2938d1ddfecac6e91aa4a1cda88
Author: Hoang Vo 
Date:   Mon, 06 Mar 2017 14:52:08 +0700

mdstest: handle memory leak [#1860]

mdstest leaks memory in many cases because of:
- incorrect use of input parameters
- wrong test sequence (cancelling the subscription after uninstall)
- malloc followed by thread termination (free is never reached)
- missing free on receiving a message (a global pointer was used)
- encoding the wrong message length

action: fix the above cases


Complete diffstat:
--
 src/base/sysf_mem.c            |    3 +
 src/mds/apitest/mdstipc.h      |    3 +-
 src/mds/apitest/mdstipc_api.c  |  288 ---
 src/mds/apitest/mdstipc_conf.c |   42 ++---
 src/mds/mds_c_sndrcv.c         |   34 +++-
 5 files changed, 181 insertions(+), 189 deletions(-)


Testing Commands:
-
G_SLICE=always-malloc G_DEBUG=gc-friendly  valgrind -v --tool=memcheck
--leak-check=full --num-callers=40 --log-file=valgrind.log mdstest

Testing, Expected Results:
--
No "definitely lost" or "indirectly lost" blocks are reported in valgrind.log

Conditions of Submission:
-
ACK from maintainer

Arch      Built     Started    Linux distro
---
mips        n          n
mips64      n          n
x86         n          n
x86_64      y          y
powerpc     n          n
powerpc64   n          n


Reviewer Checklist:
---
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally 

Re: [devel] [PATCH 1 of 1] cpd: to correct failover behavior of cpsv [#1765] V5

2017-02-10 Thread Vo Minh Hoang
Dear Mahesh,

From what I have seen, in this case the retention timer cannot detect that
CPND is only temporarily down, because its pid has changed.

If cpnd is only temporarily down, we do not need to clean up anything.
If cpnd is permanently down, the drawback of this proposal is that the replica
is not cleaned up. But if cpnd is permanently down, we have to reboot the node
to recover, so I think this cleanup is not really necessary.

I also checked this implementation against the possible test cases and have
not seen any side effects.
Please consider it.

Thank you and best regards,
Hoang

-Original Message-
From: A V Mahesh [mailto:mahesh.va...@oracle.com] 
Sent: Friday, February 10, 2017 10:40 AM
To: Hoang Vo ; zoran.milinko...@ericsson.com
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1 of 1] cpd: to correct failover behavior of cpsv
[#1765] V5

Hi Hoang,

The CPD_CPND_DOWN_RETENTION timer is used to recognize whether a CPND is
temporarily or permanently down. It is started when a CPND goes down. If it
expires, cpd_evt_proc_timer_expiry() runs, cpd concludes that the CPND is
completely down and does the cleanup. Otherwise, if the cpnd rejoins within
CPD_CPND_DOWN_RETENTION_TIME, the CPD_CPND_DOWN_RETENTION timer is stopped.

If we stop the CPD_CPND_DOWN_RETENTION timer in cpd_process_cpnd_down(), will
cpd still recognize that the CPND is permanently down? cpd_process_cpnd_down()
is called in multiple flows. Can you please check all the flows and confirm
whether stopping the CPD_CPND_DOWN_RETENTION timer has any impact?
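
The start/stop/expiry behaviour described above can be sketched as a tiny state model. This is only an illustration under stated assumptions, not the cpd implementation: the struct, the function names and the DEMO_RETENTION_TIME value are all hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a per-node retention timer; not the cpd code. */
struct demo_node {
	bool timer_running;
	uint64_t deadline; /* absolute time at which the node counts as gone */
	bool cleaned_up;
};

#define DEMO_RETENTION_TIME 100u /* assumed value, arbitrary time units */

/* Node went down: start the retention timer. */
static void demo_node_down(struct demo_node *n, uint64_t now)
{
	n->timer_running = true;
	n->deadline = now + DEMO_RETENTION_TIME;
}

/* Node rejoined in time: stop the timer and keep all state. */
static void demo_node_rejoin(struct demo_node *n)
{
	n->timer_running = false;
}

/* Called periodically; cleans up only if the timer really expired. */
static void demo_tick(struct demo_node *n, uint64_t now)
{
	if (n->timer_running && now >= n->deadline) {
		n->timer_running = false;
		n->cleaned_up = true; /* node treated as permanently down */
	}
}
```

In this model, stopping the timer anywhere other than in the rejoin path means the expiry branch can never run for that node, which is the impact the question above asks to be checked in every flow.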

-AVM

On 2/9/2017 1:35 PM, Hoang Vo wrote:
>   src/ckpt/ckptd/cpd_proc.c |  11 ++-
>   1 files changed, 10 insertions(+), 1 deletions(-)
>
>
> Problem:
> When failover happens multiple times, the cpnd is down for a moment, so
> no cpnd has the specific checkpoint open. This triggers the retention timer.
> When the cpnd comes up again it has a different pid, so the retention timer
> is not stopped.
> The replica is deleted when the retention timer expires, while its
> information is still in the ckpt database.
> That causes the problem.
>
> Fix:
> - Stop the timer of the removed node.
> - Update the data in the patricia trees (to keep the retention value
>   consistent).
>
> diff --git a/src/ckpt/ckptd/cpd_proc.c b/src/ckpt/ckptd/cpd_proc.c
> --- a/src/ckpt/ckptd/cpd_proc.c
> +++ b/src/ckpt/ckptd/cpd_proc.c
> @@ -679,7 +679,8 @@ uint32_t cpd_process_cpnd_down(CPD_CB *c
>   cpd_cpnd_info_node_find_add(>cpnd_tree, cpnd_dest, _info,
_flag);
>   if (!cpnd_info)
>   return NCSCC_RC_SUCCESS;
> -
> + /* Stop timer before processing down */
> + cpd_tmr_stop(_info->cpnd_ret_timer);
>   cref_info = cpnd_info->ckpt_ref_list;
>   
>   while (cref_info) {
> @@ -989,6 +990,14 @@ uint32_t cpd_proc_retention_set(CPD_CB *
>   
>   /* Update the retention Time */
>   (*ckpt_node)->ret_time = reten_time;
> + (*ckpt_node)->attributes.retentionDuration = reten_time;
> +
> + /* Update the related patricia tree */
> + CPD_CKPT_MAP_INFO *map_info = NULL;
> + cpd_ckpt_map_node_get(>ckpt_map_tree, (*ckpt_node)->ckpt_name,
_info);
> + if (map_info) {
> + map_info->attributes.retentionDuration = reten_time;
> + }
>   return rc;
>   }
>   



--
Check out the vibrant tech community on one of the world's most
engaging tech sites, SlashDot.org! http://sdm.link/slashdot
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


Re: [devel] [PATCH 1 of 1] ckpt: return SA_AIS_ERR_BAD_HANDLE if ckpt handle is not found in checkpoint open call [#2283]

2017-02-09 Thread Vo Minh Hoang
Dear Zoran,


I am sorry if you have been waiting for me on this ticket.
ACK.

Sincerely,
Hoang

-Original Message-
From: Zoran Milinkovic [mailto:zoran.milinko...@ericsson.com] 
Sent: Wednesday, February 1, 2017 8:13 PM
To: A V Mahesh 
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [devel] [PATCH 1 of 1] ckpt: return SA_AIS_ERR_BAD_HANDLE if
ckpt handle is not found in checkpoint open call [#2283]

Hi Mahesh,

I was going through the code and found that the function does not reply to
the agent if the client node is not found.
Since the API call is synchronous, if the agent does not get the reply back
from ND, the call will wait until it reaches the timeout.

If you look at the code, you will see that this reply is obviously
missing in the function.
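
As a rough illustration of why every exit path of a synchronous handler has to send a reply, here is a small sketch of the single-exit pattern. The types and names (demo_reply, demo_handle_open, the DEMO_ERR_* values) are hypothetical and are not taken from cpnd_evt.c.

```c
#include <stdbool.h>

/* Hypothetical error codes and reply record, for illustration only. */
enum demo_err { DEMO_ERR_OK = 0, DEMO_ERR_BAD_HANDLE = 1 };

struct demo_reply {
	bool sent;
	enum demo_err error;
};

static void demo_send_reply(struct demo_reply *out, enum demo_err err)
{
	out->sent = true;
	out->error = err;
}

/* client_found models the result of the client node lookup. */
static void demo_handle_open(bool client_found, struct demo_reply *out)
{
	enum demo_err err = DEMO_ERR_OK;

	if (!client_found) {
		err = DEMO_ERR_BAD_HANDLE;
		goto agent_rsp; /* never return without replying */
	}
	/* ... normal open processing would go here ... */

agent_rsp:
	demo_send_reply(out, err);
}
```

A plain `return` in the lookup-failure branch would leave `out->sent` false, which corresponds to the agent blocking until its timeout.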

BR,
Zoran

-Original Message-
From: A V Mahesh [mailto:mahesh.va...@oracle.com]
Sent: den 1 februari 2017 04:39
To: Zoran Milinkovic 
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1 of 1] ckpt: return SA_AIS_ERR_BAD_HANDLE if ckpt
handle is not found in checkpoint open call [#2283]

Hi Zoran,

ACK, not tested.

Is this case hit in the headless scenario?

-AVM

On 1/31/2017 6:24 PM, Zoran Milinkovic wrote:
>   src/ckpt/ckptnd/cpnd_evt.c |  3 ++-
>   1 files changed, 2 insertions(+), 1 deletions(-)
>
>
> If a client node is not found in cpnd_evt_proc_ckpt_open(), the checkpoint
> node director will reply with SA_AIS_ERR_BAD_HANDLE.
>
> diff --git a/src/ckpt/ckptnd/cpnd_evt.c b/src/ckpt/ckptnd/cpnd_evt.c
> --- a/src/ckpt/ckptnd/cpnd_evt.c
> +++ b/src/ckpt/ckptnd/cpnd_evt.c
> @@ -616,7 +616,8 @@ static uint32_t cpnd_evt_proc_ckpt_open(
>   cpnd_client_node_get(cb, client_hdl, _node);
>   if (cl_node == NULL) {
>   TRACE_4("cpnd client hdl get failed for client
hdl:%llx",client_hdl);
> - return rc;
> + send_evt.info.cpa.info.openRsp.error =
SA_AIS_ERR_BAD_HANDLE;
> + goto agent_rsp;
>   }
>   
>   if (((cp_node = cpnd_ckpt_node_find_by_name(cb, ckpt_name)) != 
> NULL) && cp_node->is_unlink == false) {



--
Check out the vibrant tech community on one of the world's most engaging
tech sites, SlashDot.org! http://sdm.link/slashdot
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel




Re: [devel] [PATCH 0 of 1] Review Request for cpsv: Update ckpt_reploc_tree when unlinking a checkpoint [#1655

2017-02-06 Thread Vo Minh Hoang
Dear Mahesh and Zoran,

I checked the problem mentioned in this ticket against the newest source code
and cannot reproduce it by following the recorded steps.

I suggest setting #1655 and #1765 to invalid and opening new tickets if any
problem is found.
Is that possible?

Thank you and best regards,
Hoang

-Original Message-
From: A V Mahesh [mailto:mahesh.va...@oracle.com] 
Sent: Wednesday, February 1, 2017 10:52 AM
To: Zoran Milinkovic ; Hoang Minh Vo

Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 0 of 1] Review Request for cpsv: Update ckpt_reploc_tree
when unlinking a checkpoint [#1655

Hi Hoang,

Is this issue an extension of the ticket #1765 issue `cpd: to correct
failover behavior of cpsv [#1765]`?
If so, let us first address ticket #1765 with my proposed approach; then
we can fix this issue on top of that.

-AVM

On 1/31/2017 5:23 PM, Zoran Milinkovic wrote:
> Hi Hoang,
>
> Can you share the test code?
> I cannot reproduce the steps you have described. I might be missing something.
>
> Thanks,
> Zoran
>
> -Original Message-
> From: Hoang Vo [mailto:hoang.m...@dektech.com.au]
> Sent: den 19 januari 2017 08:08
> To: mahesh.va...@oracle.com; Zoran Milinkovic 
> 
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: [PATCH 0 of 1] Review Request for cpsv: Update 
> ckpt_reploc_tree when unlinking a checkpoint [#1655
>
> Summary: cpsv: Update ckpt_reploc_tree when unlinking a checkpoint [#1655]
> Review request for Trac Ticket(s): 1655
> Peer Reviewer(s): mahesh.va...@oracle.com; zoran.milinko...@ericsson.com
> Pull request to: mahesh.va...@oracle.com
> Affected branch(es): default
> Development branch: default
>
> 
> Impacted area       Impact y/n
>
>  Docs                    n
>  Build system            n
>  RPM/packaging           n
>  Configuration files     n
>  Startup scripts         n
>  SAF services            y
>  OpenSAF services        n
>  Core libraries          n
>  Samples                 n
>  Tests                   n
>  Other                   n
>
>
> Comments (indicate scope for each "y" above):
> -
> Rebase patch to the latest folder structure; no source code inside is changed.
> Patched after 1765.
>
> changeset 6ffeaa4fbf2e352bd42a4bba160c4c593efcf749
> Author:   Hoang Vo 
> Date: Thu, 19 Jan 2017 13:59:09 +0700
>
>   Problem:
>   The replica IMM objects are not created after opening a checkpoint
>   in the following scenario:
>
>   1. Open a checkpoint with flag SA_CKPT_CHECKPOINT_CREATE.
>   2. Unlink the checkpoint (the checkpoint is still being used).
>   3. Open a checkpoint with flag SA_CKPT_CHECKPOINT_CREATE with the same
>      name as the one in step 1.
>
>   After step 3, although the checkpoint is opened successfully, the replica
>   IMM objects are not created.
>
>   The problem happens because the CPD does not delete the related nodes from
>   ckpt_reploc_tree when it unlinks the checkpoint in step 2.
>
>   Solution:
>   - The solution is to remove the replica location node of that checkpoint
>   from the ckpt_reploc_tree when unlinking the checkpoint.
>
>
> Complete diffstat:
> --
>   src/ckpt/ckptd/cpd_db.c   |   4 
>   src/ckpt/ckptd/cpd_proc.c |  30 ++
>   2 files changed, 34 insertions(+), 0 deletions(-)
>
>
> Testing Commands:
> -
> Follow the testing steps specified in ticket 1655
>
> Testing, Expected Results:
> --
> Refer to the ticket 1655 description for the expected result
>
> Conditions of Submission:
> -
> ACK from maintainer
>
> Arch      Built     Started    Linux distro
> ---
> mips        n          n
> mips64      n          n
> x86         n          n
> x86_64      y          y
> powerpc     n          n
> powerpc64   n          n
>
>
> Reviewer Checklist:
> ---
> [Submitters: make sure that your review doesn't trigger any 
> checkmarks!]
>
>
> Your checkin has not passed review because (see checked entries):
>
> ___ Your RR template is generally incomplete; it has too many blank entries
>  that need proper data filled in.
>
> ___ You have failed to nominate the proper persons for review and push.
>
> ___ Your patches do not have proper short+long header
>
> ___ You have grammar/spelling in your header that is unacceptable.
>
> ___ You have exceeded a sensible line length in your headers/comments/text.
>
> ___ You have failed to put in a proper Trac Ticket # into your commits.
>
> ___ You have incorrectly put/left internal data in your comments/files
>  (i.e. internal bug tracking tool IDs, product names etc)
>
> ___ You have not given any evidence of testing beyond basic build tests.
>  

Re: [devel] [PATCH 0 of 1] Review Request for cpsv: Update ckpt_reploc_tree when unlinking a checkpoint [#1655

2017-01-18 Thread Vo Minh Hoang
Dear Mahesh,

 

>> So we can always re-use the existing UN-linked resources by just simply
>> removing UN-link flag,
>> what is your opinion?

 

As I understand it, a new checkpoint may have the same name but different
attributes (collocated/non-collocated) and may be opened from a different
node, which would make the way we manage replicas complicated.

I fear that will cost more work than Nhat's solution.

 

Thank you and best regards,

Hoang

 

From: A V Mahesh [mailto:mahesh.va...@oracle.com] 
Sent: Thursday, January 19, 2017 2:29 PM
To: Hoang Vo ; zoran.milinko...@ericsson.com
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 0 of 1] Review Request for cpsv: Update ckpt_reploc_tree
when unlinking a checkpoint [#1655

 

Hi Hoang,

>>The replica IMM objects are not created after opening a checkpoint in
following scenario:

1. Open a checkpoint with flag SA_CKPT_CHECKPOINT_CREATE

2. Unlink the checkpoint ( the checkpoint is still being used)

3. Open a checkpoint with flag SA_CKPT_CHECKPOINT_CREATE with the same name
as the one in 1.

>>After 3. although the checkpoint is opened successfully, the replica IMM
objects are not created.

As far as I know, the CKPT specification does not say that a new instance
should be created if a checkpoint is reopened with the same name while it is
currently in the UN-linked state and not yet expired/cleaned.

So we can always re-use the existing UN-linked resources by simply
removing the UN-link flag.
What is your opinion?

-AVM

On 1/19/2017 12:37 PM, Hoang Vo wrote:

Summary: cpsv: Update ckpt_reploc_tree when unlinking a checkpoint [#1655]
Review request for Trac Ticket(s): 1655
Peer Reviewer(s): mahesh.va...@oracle.com  ;
zoran.milinko...@ericsson.com  
Pull request to: mahesh.va...@oracle.com  
Affected branch(es): default
Development branch: default
 

Impacted area       Impact y/n

 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n
 
 
Comments (indicate scope for each "y" above):
-
Rebase patch to the latest folder structure; no source code inside is changed.
Patched after 1765.
 
changeset 6ffeaa4fbf2e352bd42a4bba160c4c593efcf749
Author:  Hoang Vo  

Date:Thu, 19 Jan 2017 13:59:09 +0700
 
  Problem:
  The replica IMM objects are not created after opening a checkpoint
  in the following scenario:

  1. Open a checkpoint with flag SA_CKPT_CHECKPOINT_CREATE.
  2. Unlink the checkpoint (the checkpoint is still being used).
  3. Open a checkpoint with flag SA_CKPT_CHECKPOINT_CREATE with the same
     name as the one in step 1.

  After step 3, although the checkpoint is opened successfully, the replica
  IMM objects are not created.

  The problem happens because the CPD does not delete the related nodes from
  ckpt_reploc_tree when it unlinks the checkpoint in step 2.

  Solution:
  - The solution is to remove the replica location node of that checkpoint
  from the ckpt_reploc_tree when unlinking the checkpoint.
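
The idea of the solution can be sketched with a small flat table standing in for the replica location tree. This is only an illustration: the table, the demo_* functions and the node names are hypothetical and do not reflect the actual ckpt_reploc_tree data structures in cpd.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define DEMO_MAX_ENTRIES 8
#define DEMO_NAME_LEN 32

/* Hypothetical flat table standing in for a replica-location tree. */
struct demo_reploc {
	bool used;
	char ckpt_name[DEMO_NAME_LEN];
	char node_name[DEMO_NAME_LEN];
};

static struct demo_reploc demo_table[DEMO_MAX_ENTRIES];

/* Record that a replica of ckpt lives on node; false if the table is full. */
static bool demo_reploc_add(const char *ckpt, const char *node)
{
	for (size_t i = 0; i < DEMO_MAX_ENTRIES; i++) {
		if (!demo_table[i].used) {
			demo_table[i].used = true;
			snprintf(demo_table[i].ckpt_name, DEMO_NAME_LEN, "%s", ckpt);
			snprintf(demo_table[i].node_name, DEMO_NAME_LEN, "%s", node);
			return true;
		}
	}
	return false;
}

/* Number of replica-location entries currently recorded for ckpt. */
static size_t demo_reploc_count(const char *ckpt)
{
	size_t n = 0;

	for (size_t i = 0; i < DEMO_MAX_ENTRIES; i++)
		if (demo_table[i].used &&
		    strcmp(demo_table[i].ckpt_name, ckpt) == 0)
			n++;
	return n;
}

/* On unlink, drop every entry of that checkpoint so that a later
 * create with the same name starts from a clean state. */
static size_t demo_unlink(const char *ckpt)
{
	size_t removed = 0;

	for (size_t i = 0; i < DEMO_MAX_ENTRIES; i++) {
		if (demo_table[i].used &&
		    strcmp(demo_table[i].ckpt_name, ckpt) == 0) {
			demo_table[i].used = false;
			removed++;
		}
	}
	return removed;
}
```

If demo_unlink() skipped the removal, a second create of the same name would find stale entries, which mirrors step 3 of the scenario.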
 
 
Complete diffstat:
--
 src/ckpt/ckptd/cpd_db.c   |   4 
 src/ckpt/ckptd/cpd_proc.c |  30 ++
 2 files changed, 34 insertions(+), 0 deletions(-)
 
 
Testing Commands:
-
Follow the testing steps specified in ticket 1655
 
Testing, Expected Results:
--
Refer to the ticket 1655 description for the expected result
 
Conditions of Submission:
-
ACK from maintainer
 
Arch      Built     Started    Linux distro
---
mips        n          n
mips64      n          n
x86         n          n
x86_64      y          y
powerpc     n          n
powerpc64   n          n
 
 
Reviewer Checklist:
---
[Submitters: make sure that your review doesn't trigger any checkmarks!]
 
 
Your checkin has not passed review because (see checked entries):
 
___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.
 
___ You have failed to nominate the proper persons for review and push.
 
___ Your patches do not have proper short+long header
 
___ You have grammar/spelling in your header that is unacceptable.
 
___ You have exceeded a sensible line length in your headers/comments/text.
 
___ You have failed to put in a proper Trac Ticket # into your commits.
 
___ You have incorrectly put/left internal data in your comments/files
(i.e. internal bug tracking tool IDs, product names etc)
 
___ You have not given any evidence of testing 

Re: [devel] [PATCH 0 of 1] Review Request for cpd: to correct failover behavior of cpsv [#1765] V3

2017-01-18 Thread Vo Minh Hoang
Dear Mahesh,

I checked with the newest source code and the problem still occurs,
so it is a different case.

Btw, I found some unexpected characters in the submitted patch,
so I will send an updated file for review.

Thank you and best regards,
Hoang

-Original Message-
From: Vo Minh Hoang [mailto:hoang.m...@dektech.com.au] 
Sent: Thursday, January 19, 2017 2:17 PM
To: 'A V Mahesh' <mahesh.va...@oracle.com>; zoran.milinko...@ericsson.com
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [devel] [PATCH 0 of 1] Review Request for cpd: to correct
failover behavior of cpsv [#1765] V3

Dear Mahesh,

I will check that again.
I only rebased it because this patch had stayed on my local PC for too long.

Thank you and best regards,
Hoang

-Original Message-
From: A V Mahesh [mailto:mahesh.va...@oracle.com]
Sent: Thursday, January 19, 2017 2:11 PM
To: Hoang Vo <hoang.m...@dektech.com.au>; zoran.milinko...@ericsson.com
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [devel] [PATCH 0 of 1] Review Request for cpd: to correct
failover behavior of cpsv [#1765] V3

Hi Hoang,

  >>Testing Commands:
>>-
>>Create checkpoint and set retention to big value Failover by killing 
>>osafamfd multiple times Check checkpoint information

Is killing osafamfd just a way to trigger a failover (to make the standby cpd
become the active cpd)? If so, this has some relation to #2253. Let us retest
the case with #2253 and confirm whether the issue still exists.

-AVM

On 1/19/2017 12:33 PM, A V Mahesh wrote:
> Hi Hoang,
>
> Can you please cross-check whether this issue has any relation to #2253?
>
> -AVM
>
>
> On 1/19/2017 12:28 PM, Hoang Vo wrote:
>> Summary: cpd: to correct failover behavior of cpsv [#1765]
>> Review request for Trac Ticket(s): 1765
>> Peer Reviewer(s): mahesh.va...@oracle.com; zoran.milinko...@ericsson.com
>> Pull request to: mahesh.va...@oracle.com
>> Affected branch(es): default
>> Development branch: default
>>
>> 
>> Impacted area       Impact y/n
>>
>>  Docs                    n
>>  Build system            n
>>  RPM/packaging           n
>>  Configuration files     n
>>  Startup scripts         n
>>  SAF services            y
>>  OpenSAF services        n
>>  Core libraries          n
>>  Samples                 n
>>  Tests                   n
>>  Other                   n
>>
>>
>> Comments (indicate scope for each "y" above):
>> -
>> Rebase source code to the newest folder structure; nothing is changed
>> compared to the previous version
>>
>> changeset 9c34df19e6b98ece2cfcd10be0b748d3b563e029
>> Author:  Hoang Vo <hoang.m...@dektech.com.au>
>> Date:Thu, 19 Jan 2017 13:52:02 +0700
>>
>>  cpd: to correct failover behavior of cpsv [#1765]
>>
>>  Problem: If a failover happens while a checkpoint is being unlinked, it
>>  might cause an unfinished unlink operation (i.e. the checkpoint IMM object
>>  is not deleted). Later on, when the checkpoint is created again, the
>>  creation will not succeed because the CPD detects that the checkpoint IMM
>>  object already exists.
>>
>>  Fix:
>>  - When the error occurs, delete the existing checkpoint IMM object and
>>    re-create a new one.
>>  - Stop the timer of the removed node.
>>  - Update the data in the patricia trees.
>>
>>
>> Complete diffstat:
>> --
>>src/ckpt/ckptd/cpd_db.c |  15 +++
>>src/ckpt/ckptd/cpd_evt.c|  12 
>>src/ckpt/ckptd/cpd_proc.c   |  18 --
>>src/ckpt/ckptnd/cpnd_evt.c  |   3 ++-
>>src/ckpt/ckptnd/cpnd_proc.c |  12 +---
>>5 files changed, 50 insertions(+), 10 deletions(-)
>>
>>
>> Testing Commands:
>> -
>> Create checkpoint and set retention to a big value
>> Failover by killing osafamfd multiple times
>> Check checkpoint information
>>
>> Testing, Expected Results:
>> --
>> Checkpoint information is unchanged
>>
>> Conditions of Submission:
>> -
>> ACK from maintainer
>>
>> Arch      Built     Started    Linux distro
>> ---
>> mips        n          n
>> mips64      n          n
>> x86         n          n
>> x86_64      n          n
>> powerpc     n          n
>> powerpc64   n          n
>>
>>
>> Reviewer Checklist:
>> ---
>> [Submitters: make sure that your

Re: [devel] [PATCH 0 of 1] Review Request for cpd: to correct failover behavior of cpsv [#1765] V3

2017-01-18 Thread Vo Minh Hoang
Dear Mahesh,

I will check that again.
I only rebased it because this patch had stayed on my local PC for too long.

Thank you and best regards,
Hoang

-Original Message-
From: A V Mahesh [mailto:mahesh.va...@oracle.com] 
Sent: Thursday, January 19, 2017 2:11 PM
To: Hoang Vo ; zoran.milinko...@ericsson.com
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [devel] [PATCH 0 of 1] Review Request for cpd: to correct
failover behavior of cpsv [#1765] V3

Hi Hoang,

  >>Testing Commands:
>>-
>>Create checkpoint and set retention to big value Failover by killing 
>>osafamfd multiple times Check checkpoint information

Is killing osafamfd just a way to trigger a failover (to make the standby cpd
become the active cpd)? If so, this has some relation to #2253. Let us retest
the case with #2253 and confirm whether the issue still exists.

-AVM

On 1/19/2017 12:33 PM, A V Mahesh wrote:
> Hi Hoang,
>
> Can you please cross-check whether this issue has any relation to #2253?
>
> -AVM
>
>
> On 1/19/2017 12:28 PM, Hoang Vo wrote:
>> Summary: cpd: to correct failover behavior of cpsv [#1765]
>> Review request for Trac Ticket(s): 1765
>> Peer Reviewer(s): mahesh.va...@oracle.com; zoran.milinko...@ericsson.com
>> Pull request to: mahesh.va...@oracle.com
>> Affected branch(es): default
>> Development branch: default
>>
>> 
>> Impacted area       Impact y/n
>>
>>  Docs                    n
>>  Build system            n
>>  RPM/packaging           n
>>  Configuration files     n
>>  Startup scripts         n
>>  SAF services            y
>>  OpenSAF services        n
>>  Core libraries          n
>>  Samples                 n
>>  Tests                   n
>>  Other                   n
>>
>>
>> Comments (indicate scope for each "y" above):
>> -
>> Rebase source code to the newest folder structure; nothing is changed
>> compared to the previous version
>>
>> changeset 9c34df19e6b98ece2cfcd10be0b748d3b563e029
>> Author:  Hoang Vo 
>> Date:Thu, 19 Jan 2017 13:52:02 +0700
>>
>>  cpd: to correct failover behavior of cpsv [#1765]
>>
>>  Problem: If a failover happens while a checkpoint is being unlinked, it
>>  might cause an unfinished unlink operation (i.e. the checkpoint IMM object
>>  is not deleted). Later on, when the checkpoint is created again, the
>>  creation will not succeed because the CPD detects that the checkpoint IMM
>>  object already exists.
>>
>>  Fix:
>>  - When the error occurs, delete the existing checkpoint IMM object and
>>    re-create a new one.
>>  - Stop the timer of the removed node.
>>  - Update the data in the patricia trees.
>>
>>
>> Complete diffstat:
>> --
>>src/ckpt/ckptd/cpd_db.c |  15 +++
>>src/ckpt/ckptd/cpd_evt.c|  12 
>>src/ckpt/ckptd/cpd_proc.c   |  18 --
>>src/ckpt/ckptnd/cpnd_evt.c  |   3 ++-
>>src/ckpt/ckptnd/cpnd_proc.c |  12 +---
>>5 files changed, 50 insertions(+), 10 deletions(-)
>>
>>
>> Testing Commands:
>> -
>> Create checkpoint and set retention to a big value
>> Failover by killing osafamfd multiple times
>> Check checkpoint information
>>
>> Testing, Expected Results:
>> --
>> Checkpoint information is unchanged
>>
>> Conditions of Submission:
>> -
>> ACK from maintainer
>>
>> Arch      Built     Started    Linux distro
>> ---
>> mips        n          n
>> mips64      n          n
>> x86         n          n
>> x86_64      n          n
>> powerpc     n          n
>> powerpc64   n          n
>>
>>
>> Reviewer Checklist:
>> ---
>> [Submitters: make sure that your review doesn't trigger any 
>> checkmarks!]
>>
>>
>> Your checkin has not passed review because (see checked entries):
>>
>> ___ Your RR template is generally incomplete; it has too many blank entries
>>   that need proper data filled in.
>>
>> ___ You have failed to nominate the proper persons for review and push.
>>
>> ___ Your patches do not have proper short+long header
>>
>> ___ You have grammar/spelling in your header that is unacceptable.
>>
>> ___ You have exceeded a sensible line length in your headers/comments/text.
>>
>> ___ You have failed to put in a proper Trac Ticket # into your commits.
>>
>> ___ You have incorrectly put/left internal data in your comments/files
>>   (i.e. internal bug tracking tool IDs, product names etc)
>>
>> ___ You have not given any evidence of testing beyond basic build tests.
>>   Demonstrate some level of runtime or other sanity testing.
>>
>> ___ You have ^M present in some of your files. These have to be removed.
>>
>> ___ You have needlessly changed whitespace or added whitespace crimes
>>   like trailing spaces, or spaces before tabs.
>>
>> ___ You have mixed 

Re: [devel] [PATCH 1 of 1] cpd: syncup active standby mbcsv dtat for non-colcated ckpt above 3 replicas[#2253]

2017-01-17 Thread Vo Minh Hoang
Dear Mahesh,

I have 2 comments.
Please find them under the [Hoang] tag and consider them.

Sincerely,
Hoang

-Original Message-
From: mahesh.va...@oracle.com [mailto:mahesh.va...@oracle.com] 
Sent: Friday, January 6, 2017 7:17 PM
To: hoang.m...@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net
Subject: [PATCH 1 of 1] cpd: syncup active standby mbcsv dtat for
non-colcated ckpt above 3 replicas[#2253]

 src/ckpt/ckptd/cpd_sbevt.c |  30 +-
 1 files changed, 21 insertions(+), 9 deletions(-)


Issue: According to the non-collocated ckpt implementation, the cluster can
have a maximum of 3 replicas and a minimum of 2 replicas. If the
non-collocated ckpt is first opened on a controller, by default the cpsv
service creates 2 replicas, one on each controller. Otherwise, if the
non-collocated ckpt is first opened on a payload, by default the cpsv service
creates 3 replicas: one on the payload and one on each controller. So any
further opens from any other payload are not required to create replicas
locally. Ckpt applications on all other nodes access the data from the
default-created active replica.

The current code has a bug: the active/standby MBCSV checkpoint of the
CPD_CKPT_REF_INFO data mismatches when creating a replica node for a
non-collocated ckpt on a payload.

Fix: This patch addresses the issue by matching the CPD_CKPT_REF_INFO data,
that is, by not creating cpd_ckpt_reploc_node / cpd_ckpt_ref_info for any
further opens from any other payload that opens the ckpt above the maximum of
3 replicas.
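
The replica-count rule described above can be written down as a small sketch. It is not the cpd code: the enum, the constant and both functions are hypothetical. The text does not say whether a payload gets the third replica after a controller-first open, so the sketch only encodes the initial counts and the cap of 3.

```c
#include <stdbool.h>

/* Hypothetical names; they model only the rule stated in the text. */
enum demo_node_type { DEMO_CONTROLLER, DEMO_PAYLOAD };

#define DEMO_MAX_NONCOLL_REPLICAS 3u

/* Replicas created by the first open of a non-collocated checkpoint:
 * one on each of the two controllers, plus one on the opening payload
 * when the first opener is a payload. */
static unsigned demo_initial_replicas(enum demo_node_type first_opener)
{
	return first_opener == DEMO_PAYLOAD ? 3u : 2u;
}

/* A further open may add replica bookkeeping only while the checkpoint
 * is below the cap; at the cap the existing replicas are reused. */
static bool demo_may_add_replica(unsigned current_replicas)
{
	return current_replicas < DEMO_MAX_NONCOLL_REPLICAS;
}
```

Under these assumptions, active and standby stay in sync as long as both apply the same predicate before adding a replica location node.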

diff --git a/src/ckpt/ckptd/cpd_sbevt.c b/src/ckpt/ckptd/cpd_sbevt.c
--- a/src/ckpt/ckptd/cpd_sbevt.c
+++ b/src/ckpt/ckptd/cpd_sbevt.c
@@ -456,6 +456,7 @@ uint32_t cpd_sb_proc_ckpt_dest_add(CPD_C
SaClmClusterNodeT cluster_node;
CPD_REP_KEY_INFO key_info;
CPD_NODE_REF_INFO *nref_info;
+   bool noncoll_rep_on_payload = false;
 
TRACE_ENTER();
 
@@ -497,9 +498,20 @@ uint32_t cpd_sb_proc_ckpt_dest_add(CPD_C
 
reploc_info->rep_key.node_name =
strdup(osaf_extended_name_borrow(_node.nodeName));
reploc_info->rep_key.ckpt_name =
strdup(ckpt_node->ckpt_name);
-   if
(!m_IS_SA_CKPT_CHECKPOINT_COLLOCATED(_node->attributes))
+   if
(!m_IS_SA_CKPT_CHECKPOINT_COLLOCATED(_node->attributes)) {
reploc_info->rep_type = REP_NONCOLL;
-   else {
+   if
((cpd_get_slot_sub_id_from_mds_dest(msg->info.dest_add.mds_dest) ==
cb->cpd_remote_id) ||
+
(cpd_get_slot_sub_id_from_mds_dest(msg->info.dest_add.mds_dest) ==
cb->cpd_self_id) ) {
+   TRACE_4(" reploc node add for
non-collocated on controller ckpt_id:%llx", msg->info.dest_add.ckpt_id);
+   proc_rc =
cpd_ckpt_reploc_node_add(>ckpt_reploc_tree, reploc_info, cb->ha_state,
cb->immOiHandle);
+   if (proc_rc != NCSCC_RC_SUCCESS) {
+   TRACE_4("cpd standby dest add evt
failed ");
[Hoang] reploc_info is allocated in this scope, so it should be freed here
to avoid a memory leak. The same applies at LOC 511 and 529.
+   }
+   } else {
+   TRACE_4(" reploc node add for
non-collocated on paylaod ckpt_id:%llx",msg->info.dest_add.ckpt_id);
[Hoang] Consider changing the trace message, since nothing is added here.
There is also a small typo: "paylaod" should be "payload".
+   noncoll_rep_on_payload = true;
+   }
+   } else {
if ((ckpt_node->attributes.creationFlags &
SA_CKPT_WR_ALL_REPLICAS) &&
 
(m_IS_SA_CKPT_CHECKPOINT_COLLOCATED(_node->attributes)))
reploc_info->rep_type = REP_SYNCUPD;
@@ -511,17 +523,17 @@ uint32_t cpd_sb_proc_ckpt_dest_add(CPD_C
if ((ckpt_node->attributes.creationFlags &
SA_CKPT_WR_ACTIVE_REPLICA_WEAK) &&
 
(m_IS_SA_CKPT_CHECKPOINT_COLLOCATED(_node->attributes)))
reploc_info->rep_type = REP_NOTACTIVE;
-   }
 
-   proc_rc = cpd_ckpt_reploc_node_add(>ckpt_reploc_tree,
reploc_info, cb->ha_state, cb->immOiHandle);
-   if (proc_rc != NCSCC_RC_SUCCESS) {
-   TRACE_4("cpd standby dest add evt failed ");
-   /*  goto free_mem; */
+   proc_rc =
cpd_ckpt_reploc_node_add(>ckpt_reploc_tree, reploc_info, cb->ha_state,
cb->immOiHandle);
+   if (proc_rc != NCSCC_RC_SUCCESS) {
+   TRACE_4("cpd standby dest add evt failed
");
+   }
}
}
 
-   cpd_ckpt_ref_info_add(node_info, ckpt_node);
-   
+   if 
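
The first [Hoang] comment (free reploc_info when the add fails) could look roughly like the sketch below. It is a stand-alone illustration under assumptions: struct reploc_info, dup_string(), tree_add() and reploc_add() are hypothetical stand-ins, not the cpd functions, and the "tree" is reduced to a single slot so the failure path can be exercised.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the replica-location record built in the patch. */
struct reploc_info {
	char *node_name;
	char *ckpt_name;
};

/* Portable string duplicate; returns NULL on allocation failure. */
char *dup_string(const char *s)
{
	size_t n = strlen(s) + 1;
	char *copy = malloc(n);
	if (copy != NULL)
		memcpy(copy, s, n);
	return copy;
}

/* Release the record and the strings it owns; safe to call with NULL. */
void reploc_info_free(struct reploc_info *info)
{
	if (info == NULL)
		return;
	free(info->node_name);
	free(info->ckpt_name);
	free(info);
}

/* Hypothetical tree insert: tree_accepts simulates success or failure.
 * On success the "tree" (here just *slot) takes ownership of info. */
bool tree_add(struct reploc_info **slot, struct reploc_info *info, bool tree_accepts)
{
	if (!tree_accepts)
		return false;
	*slot = info;
	return true;
}

/* Build a record and add it; on any failure free what was allocated here. */
bool reploc_add(struct reploc_info **slot, const char *node, const char *ckpt,
		bool tree_accepts)
{
	assert(slot != NULL);
	struct reploc_info *info = calloc(1, sizeof(*info));
	if (info == NULL)
		return false;
	info->node_name = dup_string(node);
	info->ckpt_name = dup_string(ckpt);
	if (info->node_name == NULL || info->ckpt_name == NULL ||
	    !tree_add(slot, info, tree_accepts)) {
		reploc_info_free(info); /* the leak the comment points at */
		return false;
	}
	return true;
}
```

Keeping the free in one helper means each failure site needs only a single call, which is convenient when the same cleanup is wanted at several places in the function.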

Re: [devel] [PATCH 1 of 1] cpd: syncup active standby mbcsv dtat for non-colcated ckpt above 3 replicas[#2253]

2017-01-17 Thread Vo Minh Hoang
Dear Mahesh,

Would you please rebase this patch? It seems to be a little out of date.
---
patching file src/ckpt/ckptd/cpd_sbevt.c
Hunk #2 FAILED at 497
Hunk #3 FAILED at 511
2 out of 3 hunks FAILED
---

Sincerely,
Hoang

-Original Message-
From: A V Mahesh [mailto:mahesh.va...@oracle.com] 
Sent: Tuesday, January 17, 2017 10:45 AM
To: hoang.m...@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [devel] [PATCH 1 of 1] cpd: syncup active standby mbcsv dtat
for non-colcated ckpt above 3 replicas[#2253]

Hi Hoang,

It seems you missed this one; can you please review it?

-AVM


On 1/6/2017 5:47 PM, mahesh.va...@oracle.com wrote:
>   src/ckpt/ckptd/cpd_sbevt.c |  30 +-
>   1 files changed, 21 insertions(+), 9 deletions(-)
>
>
> Issue: According to the non-collocated ckpt implementation, the cluster
> can have a maximum of 3 replicas and a minimum of 2 replicas. If the
> non-collocated ckpt is initially opened on a controller, the cpsv
> service by default creates 2 replicas, one on each controller. If the
> non-collocated ckpt is initially opened on a payload, the cpsv service
> by default creates 3 replicas, one on the payload and one on each
> controller. Any further open from any other payload is therefore not
> required to create a replica locally. The ckpt applications on all other
> nodes access the data from the default-created active replica.
>
> The current code has a bug: the active/standby MBCSV checkpoint of the
> CPD_CKPT_REF_INFO data mismatches when a replica node is created for a
> non-collocated ckpt on a payload.
>
> Fix: This patch addresses the issue by keeping the CPD_CKPT_REF_INFO
> data matched. It does not create cpd_ckpt_reploc_node and
> cpd_ckpt_ref_info for any further open from any other payload that opens
> the ckpt above the maximum of 3 replicas.
>
> diff --git a/src/ckpt/ckptd/cpd_sbevt.c b/src/ckpt/ckptd/cpd_sbevt.c
> --- a/src/ckpt/ckptd/cpd_sbevt.c
> +++ b/src/ckpt/ckptd/cpd_sbevt.c
> @@ -456,6 +456,7 @@ uint32_t cpd_sb_proc_ckpt_dest_add(CPD_C
>   SaClmClusterNodeT cluster_node;
>   CPD_REP_KEY_INFO key_info;
>   CPD_NODE_REF_INFO *nref_info;
> + bool noncoll_rep_on_payload = false;
>   
>   TRACE_ENTER();
>   
> @@ -497,9 +498,20 @@ uint32_t cpd_sb_proc_ckpt_dest_add(CPD_C
>   
>   reploc_info->rep_key.node_name =
strdup(osaf_extended_name_borrow(_node.nodeName));
>   reploc_info->rep_key.ckpt_name =
strdup(ckpt_node->ckpt_name);
> - if
(!m_IS_SA_CKPT_CHECKPOINT_COLLOCATED(_node->attributes))
> + if
(!m_IS_SA_CKPT_CHECKPOINT_COLLOCATED(_node->attributes)) {
>   reploc_info->rep_type = REP_NONCOLL;
> - else {
> + if
((cpd_get_slot_sub_id_from_mds_dest(msg->info.dest_add.mds_dest) ==
cb->cpd_remote_id) ||
> +
(cpd_get_slot_sub_id_from_mds_dest(msg->info.dest_add.mds_dest) ==
cb->cpd_self_id) ) {
> + TRACE_4(" reploc node add for non-collocated
on controller ckpt_id:%llx", msg->info.dest_add.ckpt_id);
> + proc_rc =
cpd_ckpt_reploc_node_add(>ckpt_reploc_tree, reploc_info, cb->ha_state,
cb->immOiHandle);
> + if (proc_rc != NCSCC_RC_SUCCESS) {
> + TRACE_4("cpd standby dest add evt
failed ");
> + }
> + } else {
> + TRACE_4(" reploc node add for non-collocated
on paylaod ckpt_id:%llx",msg->info.dest_add.ckpt_id);
> + noncoll_rep_on_payload = true;
> + }
> + } else {
>   if ((ckpt_node->attributes.creationFlags &
SA_CKPT_WR_ALL_REPLICAS) &&
>
(m_IS_SA_CKPT_CHECKPOINT_COLLOCATED(_node->attributes)))
>   reploc_info->rep_type = REP_SYNCUPD;
> @@ -511,17 +523,17 @@ uint32_t cpd_sb_proc_ckpt_dest_add(CPD_C
>   if ((ckpt_node->attributes.creationFlags &
SA_CKPT_WR_ACTIVE_REPLICA_WEAK) &&
>
(m_IS_SA_CKPT_CHECKPOINT_COLLOCATED(_node->attributes)))
>   reploc_info->rep_type = REP_NOTACTIVE;
> - }
>   
> - proc_rc = cpd_ckpt_reploc_node_add(>ckpt_reploc_tree,
reploc_info, cb->ha_state, cb->immOiHandle);
> - if (proc_rc != NCSCC_RC_SUCCESS) {
> - TRACE_4("cpd standby dest add evt failed ");
> - /*  goto free_mem; */
> + proc_rc =
cpd_ckpt_reploc_node_add(>ckpt_reploc_tree, reploc_info, cb->ha_state,
cb->immOiHandle);
> +

Re: [devel] [PATCH 1 of 1] ckpt: fix memory leak in cpd_a2s_ckpt_usr_info [#2257]

2017-01-15 Thread Vo Minh Hoang
Hi Zoran,

ACK.

Sincerely,
Hoang

-Original Message-
From: A V Mahesh [mailto:mahesh.va...@oracle.com] 
Sent: Wednesday, January 11, 2017 10:59 AM
To: Zoran Milinkovic 
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [devel] [PATCH 1 of 1] ckpt: fix memory leak in
cpd_a2s_ckpt_usr_info [#2257]

Hi Zoran,

ACK Not tested.

-AVM


On 1/10/2017 9:18 PM, Zoran Milinkovic wrote:
>   src/ckpt/ckptd/cpd_red.c |  3 +++
>   1 files changed, 3 insertions(+), 0 deletions(-)
>
>
> Fix memory leak in cpd_a2s_ckpt_usr_info() in cpd_red.c
>
> diff --git a/src/ckpt/ckptd/cpd_red.c b/src/ckpt/ckptd/cpd_red.c
> --- a/src/ckpt/ckptd/cpd_red.c
> +++ b/src/ckpt/ckptd/cpd_red.c
> @@ -339,5 +339,8 @@ void cpd_a2s_ckpt_usr_info(CPD_CB *cb, C
>   TRACE_4("cpd A2S ckpt user info async update failed");
>   else
>   TRACE_1("cpd A2S ckpt user info async update success");
> +
> + free(cpd_msg.info.usr_info_2.node_list);
> +
>   TRACE_LEAVE();
>   }



--
Developer Access Program for Intel Xeon Phi Processors Access to Intel Xeon
Phi processor-based developer platforms.
With one year of Intel Parallel Studio XE.
Training and support from Colfax.
Order your platform today. http://sdm.link/xeonphi
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel




Re: [devel] [PATCH 1 of 1] ckpt: fix extended name issues in the library [#2128]

2017-01-15 Thread Vo Minh Hoang
Hi Zoran,

ACK from me.

Sincerely,
Hoang

-Original Message-
From: A V Mahesh [mailto:mahesh.va...@oracle.com] 
Sent: Friday, January 13, 2017 10:15 AM
To: Zoran Milinkovic 
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [devel] [PATCH 1 of 1] ckpt: fix extended name issues in the
library [#2128]

Hi Zoran,

ACK.

Long DN not tested.

-AVM


On 1/10/2017 8:57 PM, Zoran Milinkovic wrote:
>   src/ckpt/agent/cpa_api.c   |  26 +++---
>   src/ckpt/ckptnd/cpnd_evt.c |  12 
>   2 files changed, 19 insertions(+), 19 deletions(-)
>
>
> Fix string termination issues when an SaNameT value is provided to
> saCkptCheckpointOpen(), saCkptCheckpointOpenAsync() and
> saCkptCheckpointUnlink().
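
To illustrate why a length-bounded copy matters here: a length-prefixed name does not have to be NUL-terminated, so an unbounded copy can read past the end of the value. The sketch below is hypothetical (struct short_name and bounded_dup() are illustration names, not OpenSAF types); bounded_dup() behaves like strndup(), which is what the patch switches to.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical length-prefixed name; value is NOT guaranteed to be
 * NUL-terminated. */
struct short_name {
	unsigned short length;
	char value[8];
};

/*
 * Copy at most len bytes of src (stopping early at a NUL) and always
 * terminate the result, as strndup() does. The caller frees the result.
 */
char *bounded_dup(const char *src, size_t len)
{
	assert(src != NULL);
	size_t n = 0;
	while (n < len && src[n] != '\0')
		n++;
	char *out = malloc(n + 1);
	if (out == NULL)
		return NULL;
	memcpy(out, src, n);
	out[n] = '\0';
	return out;
}
```

Passing the stored length as the bound keeps the copy inside the value array even when all of its bytes are used by the name.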
>
> diff --git a/src/ckpt/agent/cpa_api.c b/src/ckpt/agent/cpa_api.c
> --- a/src/ckpt/agent/cpa_api.c
> +++ b/src/ckpt/agent/cpa_api.c
> @@ -871,13 +871,17 @@ SaAisErrorT saCkptCheckpointOpen(SaCkptH
>   SaTimeT time_out=0;
>   CPA_GLOBAL_CKPT_NODE *gc_node = NULL;
>   SaConstStringT ckpt_name = NULL;
> + size_t ckpt_name_len;
>   
>   TRACE_ENTER2("SaCkptCheckpointHandleT passed is %llx",ckptHandle);
>   if ((checkpointName == NULL) || (checkpointHandle == NULL) ||
(osaf_extended_name_length(checkpointName) == 0)) {
>   TRACE_4("Cpa CkptOpen Api failed with return
value:%d,ckptHandle:%llx", SA_AIS_ERR_INVALID_PARAM, ckptHandle);
>   TRACE_LEAVE2("API return code = %u",
SA_AIS_ERR_INVALID_PARAM);
>   return SA_AIS_ERR_INVALID_PARAM;
> - } else if (osaf_extended_name_length(checkpointName) >
kOsafMaxDnLength) {
> + }
> +
> + ckpt_name_len = osaf_extended_name_length(checkpointName);
> + if (ckpt_name_len > kOsafMaxDnLength) {
>   TRACE_4("Cpa CkptOpen Api failed with return
value:%d,ckptHandle:%llx", SA_AIS_ERR_TOO_BIG, ckptHandle);
>   TRACE_LEAVE2("API return code = %u", SA_AIS_ERR_TOO_BIG);
>   return SA_AIS_ERR_TOO_BIG;
> @@ -962,7 +966,7 @@ SaAisErrorT saCkptCheckpointOpen(SaCkptH
>   lc_node->cl_hdl = ckptHandle;
>   lc_node->open_flags = checkpointOpenFlags;
>   
> - lc_node->ckpt_name = strdup(ckpt_name);
> + lc_node->ckpt_name = strndup(ckpt_name, ckpt_name_len);
>   
>   /* Add CPA_LOCAL_CKPT_NODE to lcl_ckpt_hdl_tree */
>   proc_rc = cpa_lcl_ckpt_node_add(>lcl_ckpt_tree, lc_node); @@ 
> -981,7 +985,7 @@ SaAisErrorT saCkptCheckpointOpen(SaCkptH
>   evt.info.cpnd.info.openReq.client_hdl = ckptHandle;
>   evt.info.cpnd.info.openReq.lcl_ckpt_hdl = lc_node->lcl_ckpt_hdl;
>   
> - osaf_extended_name_lend(ckpt_name,
_name);
> + osaf_extended_name_lend(lc_node->ckpt_name, 
> +_name);
>   
>   if (checkpointCreationAttributes) {
>   evt.info.cpnd.info.openReq.ckpt_attrib = 
> *checkpointCreationAttributes; @@ -1178,6 +1182,7 @@ SaAisErrorT
saCkptCheckpointOpenAsync(Sa
>   CPA_CLIENT_NODE *cl_node = NULL;
>   uint32_t proc_rc = NCSCC_RC_SUCCESS;
>   SaConstStringT ckpt_name = NULL;
> + size_t ckpt_name_len;
>   
>   TRACE_ENTER2("SaCkptCheckpointHandleT passed is %llx",ckptHandle);
>   
> @@ -1185,7 +1190,10 @@ SaAisErrorT saCkptCheckpointOpenAsync(Sa
>   TRACE_4("cpa CkptOpenAsync Api failed with return
value:%d,ckptHandle:%llx", SA_AIS_ERR_INVALID_PARAM, ckptHandle);
>   TRACE_LEAVE2("API return code = %u",
SA_AIS_ERR_INVALID_PARAM);
>   return SA_AIS_ERR_INVALID_PARAM;
> - } else if (osaf_extended_name_length(checkpointName) >
kOsafMaxDnLength) {
> + }
> +
> + ckpt_name_len = osaf_extended_name_length(checkpointName);
> + if (ckpt_name_len > kOsafMaxDnLength) {
>   TRACE_4("Cpa CkptOpenAsync Api failed with return
value:%d,ckptHandle:%llx", SA_AIS_ERR_TOO_BIG, ckptHandle);
>   TRACE_LEAVE2("API return code = %u", SA_AIS_ERR_TOO_BIG);
>   return SA_AIS_ERR_TOO_BIG;
> @@ -1262,7 +1270,7 @@ SaAisErrorT saCkptCheckpointOpenAsync(Sa
>   lc_node->lcl_ckpt_hdl = NCS_PTR_TO_UNS64_CAST(lc_node);
>   lc_node->cl_hdl = ckptHandle;
>   lc_node->open_flags = checkpointOpenFlags;
> - lc_node->ckpt_name = strdup(ckpt_name);
> + lc_node->ckpt_name = strndup(ckpt_name, ckpt_name_len);
>   
>   /* Add CPA_LOCAL_CKPT_NODE to lcl_ckpt_hdl_tree */
>   proc_rc = cpa_lcl_ckpt_node_add(>lcl_ckpt_tree, lc_node); @@ 
> -1281,7 +1289,7 @@ SaAisErrorT saCkptCheckpointOpenAsync(Sa
>   evt.info.cpnd.info.openReq.client_hdl = ckptHandle;
>   evt.info.cpnd.info.openReq.lcl_ckpt_hdl = lc_node->lcl_ckpt_hdl;
>   
> - osaf_extended_name_lend(ckpt_name,
_name);
> + osaf_extended_name_lend(lc_node->ckpt_name, 
> +_name);
>   
>   if (checkpointCreationAttributes) {
>   evt.info.cpnd.info.openReq.ckpt_attrib = 
> *checkpointCreationAttributes; @@ -1585,7 +1593,6 @@ SaAisErrorT
saCkptCheckpointUnlink(SaCkp
>   uint32_t proc_rc = NCSCC_RC_SUCCESS;
>   

Re: [devel] [PATCH 1 of 1] cpd: to correct failover behavior of cpsv [#1765]

2016-12-14 Thread Vo Minh Hoang
Dear Mahesh,

That information is from Nhat, in the old version of this patch.
I kept it because I think this error handling is needed.

My updated part is:
> - Stop timer of removed node.
> - Update data in patricia trees.

In some cases (e.g. a delay in getting the node HA state), the checkpoint
service calls process_cpnd_down and triggers the checkpoint's retention
timer; after a while the replica is removed but the IMM object still exists.

Thank you and best regards,
Hoang

-Original Message-
From: A V Mahesh [mailto:mahesh.va...@oracle.com] 
Sent: Wednesday, December 14, 2016 2:07 PM
To: Hoang Vo ; anders.wid...@ericsson.com
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1 of 1] cpd: to correct failover behavior of cpsv
[#1765]

Hi Hoang Vo,

I am not sure why there would be an unfinished unlink operation. The unlink
will land either on the old active CPD or, after failover, on the new
active CPD, and during the transit time CPND will get try-again. Can you
please elaborate on the case?

-AVM


On 10/13/2016 2:02 PM, Hoang Vo wrote:
>   osaf/services/saf/cpsv/cpd/cpd_db.c   |  15 +++
>   osaf/services/saf/cpsv/cpd/cpd_proc.c |  18 --
>   2 files changed, 31 insertions(+), 2 deletions(-)
>
>
> Problem:
> In case a failover happens while a checkpoint is being unlinked, it
> might cause an unfinished unlink operation (i.e. the checkpoint IMM
> object is not deleted). Later on, when the checkpoint is created again,
> the creation will not succeed because the CPD detects that the checkpoint
> IMM object already exists.
>
> Fix:
> - When this error occurs, delete the existing checkpoint IMM object and
>   re-create a new one.
> - Stop timer of removed node.
> - Update data in patricia trees.
>
> diff --git a/osaf/services/saf/cpsv/cpd/cpd_db.c 
> b/osaf/services/saf/cpsv/cpd/cpd_db.c
> --- a/osaf/services/saf/cpsv/cpd/cpd_db.c
> +++ b/osaf/services/saf/cpsv/cpd/cpd_db.c
> @@ -104,6 +104,21 @@ uint32_t cpd_ckpt_node_add(NCS_PATRICIA_
>   /*create the imm runtime object */
>   if (ha_state == SA_AMF_HA_ACTIVE) {
>   err = create_runtime_ckpt_object(ckpt_node, immOiHandle);
> +
> + /* The Checkpoint IMM object exist due to unfinished
previous opernation (e.g unlink)
> +  * The action is to delete the old object and create a new
one */
> +
> + if (err == SA_AIS_ERR_EXIST) {
> + LOG_WA("cpd ckpt node add - the IMM object exits
%s", 
> +ckpt_node->ckpt_name);
> +
> + if (delete_runtime_ckpt_object(ckpt_node,
immOiHandle) != SA_AIS_OK) {
> + LOG_ER("Deleting run time object %s FAILED",
ckpt_node->ckpt_name);
> + return NCSCC_RC_FAILURE;
> + }
> +
> + err = create_runtime_ckpt_object(ckpt_node,
immOiHandle);
> + }
> +
>   if (err != SA_AIS_OK) {
>   LOG_ER("create runtime ckpt object failed with
error: %u",err);
>   if (err == SA_AIS_ERR_INVALID_PARAM) {
> diff --git a/osaf/services/saf/cpsv/cpd/cpd_proc.c b/osaf/services/saf/cpsv/cpd/cpd_proc.c
> --- a/osaf/services/saf/cpsv/cpd/cpd_proc.c
> +++ b/osaf/services/saf/cpsv/cpd/cpd_proc.c
> @@ -348,7 +348,8 @@ uint32_t cpd_ckpt_db_entry_update(CPD_CB
>   proc_rc =
cpd_ckpt_reploc_node_add(>ckpt_reploc_tree, reploc_info, cb->ha_state,
cb->immOiHandle);
>   if (proc_rc != NCSCC_RC_SUCCESS) {
>   /* goto reploc_node_add_fail; */
> - TRACE_4("cpd db add failed ");
> + LOG_ER("cpd db replica add failed ");
> + goto replica_node_add_fail;
>   }
>   }
>   
> @@ -367,6 +368,10 @@ uint32_t cpd_ckpt_db_entry_update(CPD_CB
>   TRACE_LEAVE();
>   return NCSCC_RC_SUCCESS;
>   
> + replica_node_add_fail:
> + cpd_ckpt_node_delete(cb, ckpt_node);
> + ckpt_node = NULL;
> +
>ckpt_node_add_fail:
>   cpd_ckpt_map_node_delete(cb, map_info);
>   map_info = NULL;
> @@ -679,7 +684,8 @@ uint32_t cpd_process_cpnd_down(CPD_CB *c
>   cpd_cpnd_info_node_find_add(>cpnd_tree, cpnd_dest, _info,
_flag);
>   if (!cpnd_info)
>   return NCSCC_RC_SUCCESS;
> -
> + /* Stop timer before processing down */
> + cpd_tmr_stop(_info->cpnd_ret_timer);
>   cref_info = cpnd_info->ckpt_ref_list;
>   
>   while (cref_info) {
> @@ -984,6 +990,14 @@ uint32_t cpd_proc_retention_set(CPD_CB *
>   
>   /* Update the retention Time */
>   (*ckpt_node)->ret_time = reten_time;
> + (*ckpt_node)->attributes.retentionDuration = reten_time;
> +
> + /* Update the related patricia tree */
> + CPD_CKPT_MAP_INFO *map_info = NULL;
> + cpd_ckpt_map_node_get(>ckpt_map_tree, (*ckpt_node)->ckpt_name,
_info);
> + if (map_info) {
> + map_info->attributes.retentionDuration = reten_time;
> + }
>   return rc;
>   }
>   




Re: [devel] [PATCH 0 of 3] Review Request for leap : now leap library ensure shm availability before writing [#2202]

2016-11-30 Thread Vo Minh Hoang
Dear Mahesh,

ACK all three patches, tested, found no problem.

Sincerely,
Hoang

-Original Message-
From: mahesh.va...@oracle.com [mailto:mahesh.va...@oracle.com] 
Sent: Tuesday, November 29, 2016 5:37 PM
To: hoang.m...@dektech.com.au; ramesh.bet...@oracle.com
Cc: opensaf-devel@lists.sourceforge.net
Subject: [PATCH 0 of 3] Review Request for leap : now leap library ensure
shm availability before writing [#2202]

Summary: leap : now leap library ensure shm availability before writing [#2202]
Review request for Trac Ticket(s): #2202
Peer Reviewer(s): Hoang / Ramesh
Pull request to: <>
Affected branch(es): <>
Development branch: <>


Impacted area           Impact y/n

 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            n
 OpenSAF services        y
 Core libraries          y
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
-

changeset 7b53e1b3754622fe90c22c801adeb7df6d808c30
Author: A V Mahesh 
Date:   Tue, 29 Nov 2016 15:59:21 +0530

leap : now leap library ensure shm availability before writing [#2202]

Issue:

If OSAF_CKPT_SHM_ALLOC_GUARANTEE is NOT set and SHM is 100% used in the
system, pnd gets a segmentation fault (core dumped) at the LEAP memcpy().

Fix:

The LEAP library now ensures that shm has free space before writing. This
may degrade cpsv performance somewhat; if OSAF_CKPT_SHM_ALLOC_GUARANTEE is
set, cpsv gives its natural performance.
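
One possible way to "ensure free space before writing" is to reserve the backing blocks first, so that a full filesystem shows up as an error code instead of a fault inside memcpy(). The sketch below uses posix_fallocate() and mmap() on a plain file descriptor; it is an assumption made for illustration and not necessarily what the LEAP change does, and checked_shm_write() is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write len bytes at offset into the file behind fd through a shared
 * mapping. The blocks are reserved first, so running out of space is
 * reported as an error number rather than as a signal during memcpy().
 * Returns 0 on success or an errno-style value on failure.
 */
int checked_shm_write(int fd, off_t offset, const void *data, size_t len)
{
	assert(fd >= 0 && offset >= 0);
	if (len == 0)
		return 0;

	/* posix_fallocate() returns the error number directly. */
	int rc = posix_fallocate(fd, offset, (off_t)len);
	if (rc != 0)
		return rc;

	size_t map_len = (size_t)offset + len;
	void *map = mmap(NULL, map_len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED)
		return errno;

	memcpy((char *)map + offset, data, len);
	munmap(map, map_len);
	return 0;
}
```

The extra reservation call on every write is the kind of cost the changeset description refers to as a possible performance degradation.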

changeset 083114e13c00c9c4267ffe65a86c1a97a951b876
Author: A V Mahesh 
Date:   Tue, 29 Nov 2016 16:02:06 +0530

cpsv : update cpsv error handing based on leap changes [#2202]

changeset fb509abb1d1583315f585663fd75bf73e35211a6
Author: A V Mahesh 
Date:   Tue, 29 Nov 2016 16:02:58 +0530

mqsv : update mqsv error handing based on leap changes [#2202]


Complete diffstat:
--
 osaf/libs/common/cpsv/include/cpnd_cb.h   |   4 ++--
 osaf/libs/common/cpsv/include/cpnd_init.h |   8 
 osaf/libs/common/cpsv/include/cpnd_sec.h  |   2 +-
 osaf/libs/core/include/ncs_osprm.h        |   2 +-
 osaf/libs/core/leap/os_defs.c             |  20 ++--
 osaf/services/saf/cpsv/cpnd/cpnd_db.c     |  12 ++--
 osaf/services/saf/cpsv/cpnd/cpnd_evt.c    |  82 +-----
 osaf/services/saf/cpsv/cpnd/cpnd_proc.c   |  31 ++-
 osaf/services/saf/cpsv/cpnd/cpnd_res.c    |  24 
 osaf/services/saf/cpsv/cpnd/cpnd_sec.cc   |  12 ++--
 osaf/services/saf/glsv/glnd/glnd_shm.c    |   2 +-
 osaf/services/saf/mqsv/mqnd/mqnd_shm.c    |   2 +-
 12 files changed, 123 insertions(+), 78 deletions(-)


Testing Commands:
-
Create a situation where the node's SHM reaches 100% usage, and then perform
any CPSV operation that writes to SHM.

Testing, Expected Results:
--
 <>


Conditions of Submission:
-
 <>


Arch        Built     Started    Linux distro
---
mips          n          n
mips64        n          n
x86           n          n
x86_64        y          y
powerpc       n          n
powerpc64     n          n


Reviewer Checklist:
---
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
(i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
too much content into a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which 

Re: [devel] [PATCH 2 of 3] cpsv : update cpsv error handing based on leap changes [#2202]

2016-11-30 Thread Vo Minh Hoang
Dear Mahesh,

I have one small concern about the new CPND_CB *cb parameter:
Please consider getting the CB data with ncshm_take_hdl(), or passing
ensures_space itself as the parameter.
Since the CB is globally accessible, adding it to the parameters of many
APIs is a little inconvenient.
And passing the whole CB for just the one ensures_space field might make
the usage of these functions confusing, especially in future use.

This is just my opinion; please consider it.

Thank you and best regards,
Hoang
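
The second alternative mentioned above (pass only ensures_space instead of the whole CB) might look like this. The struct and function names are hypothetical and exist only for this sketch; the real CPND_CB has many more fields and the real helpers do the actual shm write.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical control block; only ensures_space matters to the helper. */
struct control_block {
	int node_id;
	bool ensures_space;
};

/*
 * Helper that takes just the setting it needs. Returns true when the
 * write may proceed. With ensures_space off, the caller has chosen to
 * skip the check (for example because memory was reserved up front).
 */
bool may_write(size_t free_bytes, size_t len, bool ensures_space)
{
	if (!ensures_space)
		return true;
	return len <= free_bytes;
}

/* The one caller that owns the control block unpacks the single field. */
bool cb_may_write(const struct control_block *cb, size_t free_bytes, size_t len)
{
	assert(cb != NULL);
	return may_write(free_bytes, len, cb->ensures_space);
}
```

The helper's signature then states exactly what it depends on, and it can be tested without building a control block.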


-Original Message-
From: mahesh.va...@oracle.com [mailto:mahesh.va...@oracle.com] 
Sent: Tuesday, November 29, 2016 5:37 PM
To: hoang.m...@dektech.com.au; ramesh.bet...@oracle.com
Cc: opensaf-devel@lists.sourceforge.net
Subject: [PATCH 2 of 3] cpsv : update cpsv error handing based on leap
changes [#2202]

 osaf/libs/common/cpsv/include/cpnd_cb.h   |   4 +-
 osaf/libs/common/cpsv/include/cpnd_init.h |   8 +-
 osaf/libs/common/cpsv/include/cpnd_sec.h  |   2 +-
 osaf/libs/core/include/ncs_osprm.h        |   2 +-
 osaf/services/saf/cpsv/cpnd/cpnd_db.c     |  12 ++--
 osaf/services/saf/cpsv/cpnd/cpnd_evt.c    |  82 +-
 osaf/services/saf/cpsv/cpnd/cpnd_proc.c   |  31 ++
 osaf/services/saf/cpsv/cpnd/cpnd_res.c    |  24 +++--
 osaf/services/saf/cpsv/cpnd/cpnd_sec.cc   |  12 ++--
 9 files changed, 103 insertions(+), 74 deletions(-)


diff --git a/osaf/libs/common/cpsv/include/cpnd_cb.h b/osaf/libs/common/cpsv/include/cpnd_cb.h
--- a/osaf/libs/common/cpsv/include/cpnd_cb.h
+++ b/osaf/libs/common/cpsv/include/cpnd_cb.h
@@ -341,8 +341,8 @@ uint32_t cpnd_amf_register(CPND_CB *cpnd
 uint32_t cpnd_amf_deregister(CPND_CB *cpnd_cb);
 uint32_t cpnd_client_extract_bits(uint32_t bitmap_value, uint32_t *bit_position);
 uint32_t cpnd_res_ckpt_sec_del(CPND_CKPT_NODE *cp_node);
-uint32_t cpnd_ckpt_replica_create_res(NCS_OS_POSIX_SHM_REQ_INFO *open_req, char *buf, CPND_CKPT_NODE **cp_node,
-   uint32_t ref_cnt, CKPT_INFO *cp_info, bool shm_alloc_guaranteed);
+uint32_t cpnd_ckpt_replica_create_res(CPND_CB *cb, NCS_OS_POSIX_SHM_REQ_INFO *open_req, char *buf, CPND_CKPT_NODE **cp_node,
+   uint32_t ref_cnt, CKPT_INFO *cp_info);
 int32_t cpnd_find_free_loc(CPND_CB *cb, CPND_TYPE_INFO type);
 uint32_t cpnd_ckpt_write_header(CPND_CB *cb, uint32_t nckpts);
 uint32_t cpnd_cli_info_write_header(CPND_CB *cb, int32_t n_clients);
diff --git a/osaf/libs/common/cpsv/include/cpnd_init.h b/osaf/libs/common/cpsv/include/cpnd_init.h
--- a/osaf/libs/common/cpsv/include/cpnd_init.h
+++ b/osaf/libs/common/cpsv/include/cpnd_init.h
@@ -90,7 +90,7 @@ uint32_t cpnd_ckpt_replica_create(CPND_C
 uint32_t cpnd_ckpt_remote_cpnd_add(CPND_CKPT_NODE *cp_node, MDS_DEST mds_info);
 uint32_t cpnd_ckpt_remote_cpnd_del(CPND_CKPT_NODE *cp_node, MDS_DEST mds_info);
 int32_t cpnd_ckpt_get_lck_sec_id(CPND_CKPT_NODE *cp_node);
-uint32_t cpnd_ckpt_sec_write(CPND_CKPT_NODE *cp_node, CPND_CKPT_SECTION_INFO
+uint32_t cpnd_ckpt_sec_write(CPND_CB *cb, CPND_CKPT_NODE *cp_node, CPND_CKPT_SECTION_INFO
   *sec_info, const void *data, uint64_t size, uint64_t offset, uint32_t type);
 uint32_t cpnd_ckpt_sec_read(CPND_CKPT_NODE *cp_node, CPND_CKPT_SECTION_INFO
  *sec_info, void *data, uint64_t size, uint64_t offset);
@@ -164,7 +164,7 @@ void cpnd_evt_node_getnext(CPND_CB *cb,
 uint32_t cpnd_evt_node_add(CPND_CB *cb, CPSV_CPND_ALL_REPL_EVT_NODE *evt_node);
 uint32_t cpnd_evt_node_del(CPND_CB *cb, CPSV_CPND_ALL_REPL_EVT_NODE *evt_node);
 CPND_CKPT_NODE *cpnd_ckpt_node_find_by_name(CPND_CB *cpnd_cb, SaConstStringT ckpt_name);
-CPND_CKPT_SECTION_INFO *cpnd_ckpt_sec_add(CPND_CKPT_NODE *cp_node, SaCkptSectionIdT *id, SaTimeT exp_time,
+CPND_CKPT_SECTION_INFO *cpnd_ckpt_sec_add(CPND_CB *cb, CPND_CKPT_NODE *cp_node, SaCkptSectionIdT *id, SaTimeT exp_time,
   uint32_t gen_flag);
 void cpnd_evt_backup_queue_add(CPND_CKPT_NODE *cp_node, CPND_EVT *evt);
 uint32_t cpnd_ckpt_node_tree_init(CPND_CB *cb);
@@ -176,8 +176,8 @@ void cpnd_client_node_tree_cleanup(CPND_
 void cpnd_client_node_tree_destroy(CPND_CB *cb);
 void cpnd_allrepl_write_evt_node_tree_cleanup(CPND_CB *cb);
 void cpnd_allrepl_write_evt_node_tree_destroy(CPND_CB *cb);
-uint32_t cpnd_sec_hdr_update(CPND_CKPT_SECTION_INFO *pSecPtr, CPND_CKPT_NODE *cp_node);
-uint32_t cpnd_ckpt_hdr_update(CPND_CKPT_NODE *cp_node);
+uint32_t cpnd_sec_hdr_update(CPND_CB *cb, CPND_CKPT_SECTION_INFO *pSecPtr, CPND_CKPT_NODE *cp_node);
+uint32_t cpnd_ckpt_hdr_update(CPND_CB *cb, CPND_CKPT_NODE *cp_node);
 void cpnd_ckpt_node_destroy(CPND_CB *cb, CPND_CKPT_NODE *cp_node);
 uint32_t cpnd_get_slot_sub_slot_id_from_mds_dest(MDS_DEST dest);
 uint32_t cpnd_get_slot_sub_slot_id_from_node_id(NCS_NODE_ID i_node_id);
diff --git a/osaf/libs/common/cpsv/include/cpnd_sec.h b/osaf/libs/common/cpsv/include/cpnd_sec.h
--- a/osaf/libs/common/cpsv/include/cpnd_sec.h
+++ b/osaf/libs/common/cpsv/include/cpnd_sec.h
@@ -39,7 +39,7 @@ CPND_CKPT_SECTION_INFO *  

Re: [devel] [PATCH 0 of 3] Review Request for leap : now leap library ensure shm availability before writing [#2202]

2016-11-29 Thread Vo Minh Hoang
Dear Mahesh,

Unfortunately, I have just received information that the same core dump
still occurs after applying the patch.

Here is the dump information in short; please tell me if I can do anything
to support:

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x7fe314aa0109 in __memcpy_sse2_unaligned () from /lib64/libc.so.6
Missing separate debuginfos, use: zypper install opensaf-ckpt-nodedirector-debuginfo-5.1.0-.0.4997518.sle12.x86_64
(gdb) where
#0  0x7fe314aa0109 in __memcpy_sse2_unaligned () from /lib64/libc.so.6
#1  0x7fe315c26082 in memcpy (__len=, __src=, __dest=) at /usr/include/x86_64-linux-gnu/bits/string3.h:51
#2  ncs_os_posix_shm (req=req@entry=0x7ffecb80adb0) at os_defs.c:874
#3  0x00415a80 in cpnd_sec_hdr_update (cb=cb@entry=0x9e57f0, sec_info=sec_info@entry=0xb8ff60, cp_node=cp_node@entry=0xb8e8c0) at cpnd_proc.c:1880
#4  0x00406047 in cpnd_ckpt_sec_add (cb=cb@entry=0x9e57f0, cp_node=0xb8e8c0, id=0x7fe30c002390, exp_time=1480480471343486000, gen_flag=gen_flag@entry=0) at cpnd_db.c:457
#5  0x0040d17c in cpnd_evt_proc_ckpt_sect_create (cb=cb@entry=0x9e57f0, evt=evt@entry=0x7fe30c01e1d0, sinfo=sinfo@entry=0x7fe30c01e828) at cpnd_evt.c:2267
#6  0x0040eaf4 in cpnd_process_evt (evt=0x7fe30c01e1c0) at cpnd_evt.c:227
#7  0x004106cd in cpnd_main_process (cb=cb@entry=0x9e57f0) at cpnd_init.c:579
#8  0x00405383 in main (argc=, argv=) at cpnd_main.c:79

Sincerely,
Hoang


Re: [devel] [PATCH 1 of 1] cpnd: fix error handling while section_hdr_update_fail [#2207]

2016-11-25 Thread Vo Minh Hoang
Dear Mahesh,

Thank you very much for your comment.

This may be asking too much, but would you please help me with a way to
check the cpnd_sec_hdr_update() result in this case?

Calling it again is not really good: in a case of insufficient resources
like this one, the first call can return an error while the second call
right after it might return OK.
cpnd_ckpt_sec_del() itself does not have parameters that carry error
information, and a global variable should not be used either.

Sincerely,
Hoang


-Original Message-
From: A V Mahesh [mailto:mahesh.va...@oracle.com] 
Sent: Friday, November 25, 2016 5:37 PM
To: Vo Minh Hoang <hoang.m...@dektech.com.au>
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [devel] [PATCH 1 of 1] cpnd: fix error handling while
section_hdr_update_fail [#2207]

Hi Hoang,

The better approach would be to keep the existing cpnd_ckpt_sec_del() as it
is, keep the condition, and skip the functionality in cpnd_ckpt_sec_del()
that is not required when cpnd_sec_hdr_update() fails.

-AVM

On 11/25/2016 3:58 PM, Vo Minh Hoang wrote:
> Dear Mahesh,
>
> Thank you very much for your review and comments.
> I understood that I missed the point at n_secs calculation.
>
> Because checking the cpnd_sec_hdr_update() result by calling it again is
> quite odd, would it be acceptable if I moved the n_secs-- part into the
> cpnd_ckpt_sec_del_db() function? Something like this:
> ==========
>
> ..
>   if (localSecMap) {
>     if (sectionInfo) {
>       LocalSectionIdMap::iterator it(localSecMap->find(sectionInfo->lcl_sec_id));
>
>       if (it != localSecMap->end())
>         localSecMap->erase(it);
>     }
>   }
>   else {
>     LOG_ER("can't find local sec map in cpnd_ckpt_sec_del");
>     osafassert(false);
>   }
>   // Move following:
>   if (sectionInfo) {
>     cp_node->replica_info.n_secs--;
>     cp_node->replica_info.mem_used = cp_node->replica_info.mem_used - (sectionInfo->sec_size);
>   }
>   ...
>
> ==========
> The idea is that cpnd_ckpt_sec_del() stays as in the original, and
> cpnd_ckpt_sec_del_db() reverts the actions taken before
> cpnd_sec_hdr_update().
> cpnd_ckpt_sec_del_db() can be called after cpnd_ckpt_sec_del() many
> times without affecting the system.
>
> Please consider and tell me your judgement.
> Best regards,
> Hoang
>
> -Original Message-
> From: Vo Minh Hoang [mailto:hoang.m...@dektech.com.au]
> Sent: Friday, November 25, 2016 5:14 PM
> To: 'A V Mahesh' <mahesh.va...@oracle.com>
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: Re: [devel] [PATCH 1 of 1] cpnd: fix error handling while 
> section_hdr_update_fail [#2207]
>
> Dear Mahesh,
>
> Thank you very much for your review and comments.
> I understood that I missed the point at n_secs calculation.
>
> Because checking cpnd_sec_hdr_update()
>
> -Original Message-
> From: A V Mahesh [mailto:mahesh.va...@oracle.com]
> Sent: Friday, November 25, 2016 3:28 PM
> To: Vo Minh Hoang <hoang.m...@dektech.com.au>
> Cc: anders.wid...@ericsson.com; opensaf-devel@lists.sourceforge.net
> Subject: Re: [PATCH 1 of 1] cpnd: fix error handling while 
> section_hdr_update_fail [#2207]
>
> Hi Hoang,
>
>   >>I am sorry that I cannot find the ER that you point out.
>
> Ok I understand that  `map  & localSecMap` will not become become null 
> even all elements are removed in fist attempt of 
> cpnd_ckpt_sec_del_db() invocation, so we will not core dump.
>
> But section DB still corrupted, because of cpnd_ckpt_sec_de() function 
> split ( this patch )
>
> Irrelevant of cpnd_sec_hdr_update() success or failure, the 
> `cp_node->replica_info.n_secs++;`  was occurring before 
> cpnd_sec_hdr_update(), so we need to do 
> `cp_node->replica_info.n_secs--;`
>
> If cpnd_sec_hdr_update() successful and  cpnd_ckpt_hdr_update() fails, 
> we also need to do roll-backing SECTION HEADER with  roll backed 
> sectionInfo , so we  need to UPDATE THE SECTION HEADER.
>
> If you see   original cpnd_ckpt_sec_del()  function this is required
> irrelevant of  cpnd_sec_hdr_update() success or failure 
> ==
> ==
> ===
> if (sectionInfo) {
>   cp_node->replica_info.n_secs--;
>   cp_node->replica_info.mem_used = cp_node->replica_info.mem_used 
> - (sectionInfo->sec_size);
>
>   // UPDATE THE SECTION HEADER
>   uint32_t rc(cpnd_sec_hdr_update(sectionInfo, cp_node));
>   if (rc == NCSC

Re: [devel] [PATCH 1 of 1] cpnd: fix error handling while section_hdr_update_fail [#2207]

2016-11-25 Thread Vo Minh Hoang
Dear Mahesh,

Thank you very much for your review and comments.
I understood that I missed the point at n_secs calculation.

Checking the result of cpnd_sec_hdr_update() by calling it again seems quite
odd.
Would it be acceptable if I moved the n_secs-- part into the
cpnd_ckpt_sec_del_db() function? Something like this:



  ..
  if (localSecMap) {
    if (sectionInfo) {
      LocalSectionIdMap::iterator
          it(localSecMap->find(sectionInfo->lcl_sec_id));

      if (it != localSecMap->end())
        localSecMap->erase(it);
    }
  }
  else {
    LOG_ER("can't find local sec map in cpnd_ckpt_sec_del");
    osafassert(false);
  }
  // Move following:
  if (sectionInfo) {
    cp_node->replica_info.n_secs--;
    cp_node->replica_info.mem_used = cp_node->replica_info.mem_used -
        (sectionInfo->sec_size);
  }
  ...
 

===
 The idea is that cpnd_ckpt_sec_del() keeps its original behavior, while
 cpnd_ckpt_sec_del_db() reverts the actions taken before
 cpnd_sec_hdr_update().
 cpnd_ckpt_sec_del_db() can be called after cpnd_ckpt_sec_del() any number
 of times without affecting the system.
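
To make the proposal above concrete, here is a minimal sketch of a delete routine that adjusts the counters only when the section was actually found, so that a repeated call is harmless. The types `SectionInfo` and `ReplicaInfo` and the function `sec_del_db_sketch` are hypothetical stand-ins, not the real cpnd structures or the patch itself.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical stand-ins for the real cpnd structures.
struct SectionInfo {
  std::uint64_t sec_size;
};

struct ReplicaInfo {
  std::map<int, SectionInfo*> section_db;  // section id -> section
  std::uint32_t n_secs = 0;
  std::uint64_t mem_used = 0;
};

// Removes the section from the map and, only if it was found, rolls
// back the counters. A second call with the same id finds nothing and
// therefore changes nothing.
SectionInfo* sec_del_db_sketch(ReplicaInfo& rep, int id) {
  SectionInfo* info = nullptr;
  auto it = rep.section_db.find(id);
  if (it != rep.section_db.end()) {
    info = it->second;
    rep.section_db.erase(it);
  }
  if (info != nullptr) {
    assert(rep.n_secs > 0);  // guard against counter underflow
    rep.n_secs--;
    rep.mem_used -= info->sec_size;
  }
  return info;
}
```

In this sketch the first call returns the removed section and decrements `n_secs` once; every later call with the same id returns a null pointer and leaves both counters untouched. It does not cover the section header update, which is the part still under discussion.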

Please consider and tell me your judgement.
Best regards,
Hoang

-----Original Message-
From: Vo Minh Hoang [mailto:hoang.m...@dektech.com.au] 
Sent: Friday, November 25, 2016 5:14 PM
To: 'A V Mahesh' <mahesh.va...@oracle.com>
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [devel] [PATCH 1 of 1] cpnd: fix error handling while
section_hdr_update_fail [#2207]

Dear Mahesh,

Thank you very much for your review and comments.
I understood that I missed the point at n_secs calculation.

Because checking cpnd_sec_hdr_update() 

-Original Message-
From: A V Mahesh [mailto:mahesh.va...@oracle.com]
Sent: Friday, November 25, 2016 3:28 PM
To: Vo Minh Hoang <hoang.m...@dektech.com.au>
Cc: anders.wid...@ericsson.com; opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1 of 1] cpnd: fix error handling while
section_hdr_update_fail [#2207]

Hi Hoang,

 >>I am sorry that I cannot find the ER that you point out.

OK, I understand that `map & localSecMap` will not become null even if all
elements are removed in the first invocation of cpnd_ckpt_sec_del_db(), so
we will not core dump.

But the section DB is still corrupted because of the cpnd_ckpt_sec_del()
function split (this patch).

Irrespective of cpnd_sec_hdr_update() success or failure,
`cp_node->replica_info.n_secs++;` occurred before cpnd_sec_hdr_update(), so
we need to do `cp_node->replica_info.n_secs--;`

If cpnd_sec_hdr_update() succeeds and cpnd_ckpt_hdr_update() fails, we also
need to roll back the SECTION HEADER with the rolled-back sectionInfo, so we
need to UPDATE THE SECTION HEADER.

If you look at the original cpnd_ckpt_sec_del() function, this is required
irrespective of cpnd_sec_hdr_update() success or failure:

===
if (sectionInfo) {
 cp_node->replica_info.n_secs--;
 cp_node->replica_info.mem_used = cp_node->replica_info.mem_used -
(sectionInfo->sec_size);

 // UPDATE THE SECTION HEADER
 uint32_t rc(cpnd_sec_hdr_update(sectionInfo, cp_node));
 if (rc == NCSCC_RC_FAILURE) {
   TRACE_4("cpnd sect hdr update failed");
 }

===

So splitting the function is not required. You only need to determine
whether cpnd_ckpt_hdr_update() has been done or not, add a filter, and do
UPDATE THE CHECKPOINT HEADER based on the cpnd_ckpt_hdr_update() result.

===

if (  cpnd_ckpt_hdr_update() == was successful ) { // UPDATE THE CHECKPOINT HEADER
 rc = cpnd_ckpt_hdr_update(cp_node);
 if (rc == NCSCC_RC_FAILURE) {
   TRACE_4("cpnd ckpt hdr update failed");
 }
}
========
===

-AVM


On 11/25/2016 12:08 PM, Vo Minh Hoang wrote:
> Dear Mahesh,
>
> The first call erase a element with id from map.
> The second call with same id, no element found so iterator will point 
> to
> map->end().
> Because of that map->erase() does not execute.
>
>   I am sorry that I cannot find the ER that you point out.
>
> Sincerely,
> Hoang
>
> -Original Message-
> From: A V Mahesh [mailto:mahesh.va...@oracle.com]
> Sent: Friday, November 25, 2016 1:20 PM
> To: Vo Minh Hoang <hoang.m...@dektech.com.au>
> Cc: anders.wid...@ericsson.com; opensaf-devel@lists.sourceforge.net
> Subject: Re: [PATCH 1 of 1] cpnd: fix error handling while 
> section_hdr_update_fail [#2207]
>
> Hi Hoang,
>
> The second invocation of  cpn

Re: [devel] [PATCH 1 of 1] cpnd: fix error handling while section_hdr_update_fail [#2207]

2016-11-25 Thread Vo Minh Hoang
Dear Mahesh,

Thank you very much for your review and comments.
I understood that I missed the point at n_secs calculation.

Because checking cpnd_sec_hdr_update() 

-Original Message-
From: A V Mahesh [mailto:mahesh.va...@oracle.com] 
Sent: Friday, November 25, 2016 3:28 PM
To: Vo Minh Hoang <hoang.m...@dektech.com.au>
Cc: anders.wid...@ericsson.com; opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1 of 1] cpnd: fix error handling while
section_hdr_update_fail [#2207]

Hi Hoang,

 >>I am sorry that I cannot find the ER that you point out.

OK, I understand that `map & localSecMap` will not become null even if all
elements are removed in the first invocation of cpnd_ckpt_sec_del_db(), so
we will not core dump.

But the section DB is still corrupted because of the cpnd_ckpt_sec_del()
function split (this patch).

Irrespective of cpnd_sec_hdr_update() success or failure,
`cp_node->replica_info.n_secs++;` occurred before cpnd_sec_hdr_update(), so
we need to do `cp_node->replica_info.n_secs--;`

If cpnd_sec_hdr_update() succeeds and cpnd_ckpt_hdr_update() fails, we also
need to roll back the SECTION HEADER with the rolled-back sectionInfo, so we
need to UPDATE THE SECTION HEADER.

If you look at the original cpnd_ckpt_sec_del() function, this is required
irrespective of cpnd_sec_hdr_update() success or failure:

===
if (sectionInfo) {
 cp_node->replica_info.n_secs--;
 cp_node->replica_info.mem_used = cp_node->replica_info.mem_used -
(sectionInfo->sec_size);

 // UPDATE THE SECTION HEADER
 uint32_t rc(cpnd_sec_hdr_update(sectionInfo, cp_node));
 if (rc == NCSCC_RC_FAILURE) {
   TRACE_4("cpnd sect hdr update failed");
 }

===

So splitting the function is not required. You only need to determine
whether cpnd_ckpt_hdr_update() has been done or not, add a filter, and do
UPDATE THE CHECKPOINT HEADER based on the cpnd_ckpt_hdr_update() result.

===

if (  cpnd_ckpt_hdr_update() == was successful ) { // UPDATE THE CHECKPOINT HEADER
 rc = cpnd_ckpt_hdr_update(cp_node);
 if (rc == NCSCC_RC_FAILURE) {
   TRACE_4("cpnd ckpt hdr update failed");
 }
}

=======

-AVM


On 11/25/2016 12:08 PM, Vo Minh Hoang wrote:
> Dear Mahesh,
>
> The first call erase a element with id from map.
> The second call with same id, no element found so iterator will point 
> to
> map->end().
> Because of that map->erase() does not execute.
>
>   I am sorry that I cannot find the ER that you point out.
>
> Sincerely,
> Hoang
>
> -Original Message-
> From: A V Mahesh [mailto:mahesh.va...@oracle.com]
> Sent: Friday, November 25, 2016 1:20 PM
> To: Vo Minh Hoang <hoang.m...@dektech.com.au>
> Cc: anders.wid...@ericsson.com; opensaf-devel@lists.sourceforge.net
> Subject: Re: [PATCH 1 of 1] cpnd: fix error handling while 
> section_hdr_update_fail [#2207]
>
> Hi Hoang,
>
> The second invocation of cpnd_ckpt_sec_del_db() will generate a core dump.
>
> =
>
> SectionMap *map(static_cast<SectionMap *>(cp_node->replica_info.section_db));
>   
> if (map) {
>   SectionMap::iterator it(map->find(id));
>
>   if (it != map->end()) {
> sectionInfo = it->second;
> map->erase(it);
>   }
> }
> else {
>   LOG_ER("can't find map in cpnd_ckpt_sec_del");
>   *osafassert(false);*
> }
> =
>
>
> -AVM
>
> On 11/25/2016 11:16 AM, Vo Minh Hoang wrote:
>> Dear Mahesh,
>>
>> Thank you very much for your comment.
>>
>> Would you please verify again my understanding about this.
>>
>> Old cpnd_ckpt_sec_del() function and new cpnd_ckpt_sec_del_db() 
>> search for the sectionInfo in 2 maps by its id and remove if found.
>> In case ckpt_hdr_update_fails, the cpnd_ckpt_sec_del_db() is called 
>> twice, the first one remove sectionInfo from maps, the second one 
>> does not found sectionInfo and cannot remove anything, will not generate
error.
>>
>> Sincerely,
>> Hoang
>>
>> -Original Message-
>> From: A V Mahesh [mailto:mahesh.va...@oracle.com]
>> Sent: Friday, November 25, 2016 12:12 PM
>> To: Hoang Vo <hoang.m...@dektech.com.au>; anders.wid...@ericsson.com
>> Cc: opensaf-devel@lists.sourceforge.net
>> Subject: Re: [PAT

Re: [devel] [PATCH 1 of 1] cpnd: ensure shared memory size before writing [#2202]

2016-11-25 Thread Vo Minh Hoang
Dear Mahesh,

I think this problem is related to #1712; it occurs when
OSAF_CKPT_SHM_ALLOC_GUARANTEE is not set.

My thinking is that, because we provide two modes (guaranteed or not), we
should make sure that no core dump happens in either of them.
By the way, because this decision is outside my scope, I would like to ask
Anders Widell about it.

-Original Message-
From: A V Mahesh [mailto:mahesh.va...@oracle.com] 
Sent: Friday, November 25, 2016 4:06 PM
To: Hoang Vo ; anders.wid...@ericsson.com
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1 of 1] cpnd: ensure shared memory size before writing
[#2202]

Hi Hoang,

Does this issue occur with the #1712 feature enabled?

Exporting `OSAF_CKPT_SHM_ALLOC_GUARANTEE=1` gives the CPSV service
guaranteed SHM, and that broadly addresses all SHM memory issues.

So I don't think we require API-level changes (though we may need to
document the usage of exporting `OSAF_CKPT_SHM_ALLOC_GUARANTEE=1`).

#1712  description

==
summary: leap: provide ensured disk space option for shm_open 
request [1712]

description:
leap: provide ensured disk space option for shm_open request [1712] Provided
ensured disk space is allocated for NCS_OS_POSIX_SHM_REQ_OPEN request using
posix_fallocate() so that application such as CPSV subsequent writes to
bytes in the specified range are guaranteed not to fail because of lack of
disk space.

Updated the Opensaf services according to new options based on requirements.

Cpsv service uses the ensured disk space option based on
OSAF_CKPT_SHM_ALLOC_GUARANTEE environment variable ,so if user exports as
OSAF_CKPT_SHM_ALLOC_GUARANTEE=1 (true) cpsv provided ensured disk space.

==
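
As an illustration of the mechanism the #1712 description refers to, the following is a minimal sketch of reserving space with posix_fallocate() so that later writes in that range cannot fail for lack of space. It is not the leap implementation: the helper names are hypothetical, and it uses an unlinked temporary file under /tmp (an assumption) instead of a shm_open() object so that it can run anywhere on Linux.

```cpp
#include <cassert>
#include <cstdlib>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper: reserve `size` bytes for an already-open fd.
// Returns 0 on success, otherwise the error number reported by
// posix_fallocate() (for example ENOSPC when no space is left).
int reserve_space(int fd, off_t size) {
  assert(size > 0);  // posix_fallocate() rejects a non-positive length
  return posix_fallocate(fd, 0, size);
}

// Hypothetical demo: create an unlinked temporary file, reserve space
// for it, and report the resulting file size (-1 on any failure).
off_t reserved_size_of_temp_file(off_t size) {
  char path[] = "/tmp/shm_sketch_XXXXXX";
  int fd = mkstemp(path);
  if (fd < 0) return -1;
  unlink(path);  // the file disappears once the fd is closed

  off_t result = -1;
  if (reserve_space(fd, size) == 0) {
    struct stat st;
    if (fstat(fd, &st) == 0) result = st.st_size;
  }
  close(fd);
  return result;
}
```

The contrast with ftruncate() is that ftruncate() only sets the length and may leave the range unbacked, whereas posix_fallocate() either backs the whole range or reports an error up front.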

-AVM


On 11/23/2016 3:28 PM, Hoang Vo wrote:
>   osaf/libs/core/include/ncs_osprm.h  |   9 +
>   osaf/libs/core/leap/os_defs.c   |  19 +--
>   osaf/services/saf/cpsv/cpnd/cpnd_proc.c |  16 
>   3 files changed, 42 insertions(+), 2 deletions(-)
>
>
> problem: when checkpoint service init without shared memory size 
> guaranteed works in high memory load, core dump occur while adding section
to checkpoint.
>
> solution:  check the true size of shared memory before writing to it.
>
> diff --git a/osaf/libs/core/include/ncs_osprm.h 
> b/osaf/libs/core/include/ncs_osprm.h
> --- a/osaf/libs/core/include/ncs_osprm.h
> +++ b/osaf/libs/core/include/ncs_osprm.h
> @@ -557,6 +557,7 @@ typedef enum {
> NCS_OS_POSIX_SHM_REQ_UNLINK,/* unlink is shm_unlink */
> NCS_OS_POSIX_SHM_REQ_READ,
> NCS_OS_POSIX_SHM_REQ_WRITE,
> +  NCS_OS_POSIX_SHM_REQ_STATS,
> NCS_OS_POSIX_SHM_REQ_MAX
>   } NCS_OS_POSIX_SHM_REQ_TYPE;
>   typedef struct ncs_os_posix_shm_req_open_info_tag { @@ -598,6 
> +599,13 @@ typedef struct ncs_os_posix_shm_req_writ
> uint64_t i_offset;
>   } NCS_OS_POSIX_SHM_REQ_WRITE_INFO;
>   
> +typedef struct ncs_os_posix_shm_req_stats_info_tag {
> +  uint32_t i_hdl;
> +  int32_t i_fd;
> +  bool ensures_space;
> +  void *o_addr;
> +} NCS_OS_POSIX_SHM_REQ_STATS_INFO;
> +
>   typedef struct ncs_shm_req_info {
> NCS_OS_POSIX_SHM_REQ_TYPE type;
>   
> @@ -607,6 +615,7 @@ typedef struct ncs_shm_req_info {
>   NCS_OS_POSIX_SHM_REQ_UNLINK_INFO unlink;
>   NCS_OS_POSIX_SHM_REQ_READ_INFO read;
>   NCS_OS_POSIX_SHM_REQ_WRITE_INFO write;
> +NCS_OS_POSIX_SHM_REQ_STATS_INFO stats;
> } info;
>   
>   } NCS_OS_POSIX_SHM_REQ_INFO;
> diff --git a/osaf/libs/core/leap/os_defs.c 
> b/osaf/libs/core/leap/os_defs.c
> --- a/osaf/libs/core/leap/os_defs.c
> +++ b/osaf/libs/core/leap/os_defs.c
> @@ -799,9 +799,9 @@ uint32_t ncs_os_posix_shm(NCS_OS_POSIX_S
>   }
>   } else {
>   if (ftruncate(req->info.open.o_fd, (off_t)
shm_size /* off_t == long */ ) < 0) {
> - printf("ftruncate failed with errno
value %d \n", errno);
> + LOG_WA("ftruncate failed with errno
value %d \n", errno);
>   return NCSCC_RC_FAILURE;
> -}
> +}
>   }
>   
>   uint32_t prot_flag =
ncs_shm_prot_flags(req->info.open.i_flags);
> @@ -859,6 +859,21 @@ uint32_t ncs_os_posix_shm(NCS_OS_POSIX_S
>  req->info.write.i_write_size);
>   break;
>   
> + case NCS_OS_POSIX_SHM_REQ_STATS:
> + if (!req->info.stats.o_addr) {
> + printf("Output space is not defined\n");
> + return NCSCC_RC_FAILURE;
> + }
> +
> + if (req->info.stats.ensures_space) {
> + return NCSCC_RC_SUCCESS;
> + } else {
> + if(fstat(req->info.stats.i_fd,
req->info.stats.o_addr)) {
> +   

Re: [devel] [PATCH 1 of 1] cpnd: fix error handling while section_hdr_update_fail [#2207]

2016-11-24 Thread Vo Minh Hoang
Dear Mahesh,

The first call erases the element with that id from the map.
On the second call with the same id, no element is found, so the iterator
points to map->end().
Because of that, map->erase() is not executed.
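
The behavior described above can be checked in isolation with a small sketch. The key and value types of `SectionMap` here are hypothetical, and `erase_if_present` is an illustrative helper rather than the cpnd function.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical key/value types; the real map stores section info.
using SectionMap = std::map<int, std::string>;

// Looks up `id`, erases it if present, and reports whether anything
// was erased. When the id is absent, find() returns end() and erase()
// is never called.
bool erase_if_present(SectionMap* map, int id) {
  assert(map != nullptr);  // plays the role of the osafassert on a missing map
  SectionMap::iterator it(map->find(id));
  if (it != map->end()) {
    map->erase(it);
    return true;
  }
  return false;
}
```

Calling `erase_if_present` twice with the same id returns true and then false. The map object itself stays valid (and non-null) even after its last element is removed.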

I am sorry, but I cannot find the ER that you point out.

Sincerely,
Hoang

-Original Message-
From: A V Mahesh [mailto:mahesh.va...@oracle.com] 
Sent: Friday, November 25, 2016 1:20 PM
To: Vo Minh Hoang <hoang.m...@dektech.com.au>
Cc: anders.wid...@ericsson.com; opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1 of 1] cpnd: fix error handling while
section_hdr_update_fail [#2207]

Hi Hoang,

The second invocation of cpnd_ckpt_sec_del_db() will generate a core dump.

=

SectionMap *map(static_cast<SectionMap *>(cp_node->replica_info.section_db));

  if (map) {
    SectionMap::iterator it(map->find(id));

    if (it != map->end()) {
      sectionInfo = it->second;
      map->erase(it);
    }
  }
  else {
    LOG_ER("can't find map in cpnd_ckpt_sec_del");
    *osafassert(false);*
  }
=====


-AVM

On 11/25/2016 11:16 AM, Vo Minh Hoang wrote:
> Dear Mahesh,
>
> Thank you very much for your comment.
>
> Would you please verify again my understanding about this.
>
> Old cpnd_ckpt_sec_del() function and new cpnd_ckpt_sec_del_db() search 
> for the sectionInfo in 2 maps by its id and remove if found.
> In case ckpt_hdr_update_fails, the cpnd_ckpt_sec_del_db() is called 
> twice, the first one remove sectionInfo from maps, the second one does 
> not found sectionInfo and cannot remove anything, will not generate error.
>
> Sincerely,
> Hoang
>
> -Original Message-
> From: A V Mahesh [mailto:mahesh.va...@oracle.com]
> Sent: Friday, November 25, 2016 12:12 PM
> To: Hoang Vo <hoang.m...@dektech.com.au>; anders.wid...@ericsson.com
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: Re: [PATCH 1 of 1] cpnd: fix error handling while 
> section_hdr_update_fail [#2207]
>
> Hi Hoang,
>
> NACK
>
> This patch has some fundamental problem, this will corrupt the cpsv 
> database .
> This patch instead of resolving the issue , increase the percentage of 
> problem occurrence .
>
> Basically this patch splitscpnd_ckpt_sec_del()  in to  two parts
> cpnd_ckpt_sec_del_db() & cpnd_ckpt_sec_del() and the new 
> cpnd_ckpt_sec_del() invokes  cpnd_ckpt_sec_del_db() to fullfil the old 
> function behavior , so that to support  other old invocations and 
> changed the  error cleanup in
> cpnd_ckpt_sec_add() as follows :
>
>ckpt_hdr_update_fails:
>   cpnd_ckpt_sec_del(cp_node, id);
>
>section_hdr_update_fails:
>   cpnd_ckpt_sec_del_db(cp_node, id);
>
> This means   cpnd_ckpt_sec_del_db() is always called twice in case of
> ckpt_hdr_update_fails , which will generate core
>
> -AVM
>
>
> On 11/24/2016 3:26 PM, Hoang Vo wrote:
>>osaf/libs/common/cpsv/include/cpnd_sec.h |   3 +++
>>osaf/services/saf/cpsv/cpnd/cpnd_db.c|   4 +++-
>>osaf/services/saf/cpsv/cpnd/cpnd_sec.cc  |  31
> +++
>>3 files changed, 33 insertions(+), 5 deletions(-)
>>
>>
>> problem:
>> the steps to add a section is add_db_tree -> update_sec_hdr -> 
>> update_ckpt_hdr so if an error occur cpsv should handle error in 
>> reverse
> order.
>> currently, section_hdr_update_fails, cpsv revert ckpt_hdr also that 
>> case error
>>
>> solution:
>> only revert db_tree in case section_hdr_update_fails
>>
>> diff --git a/osaf/libs/common/cpsv/include/cpnd_sec.h
>> b/osaf/libs/common/cpsv/include/cpnd_sec.h
>> --- a/osaf/libs/common/cpsv/include/cpnd_sec.h
>> +++ b/osaf/libs/common/cpsv/include/cpnd_sec.h
>> @@ -39,6 +39,9 @@ CPND_CKPT_SECTION_INFO *
>>cpnd_ckpt_sec_get_create(const CPND_CKPT_NODE *, const 
>> SaCkptSectionIdT *);
>>
>>CPND_CKPT_SECTION_INFO *
>> +cpnd_ckpt_sec_del_db(CPND_CKPT_NODE *, SaCkptSectionIdT *);
>> +
>> +CPND_CKPT_SECTION_INFO *
>>cpnd_ckpt_sec_del(CPND_CKPT_NODE *, SaCkptSectionIdT *);
>>
>>CPND_CKPT_SECTION_INFO *
>> diff --git a/osaf/services/saf/cpsv/cpnd/cpnd_db.c
>> b/osaf/services/saf/cpsv/cpnd/cpnd_db.c
>> --- a/osaf/services/saf/cpsv/cpnd/cpnd_db.c
>> +++ b/osaf/services/saf/cpsv/cpnd/cpnd_db.c
>> @@ -468,10 +468,12 @@ CPND_CKPT_SECTION_INFO *cpnd_ckpt_sec_ad
>>  TRACE_LEAVE();
>>  return pSecPtr;
>>
>> - section_hdr_update_fails:
>> ckpt_hdr_update_fails:
>>  cpnd_ckpt_sec_del(cp_node, id);
>>  

Re: [devel] [PATCH 1 of 1] cpnd: fix error handling while section_hdr_update_fail [#2207]

2016-11-24 Thread Vo Minh Hoang
Dear Mahesh,

Thank you very much for your comment.

Would you please verify my understanding of this once more?

Both the old cpnd_ckpt_sec_del() function and the new cpnd_ckpt_sec_del_db()
search for the sectionInfo in two maps by its id and remove it if found.
In the ckpt_hdr_update_fails case, cpnd_ckpt_sec_del_db() is called twice:
the first call removes the sectionInfo from the maps, and the second call
does not find the sectionInfo, cannot remove anything, and so will not
generate an error.

Sincerely,
Hoang

-Original Message-
From: A V Mahesh [mailto:mahesh.va...@oracle.com] 
Sent: Friday, November 25, 2016 12:12 PM
To: Hoang Vo ; anders.wid...@ericsson.com
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1 of 1] cpnd: fix error handling while
section_hdr_update_fail [#2207]

Hi Hoang,

NACK

This patch has a fundamental problem: it will corrupt the cpsv database.
Instead of resolving the issue, this patch increases the likelihood of the
problem occurring.

Basically, this patch splits cpnd_ckpt_sec_del() into two parts,
cpnd_ckpt_sec_del_db() & cpnd_ckpt_sec_del(). The new cpnd_ckpt_sec_del()
invokes cpnd_ckpt_sec_del_db() to fulfil the old function's behavior, so as
to support the other existing invocations, and the patch changes the error
cleanup in cpnd_ckpt_sec_add() as follows:

  ckpt_hdr_update_fails:
 cpnd_ckpt_sec_del(cp_node, id);

  section_hdr_update_fails:
 cpnd_ckpt_sec_del_db(cp_node, id);

This means cpnd_ckpt_sec_del_db() is always called twice in the
ckpt_hdr_update_fails case, which will generate a core dump.
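
The fall-through between the two cleanup labels can be reproduced with a small sketch. All names here are hypothetical counters standing in for the real functions; the sketch only shows how many times the map-only delete runs on each path and says nothing about whether the second call is harmful.

```cpp
#include <cassert>

// Hypothetical counters standing in for the real cleanup functions.
struct Calls {
  int full_del = 0;  // times the full delete ran
  int db_del = 0;    // times the map-only delete ran
};

void db_del_sketch(Calls& c) { c.db_del++; }

// Mirrors the patched design: the full delete calls the map-only
// delete internally.
void full_del_sketch(Calls& c) {
  c.full_del++;
  db_del_sketch(c);
}

enum class Failure { kSectionHdr, kCkptHdr };

// Error path laid out like the labels quoted above.
Calls run_error_path(Failure f) {
  Calls c;
  if (f == Failure::kCkptHdr) goto ckpt_hdr_update_fails;
  goto section_hdr_update_fails;

ckpt_hdr_update_fails:
  full_del_sketch(c);
  // No return or goto here, so execution falls through to the next label.

section_hdr_update_fails:
  db_del_sketch(c);
  assert(c.db_del >= 1);  // every error path runs the map-only delete
  return c;
}
```

With this layout the ckpt header failure path runs the map-only delete twice (once inside the full delete and once at the next label), while the section header failure path runs it once.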

-AVM


On 11/24/2016 3:26 PM, Hoang Vo wrote:
>   osaf/libs/common/cpsv/include/cpnd_sec.h |   3 +++
>   osaf/services/saf/cpsv/cpnd/cpnd_db.c|   4 +++-
>   osaf/services/saf/cpsv/cpnd/cpnd_sec.cc  |  31
+++
>   3 files changed, 33 insertions(+), 5 deletions(-)
>
>
> problem:
> the steps to add a section is add_db_tree -> update_sec_hdr -> 
> update_ckpt_hdr so if an error occur cpsv should handle error in reverse
order.
> currently, section_hdr_update_fails, cpsv revert ckpt_hdr also that 
> case error
>
> solution:
> only revert db_tree in case section_hdr_update_fails
>
> diff --git a/osaf/libs/common/cpsv/include/cpnd_sec.h 
> b/osaf/libs/common/cpsv/include/cpnd_sec.h
> --- a/osaf/libs/common/cpsv/include/cpnd_sec.h
> +++ b/osaf/libs/common/cpsv/include/cpnd_sec.h
> @@ -39,6 +39,9 @@ CPND_CKPT_SECTION_INFO *
>   cpnd_ckpt_sec_get_create(const CPND_CKPT_NODE *, const 
> SaCkptSectionIdT *);
>   
>   CPND_CKPT_SECTION_INFO *
> +cpnd_ckpt_sec_del_db(CPND_CKPT_NODE *, SaCkptSectionIdT *);
> +
> +CPND_CKPT_SECTION_INFO *
>   cpnd_ckpt_sec_del(CPND_CKPT_NODE *, SaCkptSectionIdT *);
>   
>   CPND_CKPT_SECTION_INFO *
> diff --git a/osaf/services/saf/cpsv/cpnd/cpnd_db.c 
> b/osaf/services/saf/cpsv/cpnd/cpnd_db.c
> --- a/osaf/services/saf/cpsv/cpnd/cpnd_db.c
> +++ b/osaf/services/saf/cpsv/cpnd/cpnd_db.c
> @@ -468,10 +468,12 @@ CPND_CKPT_SECTION_INFO *cpnd_ckpt_sec_ad
>   TRACE_LEAVE();
>   return pSecPtr;
>   
> - section_hdr_update_fails:
>ckpt_hdr_update_fails:
>   cpnd_ckpt_sec_del(cp_node, id);
>   
> + section_hdr_update_fails:
> + cpnd_ckpt_sec_del_db(cp_node, id);
> +
>section_add_fails:
>   if (pSecPtr->sec_id.id != NULL)
>   m_MMGR_FREE_CPND_DEFAULT(pSecPtr->sec_id.id);
> diff --git a/osaf/services/saf/cpsv/cpnd/cpnd_sec.cc 
> b/osaf/services/saf/cpsv/cpnd/cpnd_sec.cc
> --- a/osaf/services/saf/cpsv/cpnd/cpnd_sec.cc
> +++ b/osaf/services/saf/cpsv/cpnd/cpnd_sec.cc
> @@ -157,19 +157,19 @@ cpnd_ckpt_sec_find(const CPND_CKPT_NODE
>   }
>   
>
/***
*
> - * Name  : cpnd_ckpt_sec_del
> + * Name  : cpnd_ckpt_sec_del_db
>*
> - * Description   : Function to remove the section from a checkpoint.
> + * Description   : Function to remove the section from a checkpoint map
db.
>*
>* Arguments : CPND_CKPT_NODE *cp_node - Check point node.
>*   : SaCkptSectionIdT id - Section Identifier
> - *
> + *
>* Return Values :  ptr to CPND_CKPT_SECTION_INFO/NULL;
>*
>* Notes : None.
>

*/
>   CPND_CKPT_SECTION_INFO *
> -cpnd_ckpt_sec_del(CPND_CKPT_NODE *cp_node, SaCkptSectionIdT *id)
> +cpnd_ckpt_sec_del_db(CPND_CKPT_NODE *cp_node, SaCkptSectionIdT *id)
>   {
> CPND_CKPT_SECTION_INFO *sectionInfo(0);
>   
> @@ -206,6 +206,29 @@ cpnd_ckpt_sec_del(CPND_CKPT_NODE *cp_nod
>   osafassert(false);
> }
>   
> +  TRACE_LEAVE();
> +
> +  return sectionInfo;
> +}
> +
>
+/**
**
> + * Name  : cpnd_ckpt_sec_del
> + *
> + * Description   : Function to remove the section from a checkpoint.
> + *
> + * Arguments : CPND_CKPT_NODE *cp_node - Check point node.
> + *   : SaCkptSectionIdT id - Section Identifier
> + *
> + * 

Re: [devel] [PATCH 1 of 1] cpsv: on cpnd down fist remove child safReplica object then parent safCkpt object [#2189]

2016-11-17 Thread Vo Minh Hoang
Dear Mahesh,

Reviewed and tested with the collocated and non-collocated cases; I saw that
the problem is fixed and could not find any further occurrence.

So ACK from me, tested.

Sincerely,
Hoang

-Original Message-
From: mahesh.va...@oracle.com [mailto:mahesh.va...@oracle.com] 
Sent: Wednesday, November 16, 2016 3:58 PM
To: hoang.m...@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net
Subject: [PATCH 1 of 1] cpsv: on cpnd down fist remove child safReplica
object then parent safCkpt object [#2189]

 osaf/services/saf/cpsv/cpd/cpd_imm.c  |  20 
 osaf/services/saf/cpsv/cpd/cpd_proc.c |   9 -
 2 files changed, 20 insertions(+), 9 deletions(-)


Bug :
While cpd is processing cpnd down for a COLLOCATED ckpt that exists only on
the cpnd that went down (no other node in the cluster has opened this
checkpoint), cpd removes that checkpoint and its replica completely.

In such a case the current logic has a bug: it first removes the ckpt node
and then the replica, which deletes the parent object safCkpt=...,* first
and the child object safReplica=...,safCkpt=...,* next.

As we know, IMM removes the child if the parent is removed, so this causes
an out-of-sequence removal of the safReplica object, and ERR_NOT_EXIST is
returned.

Fix :
While cpd is removing that checkpoint and replica completely, follow the
sequence of child object safReplica=...,safCkpt=...,* first, then parent
object safCkpt=...,* next.

This is a focused fix; maybe we need to review the complete code for such
occurrences, and any that are found will be addressed in a new ticket.
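
The ordering issue described in the Bug and Fix paragraphs can be shown with a toy object store. This is NOT the IMM API: `ToyStore`, `Status`, and the object names are hypothetical, and the only property modeled is that deleting a parent also removes its children.

```cpp
#include <cassert>
#include <set>
#include <string>

enum class Status { kOk, kNotExist };

// Toy hierarchical store: a child is named "<rdn>,<parent dn>".
class ToyStore {
 public:
  void create_object(const std::string& dn) {
    assert(!dn.empty());
    objects_.insert(dn);
  }

  // Deletes `dn` and every object whose name ends with ",<dn>".
  Status remove_object(const std::string& dn) {
    if (objects_.erase(dn) == 0) return Status::kNotExist;
    const std::string suffix = "," + dn;
    for (auto it = objects_.begin(); it != objects_.end();) {
      const bool is_child =
          it->size() > suffix.size() &&
          it->compare(it->size() - suffix.size(), suffix.size(), suffix) == 0;
      if (is_child) {
        it = objects_.erase(it);  // erase returns the next valid iterator
      } else {
        ++it;
      }
    }
    return Status::kOk;
  }

 private:
  std::set<std::string> objects_;
};
```

Deleting "safCkpt=c1" first and then "safReplica=r1,safCkpt=c1" gives kOk followed by kNotExist in this model, while deleting the child first and the parent second gives kOk both times.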

diff --git a/osaf/services/saf/cpsv/cpd/cpd_imm.c
b/osaf/services/saf/cpsv/cpd/cpd_imm.c
--- a/osaf/services/saf/cpsv/cpd/cpd_imm.c
+++ b/osaf/services/saf/cpsv/cpd/cpd_imm.c
@@ -400,7 +400,9 @@ SaAisErrorT delete_runtime_replica_objec
osaf_extended_name_lend(replica_dn, _name);
rc = immutil_saImmOiRtObjectDelete(immOiHandle, _name); 
if (rc != SA_AIS_OK) {
-   LOG_ER("Deleting run time object %s Failed - rc =
%d",replica_dn, rc);
+   LOG_ER("Deleting run time object %s Failed-1 - rc =
%d",replica_dn, rc);
+   } else {
+   TRACE("Deleting run time object %s Success-1 - rc =
%d",replica_dn, 
+rc);
}
 
free(replica_dn);
@@ -522,9 +524,11 @@ SaAisErrorT delete_runtime_ckpt_object(C
osaf_extended_name_lend(ckpt_node->ckpt_name, _name);
 
rc =  immutil_saImmOiRtObjectDelete(immOiHandle, _name);
-   if (rc != SA_AIS_OK)
+   if (rc != SA_AIS_OK) {
LOG_ER("Deleting run time object %s failed - rc = %d",
ckpt_node->ckpt_name, rc);
-
+   } else {
+   TRACE("Deleting run time object %s success - rc = %d",
ckpt_node->ckpt_name, rc);
+   }
return rc;
 }
 
@@ -917,11 +921,11 @@ SaAisErrorT cpd_clean_checkpoint_objects
/* Delete the runtime object and its children. */
rc = immutil_saImmOiRtObjectDelete(cb->immOiHandle,
_name);
if (rc == SA_AIS_OK) {
-   TRACE("Object \"%s\" deleted", (char *)
osaf_extended_name_borrow(_name));
-   } else {
-   LOG_ER("%s saImmOiRtObjectDelete for \"%s\" FAILED
%d",
-   __FUNCTION__, (char *)
osaf_extended_name_borrow(_name), rc);
-   }
+  TRACE("saImmOiRtObjectDelete \"%s\" deleted
Successfully", (char *) osaf_extended_name_borrow(_name));
+  } else {
+  LOG_ER("%s saImmOiRtObjectDelete for \"%s\" FAILED
%d",
+  __FUNCTION__, (char *)
osaf_extended_name_borrow(_name), rc);
+  }
}
 
if (rc != SA_AIS_ERR_NOT_EXIST) { diff --git
a/osaf/services/saf/cpsv/cpd/cpd_proc.c
b/osaf/services/saf/cpsv/cpd/cpd_proc.c
--- a/osaf/services/saf/cpsv/cpd/cpd_proc.c
+++ b/osaf/services/saf/cpsv/cpd/cpd_proc.c
@@ -809,6 +809,11 @@ uint32_t cpd_process_cpnd_down(CPD_CB *c
send_evt.info.cpnd.info.ckpt_del.mds_dest =
*cpnd_dest;
if (ckpt_node->dest_cnt == 0) {
TRACE_1("cpd ckpt del success for
ckpt_id:%llx",ckpt_node->ckpt_id);
+   /* Delete reploc fist*/
+   cpd_ckpt_reploc_get(>ckpt_reploc_tree,
_info, _info);
+   if (rep_info) {
+   cpd_ckpt_reploc_node_delete(cb,
rep_info, ckpt_node->is_unlink_set);
+   }
cpd_ckpt_map_node_get(>ckpt_map_tree,
ckpt_node->ckpt_name, _info);
 
/* Remove the ckpt_node */
@@ -875,7 +880,7 @@ uint32_t cpd_process_cpnd_down(CPD_CB *c
/* Send it to CPD(s), by sending ckpt_id = 0 */
/* This is to delete the node from reploc_tree */
cpd_ckpt_reploc_get(>ckpt_reploc_tree, _info,
_info);
-   if 

Re: [devel] [PATCH 1 of 1] fix crash problem by checking null pointer before accessing its detail

2016-11-16 Thread Vo Minh Hoang
Dear Mahesh,

Thank you very much.

Would something like this be acceptable? If yes, I will send a V2 patch.

if (ckpt_node->node_users_cnt) {
        int count = 0;
        CPD_NODE_USER_INFO *node_user = ckpt_node->node_users;
        cpd_msg.info.usr_info_2.node_list =
            malloc(ckpt_node->node_users_cnt * sizeof(CPD_NODE_USER_INFO));
        memset(cpd_msg.info.usr_info_2.node_list, '\0',
               (sizeof(CPD_NODE_USER_INFO) * ckpt_node->node_users_cnt));

        for (; node_user != NULL && count < ckpt_node->node_users_cnt;
             node_user = node_user->next) {
                cpd_msg.info.usr_info_2.node_list[count].dest =
                    node_user->dest;
                cpd_msg.info.usr_info_2.node_list[count].num_users =
                    node_user->num_users;
                cpd_msg.info.usr_info_2.node_list[count].num_readers =
                    node_user->num_readers;
                cpd_msg.info.usr_info_2.node_list[count].num_writers =
                    node_user->num_writers;
                ++count;
        }

        /* Update node_users_cnt in case of mismatch */
        ckpt_node->node_users_cnt = count;
        cpd_msg.info.usr_info_2.node_users_cnt = count;
}
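
The double guard in the loop above (stop at the end of the list or at the expected count, whichever comes first) can be exercised on its own. This is a minimal sketch with a hypothetical, trimmed-down `NodeUser` record and a std::vector in place of the malloc'ed array; it is not the cpd code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, trimmed-down version of the node user record.
struct NodeUser {
  int num_users;
  NodeUser* next;
};

// Copies at most `expected` entries and stops early if the list is
// shorter than the stored count claims. Returns the number of entries
// actually copied, which the caller can use to correct the count.
std::size_t copy_node_users(const NodeUser* head, std::size_t expected,
                            std::vector<int>& out) {
  out.clear();
  for (const NodeUser* n = head; n != nullptr && out.size() < expected;
       n = n->next) {
    out.push_back(n->num_users);
  }
  assert(out.size() <= expected);
  return out.size();
}
```

For a two-element list with a stored count of 3, the function returns 2 without touching a null pointer, which is the mismatch case the proposal is meant to survive.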


Sincerely,
Hoang

-Original Message-
From: A V Mahesh [mailto:mahesh.va...@oracle.com] 
Sent: Wednesday, November 16, 2016 3:29 PM
To: Vo Minh Hoang <hoang.m...@dektech.com.au>; anders.wid...@ericsson.com
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1 of 1] fix crash problem by checking null pointer
before accessing its detail

Hi,

We can still write the code that way. I just checked the code; follow the
way `node_user = node_user->next` is used in the other occurrences in the
code.

-AVM

On 11/16/2016 12:16 PM, Vo Minh Hoang wrote:
> Dear Mahesh,
>
> Thank you very much for your comments.
>
> Because the coredump occur when ckpt_node->node_users_cnt is not 
> updated correctly to the number of node_user so we count here to 
> handle that mismatch.
> Might it possible to keep using like currently?
>
> Sincerely,
> Hoang
>
> -Original Message-
> From: A V Mahesh [mailto:mahesh.va...@oracle.com]
> Sent: Wednesday, November 16, 2016 11:00 AM
> To: Vo Minh Hoang <hoang.m...@dektech.com.au>; 
> anders.wid...@ericsson.com
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: Re: [PATCH 1 of 1] fix crash problem by checking null pointer 
> before accessing its detail
>
>
> Hi Hoang Vo,
>
>>> Please let me know if you have any further inquiry.
>
> Can you please also make the fix more readable replace  `for (count = 
> 0; count < ckpt_node->node_users_cnt; count++) {` some thing like `for 
> (node_user = ckpt_node->node_users; node_user != NULL; node_user =
> node_user->next) {`
> then, we can removevariable `int count = 0;`,   move
> `cpd_msg.info.usr_info_2.node_users_cnt = ckpt_node->node_users_cnt;` 
> after for () loop.
>
> -AVM
>
>
> On 11/16/2016 9:14 AM, Vo Minh Hoang wrote:
>> Dear Mahesh,
>>
>> I am sorry that I cannot share the test steps because I cannot 
>> reproduce it in local environment.
>> I've just received the coredump information point directly to this 
>> part, reviewed source code and found that pointer using is unsafe so 
>> I
> correct it.
>> Please let me know if you have any further inquiry.
>>
>> Thank you and best regard,
>> Hoang
>>
>> -Original Message-
>> From: A V Mahesh [mailto:mahesh.va...@oracle.com]
>> Sent: Wednesday, November 16, 2016 10:22 AM
>> To: Hoang Vo <hoang.m...@dektech.com.au>; anders.wid...@ericsson.com
>> Cc: opensaf-devel@lists.sourceforge.net
>> Subject: Re: [PATCH 1 of 1] fix crash problem by checking null 
>> pointer before accessing its detail
>>
>> Hi Hoang Vo,
>>
>> On 11/15/2016 12:57 PM, Hoang Vo wrote:
>>> Testing Commands:
>>> -
>>>
>>>
>>> Testing, Expected Results:
>>> --
>>>
>> Can you please share test case .
>>
>> -AVM
>>
>> On 11/15/2016 12:57 PM, Hoang Vo wrote:
>>> osaf/services/saf/cpsv/cpd/cpd_red.c |  5 +
>>> 1 files changed, 5 insertions(+), 0 deletions(-)
>>>
>>>
>>> diff --git a/osaf/services/saf/cpsv/cpd/cpd_red.c
>> b/osaf/services/saf/cpsv/cpd/cpd_red.c
>>> --- a/osaf/services/saf/cpsv/cpd/cpd_red.c
>>> +++ b/osaf/services/saf/cpsv/cpd/cpd_red.c
>>> @@ -322,6 +322,11 @@ void cpd_a2s_ckpt_usr_info(CPD_CB *cb, C
>>> memset(cpd_msg.info.usr_info_2.node_list, '\0',
>> (sizeof(CPD_NODE_USER_INFO) * ck

Re: [devel] [PATCH 1 of 1] fix crash problem by checking null pointer before accessing its detail

2016-11-15 Thread Vo Minh Hoang
Dear Mahesh,

Thank you very much for your comments.

The coredump occurs when ckpt_node->node_users_cnt is not updated correctly
to match the number of node_user entries, so we count here to handle that
mismatch.
Would it be possible to keep the current approach?

Sincerely,
Hoang

-Original Message-
From: A V Mahesh [mailto:mahesh.va...@oracle.com] 
Sent: Wednesday, November 16, 2016 11:00 AM
To: Vo Minh Hoang <hoang.m...@dektech.com.au>; anders.wid...@ericsson.com
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1 of 1] fix crash problem by checking null pointer
before accessing its detail


Hi Hoang Vo,

>>Please let me know if you have any further inquiry.


Can you please also make the fix more readable by replacing `for (count = 0;
count < ckpt_node->node_users_cnt; count++) {` with something like `for
(node_user = ckpt_node->node_users; node_user != NULL; node_user =
node_user->next) {`?
Then we can remove the variable `int count = 0;` and move
`cpd_msg.info.usr_info_2.node_users_cnt = ckpt_node->node_users_cnt;` after
the for () loop.

-AVM
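
For illustration, the list-driven loop suggested above could look roughly
like the sketch below. It is not the cpd code: `struct node_user`,
`struct node_user_info`, and `copy_node_users` are simplified, hypothetical
stand-ins, and the `cap` bound is an added assumption so that a list longer
than the destination array cannot overflow it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the cpd types in the patch. */
struct node_user {
	uint64_t dest;
	uint32_t num_users;
	struct node_user *next;
};

struct node_user_info {
	uint64_t dest;
	uint32_t num_users;
};

/* Walk the list itself instead of trusting a stored count.
 * Copies at most cap entries into out and returns the number copied. */
static uint32_t copy_node_users(const struct node_user *head,
				struct node_user_info *out, uint32_t cap)
{
	uint32_t count = 0;

	assert(out != NULL || cap == 0);
	for (const struct node_user *u = head; u != NULL && count < cap;
	     u = u->next) {
		out[count].dest = u->dest;
		out[count].num_users = u->num_users;
		++count;
	}
	return count;
}
```

The returned value can then be stored as the user count after the loop, so
the count always matches what was actually copied.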


On 11/16/2016 9:14 AM, Vo Minh Hoang wrote:
> Dear Mahesh,
>
> I am sorry that I cannot share the test steps because I cannot 
> reproduce it in local environment.
> I've just received the coredump information point directly to this 
> part, reviewed source code and found that pointer using is unsafe so I
correct it.
>
> Please let me know if you have any further inquiry.
>
> Thank you and best regard,
> Hoang
>
> -Original Message-
> From: A V Mahesh [mailto:mahesh.va...@oracle.com]
> Sent: Wednesday, November 16, 2016 10:22 AM
> To: Hoang Vo <hoang.m...@dektech.com.au>; anders.wid...@ericsson.com
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: Re: [PATCH 1 of 1] fix crash problem by checking null pointer 
> before accessing its detail
>
> Hi Hoang Vo,
>
> On 11/15/2016 12:57 PM, Hoang Vo wrote:
>> Testing Commands:
>> -
>>
>>
>> Testing, Expected Results:
>> --
>>
> Can you please share test case .
>
> -AVM
>
> On 11/15/2016 12:57 PM, Hoang Vo wrote:
>>osaf/services/saf/cpsv/cpd/cpd_red.c |  5 +
>>1 files changed, 5 insertions(+), 0 deletions(-)
>>
>>
>> diff --git a/osaf/services/saf/cpsv/cpd/cpd_red.c
> b/osaf/services/saf/cpsv/cpd/cpd_red.c
>> --- a/osaf/services/saf/cpsv/cpd/cpd_red.c
>> +++ b/osaf/services/saf/cpsv/cpd/cpd_red.c
>> @@ -322,6 +322,11 @@ void cpd_a2s_ckpt_usr_info(CPD_CB *cb, C
>>  memset(cpd_msg.info.usr_info_2.node_list, '\0',
> (sizeof(CPD_NODE_USER_INFO) * ckpt_node->node_users_cnt));
>>
>>  for (count = 0; count < ckpt_node->node_users_cnt; count++)
> {
>> +if (node_user == NULL) {
>> +ckpt_node->node_users_cnt = count;
>> +cpd_msg.info.usr_info_2.node_users_cnt =
> count;
>> +break;
>> +}
>>  cpd_msg.info.usr_info_2.node_list[count].dest =
> node_user->dest;
>>  cpd_msg.info.usr_info_2.node_list[count].num_users =
> node_user->num_users;
>>  cpd_msg.info.usr_info_2.node_list[count].num_readers
> = node_user->num_readers;
>
>



--
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


Re: [devel] [PATCH 1 of 1] fix crash problem by checking null pointer before accessing its detail

2016-11-15 Thread Vo Minh Hoang
Dear Mahesh,

I am sorry that I cannot share the test steps, because I cannot reproduce the
problem in my local environment.
I only received the coredump information, which points directly to this part.
I reviewed the source code, found that the pointer usage is unsafe, and
corrected it.

Please let me know if you have any further inquiry.

Thank you and best regard,
Hoang

-Original Message-
From: A V Mahesh [mailto:mahesh.va...@oracle.com] 
Sent: Wednesday, November 16, 2016 10:22 AM
To: Hoang Vo ; anders.wid...@ericsson.com
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1 of 1] fix crash problem by checking null pointer
before accessing its detail

Hi Hoang Vo,

On 11/15/2016 12:57 PM, Hoang Vo wrote:
> Testing Commands:
> -
>
>
> Testing, Expected Results:
> --
>

Can you please share the test case?

-AVM

On 11/15/2016 12:57 PM, Hoang Vo wrote:
>   osaf/services/saf/cpsv/cpd/cpd_red.c |  5 +
>   1 files changed, 5 insertions(+), 0 deletions(-)
>
>
> diff --git a/osaf/services/saf/cpsv/cpd/cpd_red.c
b/osaf/services/saf/cpsv/cpd/cpd_red.c
> --- a/osaf/services/saf/cpsv/cpd/cpd_red.c
> +++ b/osaf/services/saf/cpsv/cpd/cpd_red.c
> @@ -322,6 +322,11 @@ void cpd_a2s_ckpt_usr_info(CPD_CB *cb, C
>   memset(cpd_msg.info.usr_info_2.node_list, '\0',
(sizeof(CPD_NODE_USER_INFO) * ckpt_node->node_users_cnt));
>   
>   for (count = 0; count < ckpt_node->node_users_cnt; count++)
{
> + if (node_user == NULL) {
> + ckpt_node->node_users_cnt = count;
> + cpd_msg.info.usr_info_2.node_users_cnt =
count;
> + break;
> + }
>   cpd_msg.info.usr_info_2.node_list[count].dest =
node_user->dest;
>   cpd_msg.info.usr_info_2.node_list[count].num_users =
node_user->num_users;
>   cpd_msg.info.usr_info_2.node_list[count].num_readers
= node_user->num_readers;
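
As a reading aid, the guard added by this patch can be reduced to the
following sketch. The types and the function name are hypothetical and much
simpler than the real cpd structures; only the shape of the loop (count-driven,
stop at NULL, repair the stored count) follows the diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical minimal types; not the real CPD structures. */
struct user {
	uint32_t num_users;
	struct user *next;
};

struct ckpt {
	struct user *users;
	uint32_t users_cnt; /* may be larger than the real list length */
};

/* Count-driven loop with a NULL guard. If the list ends before the stored
 * count is reached, stop and correct the count instead of dereferencing
 * NULL. Returns the sum of num_users over the entries visited. */
static uint32_t sum_users_and_fix_count(struct ckpt *ck)
{
	struct user *u;
	uint32_t total = 0;
	uint32_t i;

	assert(ck != NULL);
	u = ck->users;
	for (i = 0; i < ck->users_cnt; i++) {
		if (u == NULL) {
			ck->users_cnt = i; /* stored count was too large */
			break;
		}
		total += u->num_users;
		u = u->next;
	}
	return total;
}
```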





Re: [devel] [PATCH 1 of 1] cpnd: use shared memory based on ckpt name length [#2108] V2

2016-10-27 Thread Vo Minh Hoang
Dear Mahesh,

I tested with cases:
- Old active with new standby
- Old standby with new active

In each case: create a checkpoint, create a section, write and read the
section, close, and unlink.

Sincerely,
Hoang

-Original Message-
From: A V Mahesh [mailto:mahesh.va...@oracle.com] 
Sent: Thursday, October 27, 2016 1:58 PM
To: Hoang Vo ; anders.wid...@ericsson.com
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1 of 1] cpnd: use shared memory based on ckpt name
length [#2108] V2

Hi Hoang,

Have you tested the in-service upgrade case?

-AVM


On 10/26/2016 2:33 PM, Hoang Vo wrote:
>   osaf/libs/common/cpsv/include/cpsv_shm.h |   28 +-
>   osaf/services/saf/cpsv/cpnd/cpnd_res.c   |  868
--
>   2 files changed, 355 insertions(+), 541 deletions(-)
>
>
> problem: In the CKPT case, osafckptnd increased by 3.5 MB (240 percent) on
> all nodes. The CKPT_INFO size increase that came with longDN support leads
> to the total size increase.
>
> solution:
> - At start, cpnd uses the old-format shm.
> - At run time, cpnd keeps using the old-format shm until the first longDN
>   checkpoint is created. After that, cpnd creates the extended-format shm
>   for longDN use.
> - Fix the init size for the shm.
>
> diff --git a/osaf/libs/common/cpsv/include/cpsv_shm.h 
> b/osaf/libs/common/cpsv/include/cpsv_shm.h
> --- a/osaf/libs/common/cpsv/include/cpsv_shm.h
> +++ b/osaf/libs/common/cpsv/include/cpsv_shm.h
> @@ -27,7 +27,9 @@
>   #define SHM_NEXT -3
>   #define SHM_INIT -1
>   
> -#define CPSV_CPND_SHM_VERSION1
> +#define CPSV_CPND_SHM_VERSION1
> +#define CPSV_CPND_SHM_VERSION_DEPRECATE  2
> +#define CPSV_CPND_SHM_VERSION_EXTENDED   3
>   
>   typedef struct cpsv_ckpt_hdr {
>   SaCkptCheckpointHandleT ckpt_id;/* Index for identifying the
checkpoint */
> @@ -57,7 +59,7 @@ typedef struct cpsv_sect_hdr {
>   } CPSV_SECT_HDR;
>   
>   typedef struct ckpt_info {
> - char ckpt_name[kOsafMaxDnLength];
> + SaNameT ckpt_name;
>   SaCkptCheckpointHandleT ckpt_id;
>   uint32_t maxSections;
>   SaSizeT maxSecSize;
> @@ -74,23 +76,10 @@ typedef struct ckpt_info {
>   int32_t next;
>   } CKPT_INFO;
>   
> -typedef struct ckpt_info_v0 {
> - SaNameT ckpt_name;
> - SaCkptCheckpointHandleT ckpt_id;
> - uint32_t maxSections;
> - SaSizeT maxSecSize;
> - NODE_ID node_id;
> - int32_t offset;
> - uint32_t client_bitmap;
> - int32_t is_valid;
> - uint32_t bm_offset;
> - bool is_unlink;
> - bool is_close;
> - bool cpnd_rep_create;
> - bool is_first;
> - SaTimeT close_time;
> - int32_t next;
> -} CKPT_INFO_V0;
> +typedef struct ckpt_extend_info {
> + char ckpt_name[kOsafMaxDnLength + 1];
> + uint32_t is_valid;
> +} CKPT_EXTENDED_INFO;
>   
>   typedef struct client_info {
>   SaCkptHandleT ckpt_app_hdl;
> @@ -109,6 +98,7 @@ typedef struct gbl_shm_ptr {
>   void *base_addr;
>   void *cli_addr;
>   void *ckpt_addr;
> + void *extended_addr;/* Added in CPSV_CPND_SHM_VERSION_EXTENDED
*/
>   int32_t n_clients;
>   int32_t n_ckpts;
>   } GBL_SHM_PTR;
> diff --git a/osaf/services/saf/cpsv/cpnd/cpnd_res.c 
> b/osaf/services/saf/cpsv/cpnd/cpnd_res.c
> --- a/osaf/services/saf/cpsv/cpnd/cpnd_res.c
> +++ b/osaf/services/saf/cpsv/cpnd/cpnd_res.c
> @@ -40,8 +40,6 @@
>   
>   #define m_CPND_CKPTINFO_READ(ckpt_info,addr,offset) 
> memcpy(_info,addr+offset,sizeof(CKPT_INFO))
>   
> -#define m_CPND_CKPTINFO_V0_READ(ckpt_info,addr,offset) 
> memcpy(_info,addr+offset,sizeof(CKPT_INFO_V0))
> -
>   #define m_CPND_CKPTINFO_UPDATE(addr,ckpt_info,offset) 
> memcpy(addr+offset,_info,sizeof(CKPT_INFO))
>   
>   #define m_CPND_CKPTHDR_UPDATE(ckpt_hdr,offset)  
> memcpy(offset,_hdr,sizeof(CKPT_HDR))
> @@ -50,13 +48,11 @@ static uint32_t cpnd_res_ckpt_sec_add(CP
>   static bool cpnd_find_exact_ckptinfo(CPND_CB *cb, CKPT_INFO *ckpt_info,
uint32_t bitmap_offset,
>uint32_t *offset, uint32_t
*prev_offset);
>   static void cpnd_clear_ckpt_info(CPND_CB *cb, CPND_CKPT_NODE 
> *cp_node, uint32_t curr_offset, uint32_t prev_offset); -static 
> uint32_t cpnd_restore_client_info(CPND_CB *cb, uint8_t *cli_addr); 
> -static uint32_t cpnd_restore_ckpt_info_v1(CPND_CB *cb, uint8_t 
> *ckpt_addr, SaClmNodeIdT nodeid); -static uint32_t 
> cpnd_restore_ckpt_info_v0(CPND_CB *cb, uint8_t *ckpt_addr, 
> SaClmNodeIdT nodeid); -static void 
> cpnd_destroy_shm_cpnd_cp_info(NCS_OS_POSIX_SHM_REQ_OPEN_INFO 
> *open_req); -static void 
> *cpnd_create_shm_cpnd_cp_info(NCS_OS_POSIX_SHM_REQ_INFO *req_info); 
> -static void cpnd_update_shm_cpnd_cp_info(CPND_CB *cb); -static void 
> cpnd_convert_cp_info_v0(CKPT_INFO_V0 *cp_info_v0, CKPT_INFO *cp_info);
> +static void cpnd_destroy_shm(NCS_OS_POSIX_SHM_REQ_OPEN_INFO 
> +*open_req); static uint32_t cpnd_shm_extended_open(CPND_CB *cb, 
> +uint32_t flag); static uint32_t 
> +cpnd_extended_name_lend(SaConstStringT value, SaNameT* name); static 
> 
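
A rough sketch of the idea in the solution notes above: stay on the small
format until the first long name appears, then create the extended area.
This is not the patch's code. `shm_model`, `store_name`, the size constants,
and the use of calloc in place of a real shm segment are all assumptions made
for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sizes; the real limits come from the OpenSAF headers. */
#define SHORT_NAME_MAX 255  /* longest name that fits the old format */
#define LONG_NAME_MAX 2048  /* stand-in for kOsafMaxDnLength */
#define MAX_CKPTS 8

/* Old-format entry: small, fixed-size name. */
struct short_entry {
	char name[SHORT_NAME_MAX + 1];
	bool is_valid;
};

/* Extended entry: only needed for long names. */
struct ext_entry {
	char name[LONG_NAME_MAX + 1];
	bool is_valid;
};

struct shm_model {
	struct short_entry ckpts[MAX_CKPTS];
	struct ext_entry *ext; /* NULL until the first long name arrives */
};

/* Store a checkpoint name in slot idx. Returns 0 on success, -1 on error. */
static int store_name(struct shm_model *shm, size_t idx, const char *name)
{
	size_t len;

	assert(shm != NULL && name != NULL);
	len = strlen(name);
	if (idx >= MAX_CKPTS || len > LONG_NAME_MAX)
		return -1;

	if (len <= SHORT_NAME_MAX) {
		/* Short names never touch the extended area. */
		memcpy(shm->ckpts[idx].name, name, len + 1);
		shm->ckpts[idx].is_valid = true;
		return 0;
	}

	/* First long name: create the extended area lazily. */
	if (shm->ext == NULL) {
		shm->ext = calloc(MAX_CKPTS, sizeof(*shm->ext));
		if (shm->ext == NULL)
			return -1;
	}
	memcpy(shm->ext[idx].name, name, len + 1);
	shm->ext[idx].is_valid = true;
	shm->ckpts[idx].name[0] = '\0'; /* the name lives in the extended slot */
	shm->ckpts[idx].is_valid = true;
	return 0;
}
```

With this layout, nodes that never see a long name pay only for the small
entries, which is the memory saving the patch description is after.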

Re: [devel] [PATCH 1 of 1] cpnd: use shared memory based on ckpt name length [#2108] V2

2016-10-25 Thread Vo Minh Hoang
Dear Mahesh,

Thank you very much for your help.
Comparing with your test app, I found that my test stopped too soon.
After the reboot I only checked that the shm existed; I did not check that
it could be opened again.

I will send a fix patch soon, after testing it carefully again.

Thank you and best regards,
Hoang

-Original Message-
From: A V Mahesh [mailto:mahesh.va...@oracle.com] 
Sent: Wednesday, October 26, 2016 11:02 AM
To: Vo Minh Hoang <hoang.m...@dektech.com.au>; anders.wid...@ericsson.com
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1 of 1] cpnd: use shared memory based on ckpt name
length [#2108] V2

Hi Hoang,

The attached `test_#2108_app.c` application reproduces the case where the
cpnd shm open request fails.

#gcc test_#2108_app.c -o checkpoint -lSaCkpt

-AVM

On 10/25/2016 12:23 PM, A V Mahesh wrote:
> Hi Hoang,
>
> On 10/25/2016 12:10 PM, Vo Minh Hoang wrote:
>> Would you please tell me the process to reproduce this error?
> I will write standalone application and will share with you .
>
> -AVM
>
> On 10/25/2016 12:10 PM, Vo Minh Hoang wrote:
>> Dear Mahesh,
>>
>> Thank you very much for your checking.
>> It is very strangle that I tested with 2 following case:
>> - restart nd by kill -9 
>> - restart node by kill -9 
>> Both cases executed well in my local machine.
>>
>> Would you please tell me the process to reproduce this error?
>> It is very strangle that ER is cannot open replica's shm that is not in
>> touch of this patch.
>>
>> Thank you and best regards,
>> Hoang
>>
>> -Original Message-
>> From: A V Mahesh [mailto:mahesh.va...@oracle.com]
>> Sent: Tuesday, October 25, 2016 12:53 PM
>> To: Hoang Vo <hoang.m...@dektech.com.au>; anders.wid...@ericsson.com
>> Cc: opensaf-devel@lists.sourceforge.net
>> Subject: Re: [PATCH 1 of 1] cpnd: use shared memory based on ckpt name
>> length [#2108] V2
>>
>> Hi Hoang,
>>
>> With the patch after CPND restart cpnd shm open request is getting 
>> failed
>>
>> please test CPND restart cases.
>>
>>


>>
>> 
>>
>>  saCkptCheckpointOpen  returned checkpointHandle 626040
>> 222 saCkptCheckpointOpen  returned checkpointHandle 6261f0
>>Before pkill osafckptnd  saCkptCheckpointOpen
>> root 23946 1  0 11:14 ?00:00:00
>> /usr/lib64/opensaf/osafckptnd
>> root 24041 24038  0 11:15 pts/000:00:00 sh -c ps -ef | grep
>> osafckptnd
>> root 24043 24041  0 11:15 pts/000:00:00 grep osafckptnd
>> Oct 25 11:15:07 SC-1 osafckptnd[23946]: exiting for shutdown
>> Oct 25 11:15:07 SC-1 osafamfnd[23844]: NO
>> 'safSu=SC-1,safSg=NoRed,safApp=OpenSAF' component restart probation
>> timer started (timeout: 600 ns)
>> Oct 25 11:15:07 SC-1 osafamfnd[23844]: NO Restarting a component of
>> 'safSu=SC-1,safSg=NoRed,safApp=OpenSAF' (comp restart count: 1)
>> Oct 25 11:15:07 SC-1 osafamfnd[23844]: NO
>> 'safComp=CPND,safSu=SC-1,safSg=NoRed,safApp=OpenSAF' faulted due to
>> 'avaDown' : Recovery is 'componentRestart'
>> Oct 25 11:15:07 SC-1 osafckptd[23989]: NO cpnd_down_process:: Start
>> CPND_RETENTION timer id = 0x663f10, arg=0x664020
>> Oct 25 11:15:07 SC-1 osafckptnd[24058]: Started
>>VV saCkptCheckpointOpen 3rd may hit try again returned 18.
>> 333 saCkptCheckpointOpen  returned checkpointHandle 7f29fbdc7588
>>VV saCkptCheckpointOpen 4th returned may hit try again
>> returned 12.
>> 444 saCkptCheckpointOpen  returned checkpointHandle 7fffb4a097d8
>>saCkptCheckpointOpen 5th returned 12.
>>  saCkptCheckpointOpen  returned checkpointHandle 7f29fbdf61a8
>>Before pkill osafckptnd & saCkptCheckpointClose
>> root 24058 1  0 11:15 ?00:00:00
>> /usr/lib64/opensaf/osafckptnd
>> root 24063 24038  0 11:15 pts/000:00:00 sh -c ps -ef | grep
>> osafckptnd
>> root 24065 24063  0 11:15 pts/000:00:00 grep osafckptnd
>> Oct 25 11:15:19 SC-1 osafckptnd[24058]: exiting for shutdown
>> Oct 25 11:15:19 SC-1 osafamfnd[23844]: NO Restarting a component of
>> 'safSu=SC-1,safSg=NoRed,safApp=OpenSAF' (comp restart count: 2)
>> Oct 25 11:15:19 SC-1 osafamfnd[23844]: NO
>> 'safComp=CPND,safSu=SC-1,safSg=NoRed,safApp=OpenSAF' faulted due to
>> 'avaDown' : Recovery is 'componentRestart'
>> Oct 25 11:15:19 SC-1 osafckptnd[24080]: Started
>> Oct 25 11:15:19 SC-1 osafckptnd[24080]: ER cpnd shm open request failed
>> safCkpt=checkpoint_tes_131343_1
>> Oct 25 11:15:19 SC-1 osafckptnd[24080]: E

Re: [devel] [PATCH 1 of 1] cpnd: use shared memory based on ckpt name length [#2108] V2

2016-10-25 Thread Vo Minh Hoang
Dear Mahesh,

Thank you very much for checking.
It is very strange, because I tested the following two cases:
- restart nd by kill -9 
- restart node by kill -9 
Both cases worked well on my local machine.

Would you please tell me the steps to reproduce this error?
It is also strange that the ER is about failing to open the replica's shm,
which this patch does not touch.

Thank you and best regards,
Hoang

-Original Message-
From: A V Mahesh [mailto:mahesh.va...@oracle.com] 
Sent: Tuesday, October 25, 2016 12:53 PM
To: Hoang Vo ; anders.wid...@ericsson.com
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1 of 1] cpnd: use shared memory based on ckpt name
length [#2108] V2

Hi Hoang,

With the patch, the cpnd shm open request fails after a CPND restart.

Please test the CPND restart cases.




 saCkptCheckpointOpen  returned checkpointHandle 626040
222 saCkptCheckpointOpen  returned checkpointHandle 6261f0
  Before pkill osafckptnd  saCkptCheckpointOpen
root 23946 1  0 11:14 ?00:00:00 
/usr/lib64/opensaf/osafckptnd
root 24041 24038  0 11:15 pts/000:00:00 sh -c ps -ef | grep 
osafckptnd
root 24043 24041  0 11:15 pts/000:00:00 grep osafckptnd
Oct 25 11:15:07 SC-1 osafckptnd[23946]: exiting for shutdown
Oct 25 11:15:07 SC-1 osafamfnd[23844]: NO 
'safSu=SC-1,safSg=NoRed,safApp=OpenSAF' component restart probation 
timer started (timeout: 600 ns)
Oct 25 11:15:07 SC-1 osafamfnd[23844]: NO Restarting a component of 
'safSu=SC-1,safSg=NoRed,safApp=OpenSAF' (comp restart count: 1)
Oct 25 11:15:07 SC-1 osafamfnd[23844]: NO 
'safComp=CPND,safSu=SC-1,safSg=NoRed,safApp=OpenSAF' faulted due to 
'avaDown' : Recovery is 'componentRestart'
Oct 25 11:15:07 SC-1 osafckptd[23989]: NO cpnd_down_process:: Start 
CPND_RETENTION timer id = 0x663f10, arg=0x664020
Oct 25 11:15:07 SC-1 osafckptnd[24058]: Started
  VV saCkptCheckpointOpen 3rd may hit try again returned 18.
333 saCkptCheckpointOpen  returned checkpointHandle 7f29fbdc7588
  VV saCkptCheckpointOpen 4th returned may hit try again 
returned 12.
444 saCkptCheckpointOpen  returned checkpointHandle 7fffb4a097d8
  saCkptCheckpointOpen 5th returned 12.
 saCkptCheckpointOpen  returned checkpointHandle 7f29fbdf61a8
  Before pkill osafckptnd & saCkptCheckpointClose
root 24058 1  0 11:15 ?00:00:00 
/usr/lib64/opensaf/osafckptnd
root 24063 24038  0 11:15 pts/000:00:00 sh -c ps -ef | grep 
osafckptnd
root 24065 24063  0 11:15 pts/000:00:00 grep osafckptnd
Oct 25 11:15:19 SC-1 osafckptnd[24058]: exiting for shutdown
Oct 25 11:15:19 SC-1 osafamfnd[23844]: NO Restarting a component of 
'safSu=SC-1,safSg=NoRed,safApp=OpenSAF' (comp restart count: 2)
Oct 25 11:15:19 SC-1 osafamfnd[23844]: NO 
'safComp=CPND,safSu=SC-1,safSg=NoRed,safApp=OpenSAF' faulted due to 
'avaDown' : Recovery is 'componentRestart'
Oct 25 11:15:19 SC-1 osafckptnd[24080]: Started
Oct 25 11:15:19 SC-1 osafckptnd[24080]: ER cpnd shm open request failed 
safCkpt=checkpoint_tes_131343_1
Oct 25 11:15:19 SC-1 osafckptnd[24080]: ER cpnd shm open request failed 
safCkpt=checkpoint_tes_131343_1
 saCkptCheckpointClose  checkpointHandle 626040
Attempt 0-0:  saCkptCheckpointClose returned 12.
222 saCkptCheckpointClose  checkpointHandle 6261f0
Attempt 0-0:  saCkptCheckpointClose returned 12.
333 saCkptCheckpointClose  checkpointHandle 7f29fbdc7588
Attempt 0-0:  saCkptCheckpointClose returned 9.
 saCkptCheckpointClose  checkpointHandle 7fffb4a097d8
Attempt 0-0:  saCkptCheckpointClose returned 9.
555 saCkptCheckpointClose  checkpointHandle 7f29fbdf61a8
Attempt 0-0:  saCkptCheckpointClose returned 9.
 saCkptCheckpointOpen  returned checkpointHandle 626040
222 saCkptCheckpointOpen  returned checkpointHandle 628b40
  Before pkill osafckptnd  saCkptCheckpointOpen
root 24080 1  0 11:15 ?00:00:00 
/usr/lib64/opensaf/osafckptnd
root 24085 24038  0 11:15 pts/000:00:00 sh -c ps -ef | grep 
osafckptnd
root 24087 24085  0 11:15 pts/000:00:00 grep osafckptnd
Oct 25 11:15:26 SC-1 osafckptnd[24080]: exiting for shutdown
Oct 25 11:15:26 SC-1 osafamfnd[23844]: NO Restarting a component of 
'safSu=SC-1,safSg=NoRed,safApp=OpenSAF' (comp restart count: 3)
Oct 25 11:15:26 SC-1 osafamfnd[23844]: NO 
'safComp=CPND,safSu=SC-1,safSg=NoRed,safApp=OpenSAF' faulted due to 
'avaDown' : Recovery is 'componentRestart'
Oct 25 11:15:26 SC-1 osafckptd[23989]: NO cpnd_down_process:: Start 
CPND_RETENTION timer id = 0x663f10, arg=0x664020
Oct 25 11:15:26 SC-1 osafckptnd[24102]: Started
Oct 25 11:15:26 SC-1 osafckptnd[24102]: ER cpnd shm open request failed 
safCkpt=checkpoint_tes_131343_1
Oct 25 11:15:26 SC-1 osafckptnd[24102]: ER cpnd shm open request failed 
safCkpt=checkpoint_tes_131343_1
  VV saCkptCheckpointOpen 3rd may hit try again returned 18.
333 

Re: [devel] [PATCH 1 of 1] cpnd: use shared memory based on ckpt name length [#2108]

2016-10-18 Thread Vo Minh Hoang
Dear Mahesh,

Sorry, I sent an incomplete email by mistake.
This is the full version.
--
I would like to answer your two points of concern together.

As I understand it, a client command affects shared memory through the
following flow:

Client --> CPA ==> CPND (1) ==> CPD (active) ==> CPND (has replica) (2)
> update shm (3)

Where:
--> Synchronous
==> Asynchronous
(1) and (2) behave the same way when updating the shm and storing the
pointer to the shm.
(3) The modification only takes place here, including swapping the shm and
updating the pointers.

So even if multiple clients make calls at the same time, CPND updates the
shm in sequence. The second request can access the shm only after the first
request has swapped it; two requests never access the shm at the same time.
If the shm already stores data when the swap happens, CPND updates the
pointer, so the next request that accesses the old data still works through
the updated pointer, with the same behavior.
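
A minimal sketch of the argument above, assuming requests are handled one at
a time and always reach the data through a shared handle rather than a cached
pointer. `shm_handle`, `shm_swap`, `shm_write`, and `shm_read` are
hypothetical names, and heap memory stands in for the real shm segment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the node's shared-memory bookkeeping. */
struct shm_handle {
	unsigned char *base; /* current segment; replaced on swap */
	size_t size;
};

/* Swap to a larger segment, preserving the existing contents.
 * Returns 0 on success, -1 if allocation fails (old segment is kept). */
static int shm_swap(struct shm_handle *h, size_t new_size)
{
	unsigned char *bigger;

	assert(h != NULL && new_size > 0 && new_size >= h->size);
	bigger = calloc(1, new_size);
	if (bigger == NULL)
		return -1;
	if (h->base != NULL)
		memcpy(bigger, h->base, h->size);
	free(h->base);
	h->base = bigger; /* later requests see the new segment */
	h->size = new_size;
	return 0;
}

/* Requests always go through the handle, never a cached base pointer. */
static int shm_write(struct shm_handle *h, size_t off, unsigned char v)
{
	if (off >= h->size)
		return -1;
	h->base[off] = v;
	return 0;
}

static int shm_read(const struct shm_handle *h, size_t off)
{
	return off < h->size ? h->base[off] : -1;
}
```

The sketch only holds under the single-threaded, in-sequence assumption
stated in the message; it says nothing about readers that hold a raw pointer
across the swap.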

Thank you and best regards,
Hoang

-Original Message-
From: Vo Minh Hoang [mailto:hoang.m...@dektech.com.au] 
Sent: Tuesday, October 18, 2016 2:15 PM
To: 'A V Mahesh' <mahesh.va...@oracle.com>
Cc: 'anders.wid...@ericsson.com' <anders.wid...@ericsson.com>;
'opensaf-devel@lists.sourceforge.net' <opensaf-devel@lists.sourceforge.net>
Subject: RE: [PATCH 1 of 1] cpnd: use shared memory based on ckpt name
length [#2108]

Dear Mahesh,

I would like to answer your two points of concern together.

As I understand it, a client command affects shared memory through the
following flow:

Client --> CPA ==> CPND (1) ==> CPD (active) ==> CPND (has replica) (2)
> update shm (3)

When:
--> 



-Original Message-
From: A V Mahesh [mailto:mahesh.va...@oracle.com] 
Sent: Tuesday, October 18, 2016 1:10 PM
To: Vo Minh Hoang <hoang.m...@dektech.com.au>
Cc: anders.wid...@ericsson.com; opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1 of 1] cpnd: use shared memory based on ckpt name
length [#2108]

Hi Hoang,


On 10/18/2016 11:24 AM, Vo Minh Hoang wrote:
> Dear Mahesh,
>
>>> [AVM] A non-collated Ckpt will have two replicas on both Active and
> standby.
> Each node will receive one CPND_EVT_D2ND_CKPT_CREATE message so it handles
> swapping itself and does not affect each other nor another.
[AVM] I was talking about swapping an existing `small format shm`, not a new
create request, where the ckpt is already opened on multiple nodes with the
ALL option.
>
>>> [AVM] piratically  we can have large size data & transit time, if ckt
pat
> has  large data  sham is file I/O operation
>>>  not middle-ware controlled activity , swap time will vary
> depending on system.
> I am agree that this modification affects performance of create/open
> function so it need performance acceptance verification.
> Fortunately, shared mem is on memory so it is not heavily depend on OS or
> file system (unless on swap memory area).
> Maybe I am not understand your ideal here but I have not found a clear
> reason of handling try-again.
[AVM] Say, for example, an application is writing in a loop to the old
`small format shm`, and at that moment you start the conversion of the old
`small format shm` to the new `big format`.

-AVM

>
> Thank you and best regards,
> Hoang
>
> -Original Message-
> From: A V Mahesh [mailto:mahesh.va...@oracle.com]
> Sent: Tuesday, October 18, 2016 12:14 PM
> To: Vo Minh Hoang <hoang.m...@dektech.com.au>
> Cc: anders.wid...@ericsson.com; opensaf-devel@lists.sourceforge.net
> Subject: Re: [PATCH 1 of 1] cpnd: use shared memory based on ckpt name
> length [#2108]
>
> Hi Hoan,
>
>
> On 10/18/2016 9:59 AM, Vo Minh Hoang wrote:
>> Dear Mahesh,
>>
>> Thank you very much for your comments.
>>
>> I would like to explain my understanding and reason for this solution.
>> Please correct me if I am wrong.
>>
>> - This memory swapping works on single node alone, it will occur
>> maximum once per node in open/create checkpoint process.
>> - This swapping action just takes place in nodes that meet condition
>> and does not affect other node.
> [AVM] A non-collated Ckpt will have two replicas on both Active and
standby
> .
>> - CPND handles open/create processes atomically in sequence in one
>> thread only.
>>
>> Because of that I think it is unnecessary to implement thread
>> synchronizing or `try-again` handling.
> [AVM] piratically  we can have large size data & transit time, if ckt pat
> has  large data  sham is file I/O operation
>   not middle-ware controlled activity , swap time will vary
> depending on system.
>> Sincerely,
>> Hoang
>>
>> -Original Message-
>> From: A V Mahesh [mailto:mahesh.va...@oracle.com]
>> Sen

Re: [devel] [PATCH 1 of 1] cpnd: use shared memory based on ckpt name length [#2108]

2016-10-18 Thread Vo Minh Hoang
Dear Mahesh,

I would like to answer your two points of concern together.

As I understand it, a client command affects shared memory through the
following flow:

Client --> CPA ==> CPND (1) ==> CPD (active) ==> CPND (has replica) (2)
> update shm (3)

When:
--> 



-Original Message-
From: A V Mahesh [mailto:mahesh.va...@oracle.com] 
Sent: Tuesday, October 18, 2016 1:10 PM
To: Vo Minh Hoang <hoang.m...@dektech.com.au>
Cc: anders.wid...@ericsson.com; opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1 of 1] cpnd: use shared memory based on ckpt name
length [#2108]

Hi Hoang,


On 10/18/2016 11:24 AM, Vo Minh Hoang wrote:
> Dear Mahesh,
>
>>> [AVM] A non-collated Ckpt will have two replicas on both Active and
> standby.
> Each node will receive one CPND_EVT_D2ND_CKPT_CREATE message so it handles
> swapping itself and does not affect each other nor another.
[AVM] I was talking about swapping an existing `small format shm`, not a new
create request, where the ckpt is already opened on multiple nodes with the
ALL option.
>
>>> [AVM] piratically  we can have large size data & transit time, if ckt
pat
> has  large data  sham is file I/O operation
>>>  not middle-ware controlled activity , swap time will vary
> depending on system.
> I am agree that this modification affects performance of create/open
> function so it need performance acceptance verification.
> Fortunately, shared mem is on memory so it is not heavily depend on OS or
> file system (unless on swap memory area).
> Maybe I am not understand your ideal here but I have not found a clear
> reason of handling try-again.
[AVM] Say, for example, an application is writing in a loop to the old
`small format shm`, and at that moment you start the conversion of the old
`small format shm` to the new `big format`.

-AVM

>
> Thank you and best regards,
> Hoang
>
> -Original Message-----
> From: A V Mahesh [mailto:mahesh.va...@oracle.com]
> Sent: Tuesday, October 18, 2016 12:14 PM
> To: Vo Minh Hoang <hoang.m...@dektech.com.au>
> Cc: anders.wid...@ericsson.com; opensaf-devel@lists.sourceforge.net
> Subject: Re: [PATCH 1 of 1] cpnd: use shared memory based on ckpt name
> length [#2108]
>
> Hi Hoan,
>
>
> On 10/18/2016 9:59 AM, Vo Minh Hoang wrote:
>> Dear Mahesh,
>>
>> Thank you very much for your comments.
>>
>> I would like to explain my understanding and reason for this solution.
>> Please correct me if I am wrong.
>>
>> - This memory swapping works on single node alone, it will occur
>> maximum once per node in open/create checkpoint process.
>> - This swapping action just takes place in nodes that meet condition
>> and does not affect other node.
> [AVM] A non-collated Ckpt will have two replicas on both Active and
standby
> .
>> - CPND handles open/create processes atomically in sequence in one
>> thread only.
>>
>> Because of that I think it is unnecessary to implement thread
>> synchronizing or `try-again` handling.
> [AVM] piratically  we can have large size data & transit time, if ckt pat
> has  large data  sham is file I/O operation
>   not middle-ware controlled activity , swap time will vary
> depending on system.
>> Sincerely,
>> Hoang
>>
>> -Original Message-
>> From: A V Mahesh [mailto:mahesh.va...@oracle.com]
>> Sent: Tuesday, October 18, 2016 10:48 AM
>> To: Vo Minh Hoang <hoang.m...@dektech.com.au>
>> Cc: anders.wid...@ericsson.com; opensaf-devel@lists.sourceforge.net
>> Subject: Re: [PATCH 1 of 1] cpnd: use shared memory based on ckpt name
>> length [#2108]
>>
>> Hi Hoang,
>>
>> On 10/13/2016 12:44 PM, Vo Minh Hoang wrote:
>>> No, old checkpoint data is converted to `big format`.
>>> So all of them will be stored in `big format`.
>> [AVM] This approach is introducing NEW transit ,  so far application
>> are aware of  switch-over & fail-over transit and TRY-AGAIN is
>> expected only in those case , now this solution is introducing  a new
>> transit  for the application which are accessioning the old  (by the
>> way this patch didn't implemented TRY-AGAIN when shared memory
>> swapping action occurring)
>>
>> `small format shm`, up on some application creating  `big format` (
>> application impacting the HA behavior )
>> not sure about the solution approach need to discussed !
>>
>> -AVM
>>
>> On 10/13/2016 12:44 PM, Vo Minh Hoang wrote:
>>> Dear Mahesh,
>>>
>>> Because of keeping the consistent working behavior of existing
>>> function, only 1 shared memory at a time. If shared me

Re: [devel] [PATCH 1 of 1] cpnd: use shared memory based on ckpt name length [#2108]

2016-10-17 Thread Vo Minh Hoang
Dear Mahesh,

>> [AVM] A non-collocated ckpt will have two replicas, on both Active and
>> Standby.
Each node will receive one CPND_EVT_D2ND_CKPT_CREATE message, so it handles
the swapping itself and does not affect any other node.

>> [AVM] Practically, we can have large data and a long transit time; if the
>> ckpt has large data, shm is a file I/O operation, not a
>> middleware-controlled activity, so the swap time will vary depending on
>> the system.
I agree that this modification affects the performance of the create/open
function, so it needs performance acceptance verification.
Fortunately, shared memory is in memory, so it does not depend heavily on the
OS or file system (unless it is in the swap area).
Maybe I do not understand your idea here, but I have not found a clear
reason for handling try-again.
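
For readers unfamiliar with the term, try-again handling usually means a
bounded retry loop on the caller's side. The sketch below is generic and
hypothetical: the `status` codes, `call_with_retry`, and `busy_then_ok` are
illustrative names, not SA Forum or OpenSAF APIs, and a real client would
normally sleep between attempts.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes standing in for the SA_AIS_* values. */
enum status { ST_OK = 0, ST_TRY_AGAIN = 1, ST_FAILED = 2 };

typedef enum status (*op_fn)(void *ctx);

/* Call op until it stops returning ST_TRY_AGAIN or max_attempts is used up.
 * Returns the last status seen. */
static enum status call_with_retry(op_fn op, void *ctx, int max_attempts)
{
	enum status rc = ST_TRY_AGAIN;

	assert(op != NULL && max_attempts > 0);
	for (int i = 0; i < max_attempts && rc == ST_TRY_AGAIN; i++)
		rc = op(ctx); /* a real client would back off here */
	return rc;
}

/* Example operation: busy for the first *remaining calls, then succeeds. */
static enum status busy_then_ok(void *ctx)
{
	int *remaining = ctx;

	if (*remaining > 0) {
		(*remaining)--;
		return ST_TRY_AGAIN;
	}
	return ST_OK;
}
```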

Thank you and best regards,
Hoang

-Original Message-
From: A V Mahesh [mailto:mahesh.va...@oracle.com] 
Sent: Tuesday, October 18, 2016 12:14 PM
To: Vo Minh Hoang <hoang.m...@dektech.com.au>
Cc: anders.wid...@ericsson.com; opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1 of 1] cpnd: use shared memory based on ckpt name
length [#2108]

Hi Hoan,


On 10/18/2016 9:59 AM, Vo Minh Hoang wrote:
> Dear Mahesh,
>
> Thank you very much for your comments.
>
> I would like to explain my understanding and reason for this solution.
> Please correct me if I am wrong.
>
> - This memory swapping works on single node alone, it will occur 
> maximum once per node in open/create checkpoint process.
> - This swapping action just takes place in nodes that meet condition 
> and does not affect other node.
[AVM] A non-collocated ckpt will have two replicas, on both Active and
Standby.
> - CPND handles open/create processes atomically in sequence in one 
> thread only.
>
> Because of that I think it is unnecessary to implement thread 
> synchronizing or `try-again` handling.
[AVM] Practically, we can have large data and a long transit time; if the
ckpt has large data, shm is a file I/O operation, not a middleware-controlled
activity, so the swap time will vary depending on the system.
>
> Sincerely,
> Hoang
>
> -Original Message-
> From: A V Mahesh [mailto:mahesh.va...@oracle.com]
> Sent: Tuesday, October 18, 2016 10:48 AM
> To: Vo Minh Hoang <hoang.m...@dektech.com.au>
> Cc: anders.wid...@ericsson.com; opensaf-devel@lists.sourceforge.net
> Subject: Re: [PATCH 1 of 1] cpnd: use shared memory based on ckpt name 
> length [#2108]
>
> Hi Hoang,
>
> On 10/13/2016 12:44 PM, Vo Minh Hoang wrote:
>> No, old checkpoint data is converted to `big format`.
>> So all of them will be stored in `big format`.
> [AVM] This approach introduces a NEW transient state. So far, 
> applications are aware of the switch-over and fail-over transients, and 
> TRY-AGAIN is expected only in those cases. Now this solution introduces 
> a new transient for applications that are accessing the old 
> `small format shm` when some other application creates a `big format` 
> checkpoint (one application impacting the HA behavior of others). By 
> the way, this patch does not implement TRY-AGAIN while the shared 
> memory swap is in progress.
>
> I am not sure about this approach; it needs to be discussed!
>
> -AVM
>
> On 10/13/2016 12:44 PM, Vo Minh Hoang wrote:
>> Dear Mahesh,
>>
>> Because of keeping the consistent working behavior of existing 
>> function, only 1 shared memory at a time. If shared memory swapping 
>> action occurs, a new shared memory will replace old one.
>>
>> Here is the detailed answers to your questions:
>>>> -The  existing  `small format shm`  will continue to be small , is 
>>>> that
>> right ?
>>>> -Only newly created longDN checkpoint will be in `big format shm`, 
>>>> is
>> that right ?
>> No, old checkpoint data is converted to `big format`.
>> So all of them will be stored in `big format`.
>>
>>>> - what will be the format of newly joined the PL-5 opens  an 
>>>> existing
>> `small format shm`
>> PL-5 still use `small format`.
>> Only when a long DN replica is added in this node, the shared memory 
>> is converted to `big format`.
>>>>the what will be the new replica  on new node `small format shm` 
>>>> or `big
>> format shm` ?
>> This implementation only affect the `header` shared memory 
>> (opensaf_CPND_CHECKPOINT_INFO_nodeid). It do not change replica 
>> shared memory (opensaf_ckptname_nodeid_n).
>>
>> About testing, because of above specification, I tested:
>> - start new node
>> - restart ckptnd with existing small shm
>> - restart ckptnd with existing big shm
>> - create first long dn (check all node)
>>
>> Thank

Re: [devel] [PATCH 1 of 1] cpnd: use shared memory based on ckpt name length [#2108]

2016-10-17 Thread Vo Minh Hoang
Dear Mahesh,

Thank you very much for your comments.

I would like to explain my understanding and reason for this solution.
Please correct me if I am wrong.

- This memory swap works on a single node alone; it occurs at most once per
node, during the open/create checkpoint process.
- The swap takes place only on nodes that meet the condition and does not
affect other nodes.
- CPND handles open/create requests atomically, in sequence, in a single
thread.

Because of that, I think it is unnecessary to implement thread
synchronization or `try-again` handling.
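
For illustration only, here is a minimal sketch of the kind of one-time
small-to-big conversion described above. The record layouts, the name limits
and the `conv_` names are hypothetical simplifications, not the actual
CKPT_INFO_V0 / CKPT_INFO definitions or the code in the patch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified record layouts; the real structures in
 * cpsv_shm.h carry more fields. */
#define CONV_SHORT_NAME_MAX 256
#define CONV_LONG_NAME_MAX 2048

typedef struct {
	unsigned long ckpt_id;
	int is_valid;
	char name[CONV_SHORT_NAME_MAX];
} conv_info_v0;

typedef struct {
	unsigned long ckpt_id;
	int is_valid;
	char name[CONV_LONG_NAME_MAX];
} conv_info_v1;

/* Build a big-format table from a small-format one. The caller replaces the
 * old table only after this returns non-NULL, so a failed conversion leaves
 * the old table untouched. */
static conv_info_v1 *conv_to_big(const conv_info_v0 *old_tab, size_t n)
{
	assert(old_tab != NULL);
	conv_info_v1 *new_tab = calloc(n, sizeof(*new_tab));
	if (new_tab == NULL)
		return NULL;
	for (size_t i = 0; i < n; i++) {
		new_tab[i].ckpt_id = old_tab[i].ckpt_id;
		new_tab[i].is_valid = old_tab[i].is_valid;
		/* A short name always fits; calloc() zeroed the remainder. */
		memcpy(new_tab[i].name, old_tab[i].name, CONV_SHORT_NAME_MAX);
	}
	return new_tab;
}
```

Because the whole conversion runs inside one request handler in this sketch,
no other request can observe a half-converted table, which is the property
the single-thread argument above relies on.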

Sincerely,
Hoang

-Original Message-
From: A V Mahesh [mailto:mahesh.va...@oracle.com] 
Sent: Tuesday, October 18, 2016 10:48 AM
To: Vo Minh Hoang <hoang.m...@dektech.com.au>
Cc: anders.wid...@ericsson.com; opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1 of 1] cpnd: use shared memory based on ckpt name
length [#2108]

Hi Hoang,

On 10/13/2016 12:44 PM, Vo Minh Hoang wrote:
> No, old checkpoint data is converted to `big format`.
> So all of them will be stored in `big format`.
[AVM] This approach introduces a NEW transient state. So far, applications
are aware of the switch-over and fail-over transients, and TRY-AGAIN is
expected only in those cases. Now this solution introduces a new transient
for applications that are accessing the old `small format shm` when some
other application creates a `big format` checkpoint (one application
impacting the HA behavior of others). By the way, this patch does not
implement TRY-AGAIN while the shared memory swap is in progress.

I am not sure about this approach; it needs to be discussed!

-AVM

On 10/13/2016 12:44 PM, Vo Minh Hoang wrote:
> Dear Mahesh,
>
> Because of keeping the consistent working behavior of existing 
> function, only 1 shared memory at a time. If shared memory swapping 
> action occurs, a new shared memory will replace old one.
>
> Here is the detailed answers to your questions:
>>> -The  existing  `small format shm`  will continue to be small , is 
>>> that
> right ?
>>> -Only newly created longDN checkpoint will be in `big format shm`, 
>>> is
> that right ?
> No, old checkpoint data is converted to `big format`.
> So all of them will be stored in `big format`.
>
>>> - what will be the format of newly joined the PL-5 opens  an 
>>> existing
> `small format shm`
> PL-5 still use `small format`.
> Only when a long DN replica is added in this node, the shared memory 
> is converted to `big format`.
>>>   the what will be the new replica  on new node `small format shm` 
>>> or `big
> format shm` ?
> This implementation only affect the `header` shared memory 
> (opensaf_CPND_CHECKPOINT_INFO_nodeid). It do not change replica shared 
> memory (opensaf_ckptname_nodeid_n).
>
> About testing, because of above specification, I tested:
> - start new node
> - restart ckptnd with existing small shm
> - restart ckptnd with existing big shm
> - create first long dn (check all node)
>
> Thank you and best regards,
> Hoang
>
> -Original Message-
> From: A V Mahesh [mailto:mahesh.va...@oracle.com]
> Sent: Thursday, October 13, 2016 1:33 PM
> To: Hoang Vo <hoang.m...@dektech.com.au>; anders.wid...@ericsson.com
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: Re: [PATCH 1 of 1] cpnd: use shared memory based on ckpt name 
> length [#2108]
>
> Hi Hoang,
>
>   >> - Run time cpnd keep using small format shm until first longDN 
> checkpoint is created.
>   >> After that cpnd use big format shm.
>
> While reviewing I am assuming following please confirm  :
>
> -The  existing  `small format shm`  will continue to be small , is 
> that right ?
> -Only newly created longDN checkpoint will be in `big format shm`, is 
> that right ?
> - what will be the format of newly joined the PL-5 opens  an existing 
> `small format shm`
> the what will be the new replica  on new node `small format shm` 
> or `big format shm` ?
>
>
> I hope you  tested following :
> ==
> - combination of some `small format shm`  and some  `big format shm`  
> ckpts
> - Joined a New node ( say PL-5)  and then opened the existing `small 
> format shm` ckpt from the new Node
> - Restating controller which has combination of  `small format shm` 
> and `big format shm` and how the restored non-collocated ckpt`s
>
> -AVM
>
> On 10/11/2016 1:15 PM, Hoang Vo wrote:
>>osaf/libs/common/cpsv/include/cpsv_shm.h |9 +-
>>osaf/services/saf/cpsv/cpnd/cpnd_res.c   |  565
> --
>>2 files changed, 536 insertions(+), 38 deletions(-)
>>
>>
>> problem: In the case of CKPT osafckptnd increased 3,5Mb - 240 percent 
>> on all nodes CKPT_INFO size inscre

Re: [devel] [PATCH 1 of 1] cpnd: use shared memory based on ckpt name length [#2108]

2016-10-13 Thread Vo Minh Hoang
Dear Mahesh,

To keep the working behavior of the existing function consistent, only one
shared memory segment exists at a time. If a shared memory swap occurs, the
new shared memory replaces the old one.

Here are the detailed answers to your questions:
>> -The  existing  `small format shm`  will continue to be small , is that
right ?
>> -Only newly created longDN checkpoint will be in `big format shm`, is
that right ? 
No, old checkpoint data is converted to `big format`.
So all of them will be stored in `big format`.

>> - what will be the format of newly joined the PL-5 opens  an existing
`small format shm`
PL-5 still uses `small format`.
Only when a long DN replica is added on this node is the shared memory
converted to `big format`.
>>  the what will be the new replica  on new node `small format shm` or `big
format shm` ?
This implementation only affects the `header` shared memory
(opensaf_CPND_CHECKPOINT_INFO_nodeid). It does not change the replica shared
memory (opensaf_ckptname_nodeid_n).
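
As a rough illustration of why only the header segment changes size between
the two formats, the sketch below computes the segment size per format. All
types, limits and `shmsz_` names are hypothetical stand-ins for the cpsv
definitions; only the shape of the calculation follows the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real limits and record types live in the
 * cpsv headers and are larger. */
enum { SHMSZ_VERSION_SHORT_DN = 0, SHMSZ_VERSION_LONG_DN = 1 };
enum { SHMSZ_MAX_CLIENTS = 100, SHMSZ_MAX_CKPTS = 200 };

typedef struct { int count; } shmsz_client_hdr;
typedef struct { unsigned long hdl; } shmsz_client_info;
typedef struct { int count; } shmsz_ckpt_hdr;
typedef struct { unsigned long id; char name[256]; } shmsz_ckpt_info_v0;
typedef struct { unsigned long id; char name[2048]; } shmsz_ckpt_info_v1;

/* Size of the header segment for the given format version. Only the
 * per-checkpoint record differs between the two formats. */
static size_t shmsz_header_size(int version)
{
	assert(version == SHMSZ_VERSION_SHORT_DN ||
	       version == SHMSZ_VERSION_LONG_DN);
	size_t fixed = sizeof(shmsz_client_hdr) +
		       SHMSZ_MAX_CLIENTS * sizeof(shmsz_client_info) +
		       sizeof(shmsz_ckpt_hdr);
	size_t per_ckpt = (version == SHMSZ_VERSION_LONG_DN)
			      ? sizeof(shmsz_ckpt_info_v1)
			      : sizeof(shmsz_ckpt_info_v0);
	return fixed + SHMSZ_MAX_CKPTS * per_ckpt;
}
```

Writing this as a function (or as a fully parenthesized macro) is a design
choice made here to avoid precedence surprises when the result is used inside
a larger expression.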

Regarding testing, given the behavior described above, I tested:
- start new node
- restart ckptnd with existing small shm
- restart ckptnd with existing big shm
- create first long DN (checked on all nodes)

Thank you and best regards,
Hoang

-Original Message-
From: A V Mahesh [mailto:mahesh.va...@oracle.com] 
Sent: Thursday, October 13, 2016 1:33 PM
To: Hoang Vo ; anders.wid...@ericsson.com
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1 of 1] cpnd: use shared memory based on ckpt name
length [#2108]

Hi Hoang,

 >> - Run time cpnd keep using small format shm until first longDN
checkpoint is created.
 >> After that cpnd use big format shm.

While reviewing, I am assuming the following; please confirm:

- The existing `small format shm` will continue to be small, is that
right?
- Only a newly created longDN checkpoint will be in `big format shm`, is that
right?
- What will the format be when a newly joined node (say PL-5) opens an
existing `small format shm` checkpoint? Will the new replica on the new node
be `small format shm` or `big format shm`?


I hope you tested the following:
==
- A combination of some `small format shm` and some `big format shm` ckpts
- Joining a new node (say PL-5) and then opening the existing `small format
shm` ckpt from the new node
- Restarting a controller that has a combination of `small format shm` and
`big format shm` ckpts, and how the non-collocated ckpts are restored

-AVM

On 10/11/2016 1:15 PM, Hoang Vo wrote:
>   osaf/libs/common/cpsv/include/cpsv_shm.h |9 +-
>   osaf/services/saf/cpsv/cpnd/cpnd_res.c   |  565
--
>   2 files changed, 536 insertions(+), 38 deletions(-)
>
>
> problem: osafckptnd memory usage increased by 3.5 MB (240 percent) on all
> nodes. The CKPT_INFO size increase needed to support longDN leads to the
> total size increase.
>
> solution:
> - At startup, cpnd uses the small format shm.
> - At run time, cpnd keeps using the small format shm until the first longDN
> checkpoint is created.
> After that, cpnd uses the big format shm.
>
> diff --git a/osaf/libs/common/cpsv/include/cpsv_shm.h 
> b/osaf/libs/common/cpsv/include/cpsv_shm.h
> --- a/osaf/libs/common/cpsv/include/cpsv_shm.h
> +++ b/osaf/libs/common/cpsv/include/cpsv_shm.h
> @@ -27,7 +27,8 @@
>   #define SHM_NEXT -3
>   #define SHM_INIT -1
>   
> -#define CPSV_CPND_SHM_VERSION1
> +#define CPSV_CPND_SHM_VERSION_SHORT_DN   0
> +#define CPSV_CPND_SHM_VERSION_LONG_DN1
>   
>   typedef struct cpsv_ckpt_hdr {
>   SaCkptCheckpointHandleT ckpt_id;/* Index for identifying the
checkpoint */
> @@ -134,4 +135,10 @@ typedef enum cpnd_type_info {
>   CPND_CKPT_INFO
>   } CPND_TYPE_INFO;
>   
> +#define cpsv_cpnd_shm_size(x) x == CPSV_CPND_SHM_VERSION_LONG_DN ?   \
> + sizeof(CLIENT_HDR) + (MAX_CLIENTS * sizeof(CLIENT_INFO)) + \
> + sizeof(CKPT_HDR) + (MAX_CKPTS * sizeof(CKPT_INFO)) : \
> + sizeof(CLIENT_HDR) + (MAX_CLIENTS * sizeof(CLIENT_INFO)) + \
> + sizeof(CKPT_HDR) + (MAX_CKPTS * sizeof(CKPT_INFO_V0)) \
> +
>   #endif
> diff --git a/osaf/services/saf/cpsv/cpnd/cpnd_res.c 
> b/osaf/services/saf/cpsv/cpnd/cpnd_res.c
> --- a/osaf/services/saf/cpsv/cpnd/cpnd_res.c
> +++ b/osaf/services/saf/cpsv/cpnd/cpnd_res.c
> @@ -44,20 +44,34 @@
>   
>   #define m_CPND_CKPTINFO_UPDATE(addr,ckpt_info,offset) 
> memcpy(addr+offset,&ckpt_info,sizeof(CKPT_INFO))
>   
> +#define m_CPND_CKPTINFO_V0_UPDATE(addr,ckpt_info,offset) 
> +memcpy(addr+offset,&ckpt_info,sizeof(CKPT_INFO_V0))
> +
>   #define m_CPND_CKPTHDR_UPDATE(ckpt_hdr,offset)  
> memcpy(offset,&ckpt_hdr,sizeof(CKPT_HDR))
>   
> +void *cpnd_restart_shm(NCS_OS_POSIX_SHM_REQ_INFO *cpnd_open_req, CPND_CB *cb, SaClmNodeIdT nodeid);
> +uint32_t cpnd_update_ckpt_with_clienthdl_v1(CPND_CB *cb, CPND_CKPT_NODE *cp_node, SaCkptHandleT client_hdl);
> +uint32_t cpnd_update_ckpt_with_clienthdl_v0(CPND_CB *cb, CPND_CKPT_NODE *cp_node, SaCkptHandleT client_hdl);
> +uint32_t cpnd_write_ckpt_info_v1(CPND_CB *cb, 

Re: [devel] [PATCH 1 of 1] cpsv: remove longDnsAllowed checking each checkpoint creating time [#2068] V2

2016-09-29 Thread Vo Minh Hoang
Dear Mahesh,

Thank you very much for your comment.

The function osaf_is_an_extended_name() just checks whether the input SaNameT
is a longDN or not.
So it is of no use here.
It is mostly used in the encode/decode handling.
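
To illustrate why such a check is purely local, here is a simplified model.
It assumes, as a simplification, that a long name is flagged by a reserved
value in the length field; the `xname_` type, the marker value and the
functions are hypothetical and are not the OpenSAF definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define XNAME_EXTENDED_MARKER 0xffffu /* hypothetical marker value */

typedef struct {
	uint16_t length; /* real length, or the marker for a long name */
	char value[256];
} xname_t;

/* Local test: inspects one field, no configuration lookup involved. */
static bool xname_is_extended(const xname_t *name)
{
	assert(name != NULL);
	return name->length == XNAME_EXTENDED_MARKER;
}

/* Store a short name inline, together with its real length. */
static void xname_set_short(xname_t *name, const char *s)
{
	size_t len = strlen(s);
	assert(len < sizeof(name->value));
	name->length = (uint16_t)len;
	memcpy(name->value, s, len + 1);
}
```

In this model the check tells the caller which representation is in use, but
it says nothing about whether long DNs are permitted by configuration.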

Best regards,
Hoang

-Original Message-
From: A V Mahesh [mailto:mahesh.va...@oracle.com] 
Sent: Thursday, September 29, 2016 1:44 PM
To: Hoang Vo ; anders.wid...@ericsson.com
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [devel] [PATCH 1 of 1] cpsv: remove longDnsAllowed checking
each checkpoint creating time [#2068] V2

Hi Hoang,

It looks like osaf_is_an_extended_name() checks longDnsAllowed; other
services check it that way, so just keep the code as I suggested for the V1
patch.

-AVM

On 9/29/2016 12:08 PM, A V Mahesh wrote:
> ACK not tested
>
> -AVM
>
> On 9/29/2016 11:55 AM, Hoang Vo wrote:
>>osaf/libs/common/cpsv/include/cpnd_evt.h.old |   0
>>osaf/libs/common/cpsv/include/cpnd_init.h|   1 -
>>osaf/services/saf/cpsv/cpd/cpd_db.c  |   3 +++
>>osaf/services/saf/cpsv/cpd/cpd_evt.c |   4 +++-
>>osaf/services/saf/cpsv/cpnd/cpnd_evt.c   |  12 
>>osaf/services/saf/cpsv/cpnd/cpnd_proc.c  |  25
-
>>6 files changed, 6 insertions(+), 39 deletions(-)
>>
>>
>> Problem:
>> Statistically, the checkpoint create time for SC and PL (sync and 
>> async) has degraded by more than 30% after patch 8004 was brought in
>>
>> Solution:
>> Remove the unnecessary, time-consuming check. IMM will take over the 
>> role of checking
>>
>> diff --git a/osaf/libs/common/cpsv/include/cpnd_evt.h.old 
>> b/osaf/libs/common/cpsv/include/cpnd_evt.h.old
>> deleted file mode 100644
>> diff --git a/osaf/libs/common/cpsv/include/cpnd_init.h 
>> b/osaf/libs/common/cpsv/include/cpnd_init.h
>> --- a/osaf/libs/common/cpsv/include/cpnd_init.h
>> +++ b/osaf/libs/common/cpsv/include/cpnd_init.h
>> @@ -130,7 +130,6 @@ uint32_t cpnd_all_repl_rsp_expiry(CPND_C
>>uint32_t cpnd_open_active_sync_expiry(CPND_CB *cb, CPND_TMR_INFO
*tmr_info);
>>void cpnd_proc_free_read_data(CPSV_EVT *evt);
>>    SaUint32T cpnd_get_scAbsenceAllowed_attr();
>> -SaUint32T cpnd_get_longDnsAllowed_attr();
>>/* End cpnd_proc.c */
>>
>>/* File : ---  cpnd_amf.c */
>> diff --git a/osaf/services/saf/cpsv/cpd/cpd_db.c 
>> b/osaf/services/saf/cpsv/cpd/cpd_db.c
>> --- a/osaf/services/saf/cpsv/cpd/cpd_db.c
>> +++ b/osaf/services/saf/cpsv/cpd/cpd_db.c
>> @@ -106,6 +106,9 @@ uint32_t cpd_ckpt_node_add(NCS_PATRICIA_
>>  err = create_runtime_ckpt_object(ckpt_node, immOiHandle);
>>  if (err != SA_AIS_OK) {
>>  LOG_ER("create runtime ckpt object failed with error: %u",err);
>> +if (err == SA_AIS_ERR_INVALID_PARAM) {
>> +return NCSCC_RC_FAILURE|NCSCC_RC_INVALID_INPUT;
>> +}
>>  return NCSCC_RC_FAILURE;
>>  }
>>  }
>> diff --git a/osaf/services/saf/cpsv/cpd/cpd_evt.c 
>> b/osaf/services/saf/cpsv/cpd/cpd_evt.c
>> --- a/osaf/services/saf/cpsv/cpd/cpd_evt.c
>> +++ b/osaf/services/saf/cpsv/cpd/cpd_evt.c
>> @@ -238,9 +238,11 @@ static uint32_t cpd_evt_proc_ckpt_create
>>  rc = SA_AIS_ERR_NO_MEMORY;
>>  goto send_rsp;
>>  } else if (proc_rc != NCSCC_RC_SUCCESS) {
>> -
>>  TRACE_4("cpd ckpt create failure ckpt name,dest :  %s, %"PRIu64, ckpt_name, sinfo->dest);
>>  rc = SA_AIS_ERR_LIBRARY;
>> +if (proc_rc & NCSCC_RC_INVALID_INPUT) {
>> +rc = SA_AIS_ERR_INVALID_PARAM;
>> +}
>>  goto send_rsp;
>>  }
>>
>> diff --git a/osaf/services/saf/cpsv/cpnd/cpnd_evt.c 
>> b/osaf/services/saf/cpsv/cpnd/cpnd_evt.c
>> --- a/osaf/services/saf/cpsv/cpnd/cpnd_evt.c
>> +++ b/osaf/services/saf/cpsv/cpnd/cpnd_evt.c
>> @@ -605,12 +605,6 @@ static uint32_t cpnd_evt_proc_ckpt_open(
>>  TRACE_ENTER();
>>  memset(&send_evt, '\0', sizeof(CPSV_EVT));
>>
>> -if ((cpnd_get_longDnsAllowed_attr() == 0) && osaf_is_an_extended_name(&evt->info.openReq.ckpt_name)) {
>> -LOG_ER("cpnd - longDnsAllowed == false - NOT supporting extended name");
>> -send_evt.info.cpa.info.openRsp.error = SA_AIS_ERR_INVALID_PARAM;
>> -goto agent_rsp;
>> -}
>> -
>>  if (!cpnd_is_cpd_up(cb)) {
>>  send_evt.info.cpa.info.openRsp.error = SA_AIS_ERR_TRY_AGAIN;
>>  goto agent_rsp;
>> @@ -1137,12 +1131,6 @@ static uint32_t cpnd_evt_proc_ckpt_unlin
>>  TRACE_ENTER();
>>  memset(&send_evt, '\0', sizeof(CPSV_EVT));
>>
>> -if ((cpnd_get_longDnsAllowed_attr() == 0) && osaf_is_an_extended_name(&evt->info.ulinkReq.ckpt_name)) {
>> -LOG_ER("cpnd - longDnsAllowed == false - NOT supporting extended name");
>> -send_evt.info.cpa.info.ulinkRsp.error = SA_AIS_ERR_INVALID_PARAM;
>> -goto agent_rsp;
>> -}
>> -
>>  if (!cpnd_is_cpd_up(cb)) {
>>  

Re: [devel] [PATCH 1 of 1] cpsv: remove longDnsAllowed checking each checkpoint creating time [#2068]

2016-09-27 Thread Vo Minh Hoang
Dear Mahesh,

osaf_is_an_extended_name() is just a function that inspects the SaNameT
struct, and it does not affect performance.

cpnd_get_longDnsAllowed_attr() checks the IMM configuration and costs a lot
of unnecessary time; IMM verifies long DN support afterwards anyway.
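
A minimal sketch of the idea, assuming the downstream object creation is the
authoritative place where a disallowed long DN is rejected and the caller
only maps that rejection back to the client. All `map_` names and numeric
codes are hypothetical stand-ins for the NCSCC_RC_* and SA_AIS_* values.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical codes standing in for NCSCC_RC_* and SA_AIS_* values. */
enum { MAP_RC_SUCCESS = 1, MAP_RC_FAILURE = 2, MAP_RC_INVALID_INPUT = 4 };
enum { MAP_AIS_OK = 1, MAP_AIS_ERR_LIBRARY = 2, MAP_AIS_ERR_INVALID_PARAM = 7 };

static int map_long_dn_allowed = 0; /* models the configuration setting */

/* Downstream object creation: the single place that enforces the setting. */
static int map_create_runtime_object(const char *name)
{
	assert(name != NULL);
	if (strlen(name) >= 256 && !map_long_dn_allowed)
		return MAP_AIS_ERR_INVALID_PARAM;
	return MAP_AIS_OK;
}

/* Database layer: tags the failure so the caller can tell it apart. */
static unsigned map_node_add(const char *name)
{
	int err = map_create_runtime_object(name);
	if (err == MAP_AIS_ERR_INVALID_PARAM)
		return MAP_RC_FAILURE | MAP_RC_INVALID_INPUT;
	if (err != MAP_AIS_OK)
		return MAP_RC_FAILURE;
	return MAP_RC_SUCCESS;
}

/* Request handler: no up-front configuration lookup, only error mapping. */
static int map_create_ckpt(const char *name)
{
	unsigned rc = map_node_add(name);
	if (rc == MAP_RC_SUCCESS)
		return MAP_AIS_OK;
	if (rc & MAP_RC_INVALID_INPUT)
		return MAP_AIS_ERR_INVALID_PARAM;
	return MAP_AIS_ERR_LIBRARY;
}
```

With this shape, the common case (a valid name) pays for no extra lookup, and
the rare rejected name still reaches the client as an invalid-parameter
error.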

Thank you and best regards,
Hoang


-Original Message-
From: A V Mahesh [mailto:mahesh.va...@oracle.com] 
Sent: Tuesday, September 27, 2016 12:24 PM
To: Hoang Vo ; anders.wid...@ericsson.com
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1 of 1] cpsv: remove longDnsAllowed checking each
checkpoint creating time [#2068]

Hi Hoang ,

On 9/27/2016 10:35 AM, Hoang Vo wrote:
> Solution:
> Remove the unnecessary, time-consuming check. IMM will take over the 
> role of checking

- Which one: cpnd_get_longDnsAllowed_attr() or 
osaf_is_an_extended_name()?

- I see multiple calls to osaf_is_an_extended_name() in cpnd. Are they
outside the IMM context? Please check.

- Is this the case only with CPND? ntfsv and amfnd also make these 
calls; why is it affecting only CPD?

-AVM

On 9/27/2016 10:35 AM, Hoang Vo wrote:
>   osaf/services/saf/cpsv/cpd/cpd_db.c|   3 +++
>   osaf/services/saf/cpsv/cpd/cpd_evt.c   |   4 +++-
>   osaf/services/saf/cpsv/cpnd/cpnd_evt.c |  12 
>   3 files changed, 6 insertions(+), 13 deletions(-)
>
>
> Problem:
> Statistically, the checkpoint create time for SC and PL (sync and 
> async) has degraded by more than 30% after patch 8004 was brought in
>
> Solution:
> Remove the unnecessary, time-consuming check. IMM will take over the 
> role of checking
>
> diff --git a/osaf/services/saf/cpsv/cpd/cpd_db.c 
> b/osaf/services/saf/cpsv/cpd/cpd_db.c
> --- a/osaf/services/saf/cpsv/cpd/cpd_db.c
> +++ b/osaf/services/saf/cpsv/cpd/cpd_db.c
> @@ -106,6 +106,9 @@ uint32_t cpd_ckpt_node_add(NCS_PATRICIA_
>   err = create_runtime_ckpt_object(ckpt_node, immOiHandle);
>   if (err != SA_AIS_OK) {
>   LOG_ER("create runtime ckpt object failed with error: %u",err);
> + if (err == SA_AIS_ERR_INVALID_PARAM) {
> + return NCSCC_RC_FAILURE|NCSCC_RC_INVALID_INPUT;
> + }
>   return NCSCC_RC_FAILURE;
>   }
>   }
> diff --git a/osaf/services/saf/cpsv/cpd/cpd_evt.c 
> b/osaf/services/saf/cpsv/cpd/cpd_evt.c
> --- a/osaf/services/saf/cpsv/cpd/cpd_evt.c
> +++ b/osaf/services/saf/cpsv/cpd/cpd_evt.c
> @@ -238,9 +238,11 @@ static uint32_t cpd_evt_proc_ckpt_create
>   rc = SA_AIS_ERR_NO_MEMORY;
>   goto send_rsp;
>   } else if (proc_rc != NCSCC_RC_SUCCESS) {
> -
>   TRACE_4("cpd ckpt create failure ckpt name,dest :  %s, %"PRIu64, ckpt_name, sinfo->dest);
>   rc = SA_AIS_ERR_LIBRARY;
> + if (proc_rc & NCSCC_RC_INVALID_INPUT) {
> + rc = SA_AIS_ERR_INVALID_PARAM;
> + }
>   goto send_rsp;
>   }
>   
> diff --git a/osaf/services/saf/cpsv/cpnd/cpnd_evt.c 
> b/osaf/services/saf/cpsv/cpnd/cpnd_evt.c
> --- a/osaf/services/saf/cpsv/cpnd/cpnd_evt.c
> +++ b/osaf/services/saf/cpsv/cpnd/cpnd_evt.c
> @@ -605,12 +605,6 @@ static uint32_t cpnd_evt_proc_ckpt_open(
>   TRACE_ENTER();
>   memset(&send_evt, '\0', sizeof(CPSV_EVT));
>   
> - if ((cpnd_get_longDnsAllowed_attr() == 0) && osaf_is_an_extended_name(&evt->info.openReq.ckpt_name)) {
> - LOG_ER("cpnd - longDnsAllowed == false - NOT supporting extended name");
> - send_evt.info.cpa.info.openRsp.error = SA_AIS_ERR_INVALID_PARAM;
> - goto agent_rsp;
> - }
> -
>   if (!cpnd_is_cpd_up(cb)) {
>   send_evt.info.cpa.info.openRsp.error = SA_AIS_ERR_TRY_AGAIN;
>   goto agent_rsp;
> @@ -1137,12 +1131,6 @@ static uint32_t cpnd_evt_proc_ckpt_unlin
>   TRACE_ENTER();
>   memset(&send_evt, '\0', sizeof(CPSV_EVT));
>   
> - if ((cpnd_get_longDnsAllowed_attr() == 0) && osaf_is_an_extended_name(&evt->info.ulinkReq.ckpt_name)) {
> - LOG_ER("cpnd - longDnsAllowed == false - NOT supporting extended name");
> - send_evt.info.cpa.info.ulinkRsp.error = SA_AIS_ERR_INVALID_PARAM;
> - goto agent_rsp;
> - }
> -
>   if (!cpnd_is_cpd_up(cb)) {
>   send_evt.info.cpa.info.ulinkRsp.error = SA_AIS_ERR_TRY_AGAIN;
>   goto agent_rsp;



--
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


Re: [devel] [PATCH 1 of 1] cpd: coredump error while creating checkpoint after previous creating got error [#2055]

2016-09-21 Thread Vo Minh Hoang
Dear Mahesh,

The submitted patch corrects the behavior of the cpd_ckpt_db_entry_update()
function so that it handles node_info in the same way as
cpd_sb_proc_ckpt_create().
So both cases have been considered.
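
The ownership rule behind this fix can be sketched as follows: a node that is
still linked in a lookup structure is freed only by the function that also
unlinks it. The sketch uses a plain linked list instead of the patricia tree,
and every `reg_` name is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct reg_node {
	int key;
	struct reg_node *next;
} reg_node;

static reg_node *reg_head = NULL;

static reg_node *reg_find(int key)
{
	for (reg_node *n = reg_head; n != NULL; n = n->next)
		if (n->key == key)
			return n;
	return NULL;
}

/* Return the registered node for key, creating and linking it if needed. */
static reg_node *reg_get_or_add(int key)
{
	reg_node *n = reg_find(key);
	if (n != NULL)
		return n;
	n = calloc(1, sizeof(*n));
	if (n == NULL)
		return NULL;
	n->key = key;
	n->next = reg_head;
	reg_head = n;
	return n;
}

/* The only place a registered node is freed: unlink first, then free. */
static void reg_delete(int key)
{
	reg_node **pp = &reg_head;
	while (*pp != NULL && (*pp)->key != key)
		pp = &(*pp)->next;
	if (*pp != NULL) {
		reg_node *victim = *pp;
		*pp = victim->next;
		free(victim);
	}
	assert(reg_find(key) == NULL);
}

/* An update that may fail after the node has been looked up. */
static int reg_update(int key, int simulate_failure)
{
	reg_node *n = reg_get_or_add(key);
	if (n == NULL)
		return -1;
	if (simulate_failure) {
		/* Error path: n is still linked, so it must not be freed here. */
		return -1;
	}
	return 0;
}
```

Freeing the node in the error path of `reg_update()` would leave a dangling
entry in the list, and the next lookup would touch freed memory, which is the
pattern of the crash described in this thread.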

Thank you and best regards,
Hoang

-Original Message-
From: A V Mahesh [mailto:mahesh.va...@oracle.com] 
Sent: Thursday, September 22, 2016 11:45 AM
To: Hoang Vo ; anders.wid...@ericsson.com
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1 of 1] cpd: coredump error while creating checkpoint
after previous creating got error [#2055]

Hi Hoang,

Please check below cases as well



uint32_t cpd_ckpt_db_entry_update(CPD_CB *cb,
MDS_DEST *cpnd_dest,
CPSV_ND2D_CKPT_CREATE *ckpt_create,
CPD_CKPT_INFO_NODE **o_ckpt_node,
CPD_CKPT_MAP_INFO **io_map_info)

uint32_t cpd_sb_proc_ckpt_create(CPD_CB *cb, CPD_MBCSV_MSG *msg)



-AVM


On 9/21/2016 4:05 PM, Hoang Vo wrote:
>   osaf/services/saf/cpsv/cpd/cpd_proc.c |  5 -
>   1 files changed, 0 insertions(+), 5 deletions(-)
>
>
> Problem:
> On the first create, cpd got an error while creating the immOm object and
> ran the error handling steps, which freed the node_info memory without
> removing it from the ncs_patricia_tree.
> On the second create, cpd tried to access node_info and got an error.
>
> Solution:
> Do not free the node_info memory here, since this scope does not
> initialize it. Only free node_info in the cpd_cpnd_info_node_delete
> function.
>
> diff --git a/osaf/services/saf/cpsv/cpd/cpd_proc.c 
> b/osaf/services/saf/cpsv/cpd/cpd_proc.c
> --- a/osaf/services/saf/cpsv/cpd/cpd_proc.c
> +++ b/osaf/services/saf/cpsv/cpd/cpd_proc.c
> @@ -383,11 +383,6 @@ uint32_t cpd_ckpt_db_entry_update(CPD_CB
>   }
>   }
>   
> - if (node_info) {
> - m_MMGR_FREE_CPD_CPND_INFO_NODE(node_info);
> -
> - }
> -
>   TRACE_LEAVE();
>   return proc_rc;
>   





Re: [devel] [PATCH 1 of 1] imported patch 1967_fix_headless_error_cppcheck.patch

2016-08-24 Thread Vo Minh Hoang
Dear Mahesh,

The logic of this part is complicated, so my earlier modification was wrong.
This time I am correcting it; it is not a rollback.
That part is a consequence of the if clause above.
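
To show why the order of the two tests in that if clause matters, here is a
small sketch of a batching loop. The list type, the limit and the `batch_`
names are hypothetical; the sketch assumes every section fits within the
limit and that `n_secs` equals the list length.

```c
#include <assert.h>
#include <stddef.h>

#define BATCH_MAX_TRANSFER 100 /* hypothetical batch limit in bytes */

typedef struct batch_sec {
	size_t size;
	struct batch_sec *next;
} batch_sec;

/* Count the batches needed to send n_secs sections from the list. */
static size_t batch_count(const batch_sec *first, size_t n_secs)
{
	const batch_sec *cur = first;
	size_t total = 0, batch_size = 0, batches = 0;

	if (n_secs == 0)
		return 0;
	for (;;) {
		/* The end-of-list test comes first: when it is true, cur is
		 * NULL and the size test must not be evaluated. */
		if (total == n_secs ||
		    batch_size + cur->size > BATCH_MAX_TRANSFER) {
			batches++;
			batch_size = 0;
			if (total == n_secs)
				break;
		}
		assert(cur != NULL);
		batch_size += cur->size;
		total++;
		cur = cur->next;
	}
	return batches;
}
```

Because `||` short-circuits, putting the count comparison first means the
pointer is dereferenced only while sections remain, so no separate NULL check
is needed after advancing to the next section.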

Sincerely,
Hoang

-Original Message-
From: A V Mahesh [mailto:mahesh.va...@oracle.com] 
Sent: Thursday, August 25, 2016 11:21 AM
To: Hoang Vo 
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1 of 1] imported patch
1967_fix_headless_error_cppcheck.patch

Hi Hoang,

Is this rollback of  `ckpt: fix cppcheck warning [#1874]` ?

===

@@ -4766,6 +4758,13 @@ static uint32_t cpnd_transfer_replica(CP
   total_num++;
 tmp_sec_info = cpnd_ckpt_sec_get_next(&cp_node->replica_info, tmp_sec_info);
+if (tmp_sec_info == NULL) {
+rc = NCSCC_RC_FAILURE;
+TRACE_4("cpnd ckpt memory get next allocation failed");
+send_evt.info.cpnd.info.ckpt_nd2nd_sync.data = sec_data;
+cpnd_proc_free_cpsv_ckpt_data(send_evt.info.cpnd.info.ckpt_nd2nd_sync.data);
+return rc;
+}
   }

===

-AVM

On 8/24/2016 7:06 PM, Hoang Vo wrote:
>   osaf/services/saf/cpsv/cpd/cpd_proc.c  |   4 ++--
>   osaf/services/saf/cpsv/cpnd/cpnd_evt.c |  11 ++-
>   2 files changed, 4 insertions(+), 11 deletions(-)
>
>
> diff --git a/osaf/services/saf/cpsv/cpd/cpd_proc.c 
> b/osaf/services/saf/cpsv/cpd/cpd_proc.c
> --- a/osaf/services/saf/cpsv/cpd/cpd_proc.c
> +++ b/osaf/services/saf/cpsv/cpd/cpd_proc.c
> @@ -1142,7 +1142,7 @@ void cpd_cb_dump(void)
>   
>   TRACE("--");
>   TRACE(" CKPT ID:  = %d", (uint32_t)ckpt_node->ckpt_id);
> - TRACE(" CKPT Name len  = %lu", strlen(ckpt_node->ckpt_name));
> + TRACE(" CKPT Name len  = %zu", strlen(ckpt_node->ckpt_name));
>   TRACE(" CKPT Name: %s", ckpt_node->ckpt_name);
>   
>   TRACE(" UNLINK = %d, Active Exists = %d", ckpt_node->is_unlink_set,
> @@ -1196,7 +1196,7 @@ void cpd_cb_dump(void)
>   name = ckpt_map_node->ckpt_name;
>   
>   TRACE("--");
> - TRACE(" CKPT Name len  = %lu", strlen(name));
> + TRACE(" CKPT Name len  = %zu", strlen(name));
>   TRACE(" CKPT Name: %s", name);
>   
>   TRACE(" CKPT ID:  = %d", (uint32_t)ckpt_map_node->ckpt_id);
> diff --git a/osaf/services/saf/cpsv/cpnd/cpnd_evt.c 
> b/osaf/services/saf/cpsv/cpnd/cpnd_evt.c
> --- a/osaf/services/saf/cpsv/cpnd/cpnd_evt.c
> +++ b/osaf/services/saf/cpsv/cpnd/cpnd_evt.c
> @@ -4700,8 +4700,8 @@ static uint32_t cpnd_transfer_replica(CP
>   
>   while (1) {
>   
> - if (((size + tmp_sec_info->sec_size) > MAX_SYNC_TRANSFER_SIZE)
> - || (total_num == cp_node->replica_info.n_secs)) {
> + if ((total_num == cp_node->replica_info.n_secs) ||
> + ((size + tmp_sec_info->sec_size) > MAX_SYNC_TRANSFER_SIZE)) {
>   
>   send_evt.info.cpnd.info.ckpt_nd2nd_sync.num_of_elmts = num;
>   send_evt.info.cpnd.info.ckpt_nd2nd_sync.data = sec_data;
> @@ -4758,13 +4758,6 @@ static uint32_t cpnd_transfer_replica(CP
>   total_num++;
>   
>   tmp_sec_info = cpnd_ckpt_sec_get_next(&cp_node->replica_info, tmp_sec_info);
> - if (tmp_sec_info == NULL) {
> - rc = NCSCC_RC_FAILURE;
> - TRACE_4("cpnd ckpt memory get next allocation failed");
> - send_evt.info.cpnd.info.ckpt_nd2nd_sync.data = sec_data;
> - cpnd_proc_free_cpsv_ckpt_data(send_evt.info.cpnd.info.ckpt_nd2nd_sync.data);
> - return rc;
> - }
>   }
>   
>   TRACE_LEAVE();





Re: [devel] [PATCH 0 of 8] Review Request for CKPT: Support DNs longer than 255 bytes [#1574] v5

2016-08-23 Thread Vo Minh Hoang
Dear Mahesh,

I updated the README file and sent it as an attachment to this email.
I also sent updated patches following your comments.

Please help me push these items if there are no further problems.

Thank you and best regards,
Hoang

-Original Message-
From: A V Mahesh [mailto:mahesh.va...@oracle.com] 
Sent: Monday, August 22, 2016 1:53 PM
To: Hoang Vo 
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 0 of 8] Review Request for CKPT: Support DNs longer than
255 bytes [#1574] v5

Hi Hoang,

ACK, Please Update README

Note: tested default functionality; LONG DN functionality has not been fully
tested.

-AVM


On 8/18/2016 12:48 PM, Hoang Vo wrote:
> Summary: CKPT: Support DNs longer than 255 bytes {#1574}
> Review request for Trac Ticket(s): 1574
> Peer Reviewer(s): mahesh.va...@oracle.com; anders.wid...@ericsson.com
> Pull request to: mahesh.va...@oracle.com
> Affected branch(es): default
> Development branch: default
>
> --------------------------------
> Impacted area       Impact y/n
> --------------------------------
>  Docs                    n
>  Build system            n
>  RPM/packaging           n
>  Configuration files     n
>  Startup scripts         n
>  SAF services            y
>  OpenSAF services        n
>  Core libraries          n
>  Samples                 n
>  Tests                   n
>  Other                   n
>
>
> Comments (indicate scope for each "y" above):
> -
>
> changeset 05233bdae1fb000fea001964eba1c51ebf3bfd8e
> Author:   Hoang Vo 
> Date: Thu, 18 Aug 2016 13:51:56 +0700
>
>   cpd: Add support for extended SaNameT [#1574] v3
>
> changeset cecabec5b6be73e731e540fd439e1d0e3534809f
> Author:   Hoang Vo 
> Date: Thu, 18 Aug 2016 13:51:56 +0700
>
>   cpnd: Add support for extended SaNameT [#1574] v3
>
> changeset 940dc877c94a9539e3da06d89c6480ef7e0ceda0
> Author:   Hoang Vo 
> Date: Thu, 18 Aug 2016 13:51:56 +0700
>
>   cpa: Add support for extended SaNameT [#1574] v1
>
> changeset 1f74531a36163bdfecd6b27174443d51c11ecf61
> Author:   Hoang Vo 
> Date: Thu, 18 Aug 2016 13:51:56 +0700
>
>   cpsv: Add new message to support extended SaNameT [#1574] v3
>
> changeset 29df19302186b3275ad06db00dc62f275dea25e1
> Author:   Hoang Vo 
> Date: Thu, 18 Aug 2016 13:51:56 +0700
>
>   cpd: Add new mbcsv messages supporting extended SaNameT [#1574] v2
>
> changeset 3f72410a7c2bb077647bdd4e46869a31a832f1d8
> Author:   Hoang Vo 
> Date: Thu, 18 Aug 2016 13:51:56 +0700
>
>   cpsv: Apply new messages supporting extended SaNameT to CPD, CPND,
and CPA
>   [#1574] v4
>
> changeset f32a0b3ca1ebf6049d2103e68e91d98bf086c48e
> Author:   Hoang Vo 
> Date: Thu, 18 Aug 2016 13:51:56 +0700
>
>   ckpt: Add new test cases to verify long DN feature on CPSV [#1574]
v1
>
> changeset 1aa38b707cf2cec14c416631cfc7e5518b25735f
> Author:   Hoang Vo 
> Date: Thu, 18 Aug 2016 13:51:56 +0700
>
>   cpnd: add support for shm recovery for in-service update without
restarting
>   node [#1574] v1
>
>
> Complete diffstat:
> --
>   osaf/libs/agents/saf/cpa/Makefile.am  |1 +
>   osaf/libs/agents/saf/cpa/cpa_api.c|   48 --
>   osaf/libs/agents/saf/cpa/cpa_db.c |2 +
>   osaf/libs/agents/saf/cpa/cpa_mds.c|4 +-
>   osaf/libs/agents/saf/cpa/cpa_proc.c   |2 +-
>   osaf/libs/common/cpsv/cpsv_evt.c  |  440
++---
>   osaf/libs/common/cpsv/include/cpa.h   |1 +
>   osaf/libs/common/cpsv/include/cpa_cb.h|2 +-
>   osaf/libs/common/cpsv/include/cpa_proc.h  |2 +-
>   osaf/libs/common/cpsv/include/cpd.h   |1 +
>   osaf/libs/common/cpsv/include/cpd_cb.h|   17 +-
>   osaf/libs/common/cpsv/include/cpd_imm.h   |4 +-
>   osaf/libs/common/cpsv/include/cpd_mem.h   |   25 +++-
>   osaf/libs/common/cpsv/include/cpd_proc.h  |2 +-
>   osaf/libs/common/cpsv/include/cpnd.h  |1 +
>   osaf/libs/common/cpsv/include/cpnd_cb.h   |5 +-
>   osaf/libs/common/cpsv/include/cpnd_init.h |3 +-
>   osaf/libs/common/cpsv/include/cpsv_evt.h  |   10 +
>   osaf/libs/common/cpsv/include/cpsv_shm.h  |   24 +++-
>   osaf/services/saf/cpsv/cpd/Makefile.am|1 +
>   osaf/services/saf/cpsv/cpd/cpd_amf.c  |7 +-
>   osaf/services/saf/cpsv/cpd/cpd_db.c   |   95 +++---
>   osaf/services/saf/cpsv/cpd/cpd_evt.c  |  103 ++-
>   osaf/services/saf/cpsv/cpd/cpd_imm.c  |  268
++
>   osaf/services/saf/cpsv/cpd/cpd_main.c |7 +
>   osaf/services/saf/cpsv/cpd/cpd_mbcsv.c|   31 -
>   osaf/services/saf/cpsv/cpd/cpd_mds.c  | 

Re: [devel] [PATCH 6 of 8] cpsv: Apply new messages supporting extended SaNameT to CPD, CPND, and CPA [#1574] v4

2016-08-23 Thread Vo Minh Hoang
Dear Mahesh,

I would like to send the updated patch following your comment.
Since this is a minor comment, I am sending it as an attached file.

Sincerely,
Hoang

-Original Message-
From: A V Mahesh [mailto:mahesh.va...@oracle.com] 
Sent: Monday, August 22, 2016 1:02 PM
To: Hoang Vo 
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 6 of 8] cpsv: Apply new messages supporting extended
SaNameT to CPD, CPND, and CPA [#1574] v4

Hi Hoang,

ACK for [PATCH 6 of 8] with the following minor comment:
I think CPND should return SA_AIS_ERR_TOO_BIG = 26 instead of
SA_AIS_ERR_INVALID_PARAM (please sync up with the return values of the other
services).
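
As a sketch of the point under discussion, the length check can be written so
that the error reported for an oversized name is a policy choice of the
caller. The `dnlen_` names are hypothetical, the limit of 2048 is an assumed
value for kOsafMaxDnLength, and the codes are local stand-ins for the SA_AIS
constants.

```c
#include <assert.h>
#include <string.h>

enum { DNLEN_MAX = 2048 }; /* assumed value of the maximum DN length */
enum { DNLEN_OK = 1, DNLEN_ERR_INVALID_PARAM = 7, DNLEN_ERR_TOO_BIG = 26 };

/* Returns DNLEN_OK if the name fits; otherwise returns the error code the
 * caller has chosen to report for oversized names. */
static int dnlen_validate(const char *name, int oversize_error)
{
	assert(name != NULL);
	if (strlen(name) >= DNLEN_MAX)
		return oversize_error;
	return DNLEN_OK;
}
```

Keeping the chosen code in one place like this would make it easier to align
the value with whatever the other services return.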

Note: tested default functionality; LONG DN functionality has not been fully
tested.

-AVM


On 8/18/2016 12:48 PM, Hoang Vo wrote:
>   osaf/libs/agents/saf/cpa/cpa_api.c  |  12 
>   osaf/libs/agents/saf/cpa/cpa_mds.c  |   2 +-
>   osaf/libs/common/cpsv/cpsv_evt.c|   1 +
>   osaf/services/saf/cpsv/cpd/cpd_proc.c   |   2 +-
>   osaf/services/saf/cpsv/cpnd/cpnd_evt.c  |   2 ++
>   osaf/services/saf/cpsv/cpnd/cpnd_proc.c |   2 +-
>   6 files changed, 18 insertions(+), 3 deletions(-)
>
>
> diff --git a/osaf/libs/agents/saf/cpa/cpa_api.c 
> b/osaf/libs/agents/saf/cpa/cpa_api.c
> --- a/osaf/libs/agents/saf/cpa/cpa_api.c
> +++ b/osaf/libs/agents/saf/cpa/cpa_api.c
> @@ -880,6 +880,10 @@ SaAisErrorT saCkptCheckpointOpen(SaCkptH
>   }
>   
>   ckpt_name = osaf_extended_name_borrow(checkpointName);
> + if (strlen(ckpt_name) >= kOsafMaxDnLength) {
> + TRACE_LEAVE2("API return code = %u",
SA_AIS_ERR_INVALID_PARAM);
> + return SA_AIS_ERR_INVALID_PARAM;
> + }
>   
>   /* SA_AIS_ERR_INVALID_PARAM, bullet 4 in SAI-AIS-CKPT-B.02.02
>  Section 3.6.1 saCkptCheckpointOpen() and 
> saCkptCheckpointOpenAsync(), Return Values */ @@ -1192,6 +1196,10 @@
SaAisErrorT saCkptCheckpointOpenAsync(Sa
>   }
>   
>   ckpt_name = osaf_extended_name_borrow(checkpointName);
> + if (strlen(ckpt_name) >= kOsafMaxDnLength) {
> + TRACE_LEAVE2("API return code = %u",
SA_AIS_ERR_INVALID_PARAM);
> + return SA_AIS_ERR_INVALID_PARAM;
> + }
>   
>   /* SA_AIS_ERR_INVALID_PARAM, bullet 4 in SAI-AIS-CKPT-B.02.02
>  Section 3.6.1 saCkptCheckpointOpen() and 
> saCkptCheckpointOpenAsync(), Return Values */ @@ -1597,6 +1605,10 @@
SaAisErrorT saCkptCheckpointUnlink(SaCkp
>   }
>   
>   ckpt_name = osaf_extended_name_borrow(checkpointName);
> + if (strlen(ckpt_name) >= kOsafMaxDnLength) {
> + TRACE_LEAVE2("API return code = %u",
SA_AIS_ERR_INVALID_PARAM);
> + return SA_AIS_ERR_INVALID_PARAM;
> + }
>   
>   /* retrieve CPA CB */
>   m_CPA_RETRIEVE_CB(cb);
> diff --git a/osaf/libs/agents/saf/cpa/cpa_mds.c 
> b/osaf/libs/agents/saf/cpa/cpa_mds.c
> --- a/osaf/libs/agents/saf/cpa/cpa_mds.c
> +++ b/osaf/libs/agents/saf/cpa/cpa_mds.c
> @@ -515,9 +515,9 @@ static uint32_t cpa_mds_svc_evt(CPA_CB *
>  /* Populate & Send the Open Event to CPND */
>  memset(&evt, 0, sizeof(CPSV_EVT));
>  evt.type = CPSV_EVT_TYPE_CPND;
> -evt.info.cpnd.type = CPND_EVT_A2ND_CKPT_LIST_UPDATE;
>  evt.info.cpnd.info.ckptListUpdate.client_hdl =
lc_node->cl_hdl;
>  osaf_extended_name_lend(lc_node->ckpt_name, 
> _name);
> +evt.info.cpnd.type = CPND_EVT_A2ND_CKPT_LIST_UPDATE;
>   
>  proc_rc = cpa_mds_msg_send(cb->cpa_mds_hdl,
>cpnd_mds_dest, 
> , NCSMDS_SVC_ID_CPND);
>   
> diff --git a/osaf/libs/common/cpsv/cpsv_evt.c 
> b/osaf/libs/common/cpsv/cpsv_evt.c
> --- a/osaf/libs/common/cpsv/cpsv_evt.c
> +++ b/osaf/libs/common/cpsv/cpsv_evt.c
> @@ -2378,6 +2378,7 @@ static uint32_t cpsv_encode_extended_nam
>   if(!osaf_is_an_extended_name(name))
>   return NCSCC_RC_SUCCESS;
>   
> + TRACE("length = %d", name->length);
>   SaConstStringT value = osaf_extended_name_borrow(name);
>   uint16_t length = osaf_extended_name_length(name);
>   
> diff --git a/osaf/services/saf/cpsv/cpd/cpd_proc.c 
> b/osaf/services/saf/cpsv/cpd/cpd_proc.c
> --- a/osaf/services/saf/cpsv/cpd/cpd_proc.c
> +++ b/osaf/services/saf/cpsv/cpd/cpd_proc.c
> @@ -61,9 +61,9 @@ uint32_t cpd_noncolloc_ckpt_rep_create(C
>   /* Send the Replica create info to CPND */
>   memset(&send_evt, 0, sizeof(CPSV_EVT));
>   send_evt.type = CPSV_EVT_TYPE_CPND;
> - send_evt.info.cpnd.type = CPND_EVT_D2ND_CKPT_CREATE;
>   
>   osaf_extended_name_lend(map_info->ckpt_name, 
> _evt.info.cpnd.info.ckpt_create.ckpt_name);
> + send_evt.info.cpnd.type = CPND_EVT_D2ND_CKPT_CREATE;
>   
>   d2nd_info = _evt.info.cpnd.info.ckpt_create.ckpt_info;
>   
> diff --git a/osaf/services/saf/cpsv/cpnd/cpnd_evt.c 
> b/osaf/services/saf/cpsv/cpnd/cpnd_evt.c
> --- a/osaf/services/saf/cpsv/cpnd/cpnd_evt.c
> +++ b/osaf/services/saf/cpsv/cpnd/cpnd_evt.c
> @@ -4638,6 

Re: [devel] [PATCH 4 of 8] cpsv: Add new message to support extended SaNameT [#1574] v3

2016-08-23 Thread Vo Minh Hoang
Dear Mahesh,

I would like to send an updated patch following your comment.
Since this is a minor comment, I am sending it as an attached file.

Sincerely,
Hoang

-Original Message-
From: A V Mahesh [mailto:mahesh.va...@oracle.com] 
Sent: Monday, August 22, 2016 12:55 PM
To: Hoang Vo 
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 4 of 8] cpsv: Add new message to support extended
SaNameT [#1574] v3

Hi Hoang,

ACK for [PATCH 4 of 8], assuming the unrequired commented-out code will be
removed.

Note: tested default functionality; Long DN functionality has not been fully
tested.

-AVM


On 8/18/2016 12:48 PM, Hoang Vo wrote:
>   osaf/libs/common/cpsv/cpsv_evt.c |  439
+-
>   osaf/libs/common/cpsv/include/cpsv_evt.h |   10 +
>   osaf/services/saf/cpsv/cpd/cpd_mds.c |   84 +-
>   osaf/services/saf/cpsv/cpnd/cpnd_mds.c   |   86 +-
>   4 files changed, 581 insertions(+), 38 deletions(-)
>
>
> diff --git a/osaf/libs/common/cpsv/cpsv_evt.c 
> b/osaf/libs/common/cpsv/cpsv_evt.c
> --- a/osaf/libs/common/cpsv/cpsv_evt.c
> +++ b/osaf/libs/common/cpsv/cpsv_evt.c
> @@ -30,11 +30,14 @@
>   
>   #include "cpsv.h"
>   #include "cpa_tmr.h"
> +#include "osaf_extended_name.h"
>   
>   FUNC_DECLARATION(CPSV_CKPT_DATA);
>   static SaCkptSectionIdT *cpsv_evt_dec_sec_id(NCS_UBAID *i_ub, uint32_t
svc_id);
>   static uint32_t cpsv_evt_enc_sec_id(NCS_UBAID *o_ub, SaCkptSectionIdT
*sec_id);
>   static void cpsv_convert_sec_id_to_string(char *sec_id_str, 
> SaCkptSectionIdT *section_id);
> +static uint32_t cpsv_encode_extended_name_flat(NCS_UBAID *uba, 
> +SaNameT *name); static uint32_t 
> +cpsv_decode_extended_name_flat(NCS_UBAID *uba, SaNameT *name);
>   
>   const char *cpa_evt_str[] = {
>   "STRING_0",
> @@ -254,8 +257,8 @@ char* cpsv_evt_str(CPSV_EVT *evt, char *
>   case CPND_EVT_A2ND_CKPT_OPEN:
>   {
>   CPSV_A2ND_OPEN_REQ *info =
>info.cpnd.info.openReq;
> - snprintf(o_evt_str, len,
"CPND_EVT_A2ND_CKPT_OPEN(hdl=%llu, %s)",
> - info->client_hdl, info->ckpt_name.value);
> + snprintf(o_evt_str, len,
"CPND_EVT_A2ND_CKPT_OPEN_2(hdl=%llu, %s)",
> + info->client_hdl,
osaf_extended_name_borrow(>ckpt_name));
>   break;
>   }
>   case CPND_EVT_A2ND_CKPT_CLOSE:
> @@ -268,7 +271,7 @@ char* cpsv_evt_str(CPSV_EVT *evt, char *
>   case CPND_EVT_A2ND_CKPT_UNLINK:
>   {
>   CPSV_A2ND_CKPT_UNLINK *info =
>info.cpnd.info.ulinkReq;
> - snprintf(o_evt_str, len,
"CPND_EVT_A2ND_CKPT_UNLINK(%s)", info->ckpt_name.value);
> + snprintf(o_evt_str, len,
"CPND_EVT_A2ND_CKPT_UNLINK_2(%s)", 
> +osaf_extended_name_borrow(>ckpt_name));
>   break;
>   }
>   case CPND_EVT_A2ND_CKPT_RDSET:
> @@ -513,12 +516,22 @@ char* cpsv_evt_str(CPSV_EVT *evt, char *
>   case CPND_EVT_D2ND_CKPT_CREATE:
>   {
>   CPSV_D2ND_CKPT_CREATE *info =
>info.cpnd.info.ckpt_create;
> - snprintf(o_evt_str, len, "[%llu]
CPND_EVT_D2ND_CKPT_CREATE(%s, create_rep=%s, active=0x%X)",
> - info->ckpt_info.ckpt_id,
info->ckpt_name.value,
> + snprintf(o_evt_str, len, "[%llu]
CPND_EVT_D2ND_CKPT_CREATE_2(%s, create_rep=%s, is_act=%s, active=0x%X,
dest_cnt=%d)",
> + info->ckpt_info.ckpt_id, 
> +osaf_extended_name_borrow(>ckpt_name),
>   info->ckpt_info.ckpt_rep_create ? "true" :
"false",
> -
m_NCS_NODE_ID_FROM_MDS_DEST(info->ckpt_info.active_dest));
> + info->ckpt_info.is_active_exists ? "true" :
"false",
> +
m_NCS_NODE_ID_FROM_MDS_DEST(info->ckpt_info.active_dest),
> + info->ckpt_info.dest_cnt);
> +
> + SaCkptCheckpointCreationAttributesT *attr =
>ckpt_info.attributes;
> + TRACE("mSecS=%lld, flags=%d, mSec=%d, mSecIdS=%lld,
ret=%lld, ckptS=%lld", attr->maxSectionSize,
> + attr->creationFlags, attr->maxSections,
attr->maxSectionIdSize, attr->retentionDuration,
> + attr->checkpointSize);
> + for (int i = 0; i < info->ckpt_info.dest_cnt; i++)
> + TRACE("dest[%d] = 0x%" PRIX64 " ", i, 
> +info->ckpt_info.dest_list[i].dest);
>   break;
>   }
> +
>   case CPND_EVT_D2ND_CKPT_DESTROY:
>   {
>   snprintf(o_evt_str, len, "[%llu]
CPND_EVT_D2ND_CKPT_DESTROY", 
> evt->info.cpnd.info.ckpt_destroy.ckpt_id);
> @@ -608,8 +621,8 @@ char* cpsv_evt_str(CPSV_EVT *evt, char *
>   case CPND_EVT_A2ND_CKPT_LIST_UPDATE:
>   {
>   CPSV_A2ND_CKPT_LIST_UPDATE *info =

Re: [devel] [PATCH 3 of 8] cpa: Add support for extended SaNameT [#1574] v1

2016-08-23 Thread Vo Minh Hoang
Dear Mahesh,

I would like to send an updated patch following your comment.
Since this is a minor comment, I am sending it as an attached file.

Sincerely,
Hoang

-Original Message-
From: A V Mahesh [mailto:mahesh.va...@oracle.com] 
Sent: Monday, August 22, 2016 12:45 PM
To: Hoang Vo 
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 3 of 8] cpa: Add support for extended SaNameT [#1574] v1

Hi Hoang,

ACK for [PATCH 3 of 8] with the following minor comment:
I think the API should return SA_AIS_ERR_TOO_BIG (26) instead of
SA_AIS_ERR_INVALID_PARAM (please sync up with the return values of the other
services).

Note: tested default functionality; Long DN functionality has not been fully
tested.

-AVM

On 8/18/2016 12:48 PM, Hoang Vo wrote:
>   osaf/libs/agents/saf/cpa/Makefile.am |   1 +
>   osaf/libs/agents/saf/cpa/cpa_api.c   |  36

>   osaf/libs/agents/saf/cpa/cpa_db.c|   2 +
>   osaf/libs/agents/saf/cpa/cpa_mds.c   |   2 +-
>   osaf/libs/agents/saf/cpa/cpa_proc.c  |   2 +-
>   osaf/libs/common/cpsv/include/cpa.h  |   1 +
>   osaf/libs/common/cpsv/include/cpa_cb.h   |   2 +-
>   osaf/libs/common/cpsv/include/cpa_proc.h |   2 +-
>   8 files changed, 26 insertions(+), 22 deletions(-)
>
>
> diff --git a/osaf/libs/agents/saf/cpa/Makefile.am 
> b/osaf/libs/agents/saf/cpa/Makefile.am
> --- a/osaf/libs/agents/saf/cpa/Makefile.am
> +++ b/osaf/libs/agents/saf/cpa/Makefile.am
> @@ -22,6 +22,7 @@ noinst_LTLIBRARIES = libcpa.la
>   
>   libcpa_la_CPPFLAGS = \
>   -DNCS_CPA=1 \
> + -DSA_EXTENDED_NAME_SOURCE \
>   $(AM_CPPFLAGS) \
>   -I$(top_srcdir)/osaf/libs/common/cpsv/include
>   
> diff --git a/osaf/libs/agents/saf/cpa/cpa_api.c 
> b/osaf/libs/agents/saf/cpa/cpa_api.c
> --- a/osaf/libs/agents/saf/cpa/cpa_api.c
> +++ b/osaf/libs/agents/saf/cpa/cpa_api.c
> @@ -870,19 +870,20 @@ SaAisErrorT saCkptCheckpointOpen(SaCkptH
>   bool locked = false;
>   uint32_t time_out=0;
>   CPA_GLOBAL_CKPT_NODE *gc_node = NULL;
> + SaConstStringT ckpt_name = NULL;
>   
>   TRACE_ENTER2("SaCkptCheckpointHandleT passed is %llx",ckptHandle);
> - if ((checkpointName == NULL) || (checkpointHandle == NULL) ||
(checkpointName->length == 0)) {
> + if ((checkpointName == NULL) || (checkpointHandle == NULL) || 
> +(osaf_extended_name_length(checkpointName) == 0)) {
>   TRACE_4("Cpa CkptOpen Api failed with return
value:%d,ckptHandle:%llx", SA_AIS_ERR_INVALID_PARAM, ckptHandle);
>   TRACE_LEAVE2("API return code = %u", rc);
>   return SA_AIS_ERR_INVALID_PARAM;
>   }
>   
> - m_CPSV_SET_SANAMET(checkpointName);
> + ckpt_name = osaf_extended_name_borrow(checkpointName);
>   
>   /* SA_AIS_ERR_INVALID_PARAM, bullet 4 in SAI-AIS-CKPT-B.02.02
>  Section 3.6.1 saCkptCheckpointOpen() and
saCkptCheckpointOpenAsync(), Return Values */
> -if (strncmp((const char *)checkpointName->value, "safCkpt=", 8)
!= 0) {
> +if (strncmp(ckpt_name, "safCkpt=", 8) != 0) {
>   TRACE_4("Cpa CkptOpen:DN failed with return
value:%d,ckptHandle:%llx", SA_AIS_ERR_INVALID_PARAM, ckptHandle);
>   TRACE_LEAVE2("API return code = %u", rc);
>   return SA_AIS_ERR_INVALID_PARAM;
[AVM] I think this should return SA_AIS_ERR_TOO_BIG (26); please
sync up with the return values of the other services.
> @@ -909,7 +910,7 @@ SaAisErrorT saCkptCheckpointOpen(SaCkptH
>   
>   
>   /* Draft Validations */
> - rc = cpa_open_attr_validate(checkpointCreationAttributes,
checkpointOpenFlags, checkpointName);
> + rc = cpa_open_attr_validate(checkpointCreationAttributes, 
> +checkpointOpenFlags);
>   if (rc != SA_AIS_OK) {
>   /* No need to log, already logged inside the
cpa_open_attr_validate */
>   goto done;
> @@ -965,7 +966,7 @@ SaAisErrorT saCkptCheckpointOpen(SaCkptH
>   lc_node->cl_hdl = ckptHandle;
>   lc_node->open_flags = checkpointOpenFlags;
>   
> - lc_node->ckpt_name = *checkpointName;
> + lc_node->ckpt_name = strdup(ckpt_name);
>   
>   /* Add CPA_LOCAL_CKPT_NODE to lcl_ckpt_hdl_tree */
>   proc_rc = cpa_lcl_ckpt_node_add(>lcl_ckpt_tree, lc_node); @@ 
> -984,7 +985,7 @@ SaAisErrorT saCkptCheckpointOpen(SaCkptH
>   evt.info.cpnd.info.openReq.client_hdl = ckptHandle;
>   evt.info.cpnd.info.openReq.lcl_ckpt_hdl = lc_node->lcl_ckpt_hdl;
>   
> - evt.info.cpnd.info.openReq.ckpt_name = *checkpointName;
> + osaf_extended_name_lend(ckpt_name, 
> +_name);
>   
>   if (checkpointCreationAttributes) {
>   evt.info.cpnd.info.openReq.ckpt_attrib = 
> *checkpointCreationAttributes; @@ -1128,6 +1129,7 @@ gl_node_add_fail:
>   
>lc_node_add_fail:
>   if (lc_node != NULL) {
> + free((void *)lc_node->ckpt_name);
>   m_MMGR_FREE_CPA_LOCAL_CKPT_NODE(lc_node);
>   }
>   
> @@ -1179,6 +1181,7 @@ SaAisErrorT 

Re: [devel] [PATCH 1 of 1] cpa: remove multiple sync_send() calls in case of multiple vector write [#1849]

2016-08-21 Thread Vo Minh Hoang
Dear Mahesh,

I would like to share my thoughts on this.
Please consider them.

- I agree that this modification will improve write performance.
- I checked opensaf-4.7.x at both the newest changeset (7886) and the tested
changeset (7640) and found the same source code. So I think this fix will not
solve the performance degradation from 4.7.x to 5.0.x.

Thank you and best regards,
Hoang

-Original Message-
From: mahesh.va...@oracle.com [mailto:mahesh.va...@oracle.com] 
Sent: Thursday, August 18, 2016 4:10 PM
To: nhat.p...@dektech.com.au; hoang.m...@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net
Subject: [PATCH 1 of 1] cpa: remove multiple sync_send() calls in case of
multiple vector write [#1849]

 osaf/libs/agents/saf/cpa/cpa_api.c |  17 -
 1 files changed, 8 insertions(+), 9 deletions(-)


Issue:
In the current saCkptCheckpointWrite(), when more than one ioVector element
is written, cpa_mds_msg_sync_send() is called once per element, which causes
a performance issue.

Fix:
cpa_mds_msg_sync_send() does not need to be called multiple times, so it is
now called only once.
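
The shape of the fix can be sketched in isolation as follows. This is a minimal illustration, not the patch itself: every sketch_* name is hypothetical, sketch_wait_time() merely stands in for CPA_WAIT_TIME(), SKETCH_MIN_WAIT stands in for CPSV_WAIT_TIME, and the send function is a stub that only counts calls.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical floor for the timeout, standing in for CPSV_WAIT_TIME. */
#define SKETCH_MIN_WAIT 1000u

struct sketch_iovec {
	uint64_t data_size;
};

unsigned sketch_send_calls;   /* how many times the stub send was called */
uint32_t sketch_last_timeout; /* timeout passed to the most recent send */

/* Hypothetical stand-in for CPA_WAIT_TIME(): scales with the payload. */
uint32_t sketch_wait_time(uint64_t total_size)
{
	return (uint32_t)(total_size / 1024u);
}

/* Stub for the synchronous send; it only records that it was called. */
int sketch_sync_send(uint32_t time_out)
{
	sketch_send_calls++;
	sketch_last_timeout = time_out;
	return 0;
}

/* Sum all element sizes first, derive one timeout, then send exactly once. */
int sketch_write(const struct sketch_iovec *vec, size_t count)
{
	uint64_t total = 0;

	assert(vec != NULL || count == 0);
	for (size_t i = 0; i < count; i++)
		total += vec[i].data_size;

	uint32_t time_out = sketch_wait_time(total);
	if (time_out < SKETCH_MIN_WAIT)
		time_out = SKETCH_MIN_WAIT;

	return sketch_sync_send(time_out);
}
```

With the send inside the loop, a write of N elements would block on N round trips; with the loop reduced to the size summation, the number of round trips no longer depends on N.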

diff --git a/osaf/libs/agents/saf/cpa/cpa_api.c
b/osaf/libs/agents/saf/cpa/cpa_api.c
--- a/osaf/libs/agents/saf/cpa/cpa_api.c
+++ b/osaf/libs/agents/saf/cpa/cpa_api.c
@@ -3523,17 +3523,16 @@ SaAisErrorT saCkptCheckpointWrite(SaCkpt
/* Unlock cpa_lock before calling mds api */
m_NCS_UNLOCK(>cb_lock, NCS_LOCK_WRITE);
 
-   for (iter = 0; iter < numberOfElements; iter++) {
+   for (iter = 0; iter < numberOfElements; iter++)
all_ioVector_size += ioVector[iter].dataSize;
-   time_out = CPA_WAIT_TIME(all_ioVector_size);
- 
-   if (time_out < CPSV_WAIT_TIME) {
-   time_out = CPSV_WAIT_TIME;
-   }
-   proc_rc = cpa_mds_msg_sync_send(cb->cpa_mds_hdl,
&(gc_node->active_mds_dest),
+   
+   time_out = CPA_WAIT_TIME(all_ioVector_size);
+   if (time_out < CPSV_WAIT_TIME) {
+   time_out = CPSV_WAIT_TIME;
+   }
+   
+   proc_rc = cpa_mds_msg_sync_send(cb->cpa_mds_hdl, 
+&(gc_node->active_mds_dest),
, _evt, time_out);
-
-   }
/* Generate rc from proc_rc */
switch (proc_rc)
{


--
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


Re: [devel] [PATCH 6 of 8] cpsv: Apply new messages supporting extended SaNameT to CPD, CPND, and CPA v1 [#1574]

2016-08-16 Thread Vo Minh Hoang
Dear Mahesh,

Thank you very much for the clarification.
Since the other services do the same, I will update the patch and provide one
that does not use new messages.

Thank you and best regards,
Hoang


-Original Message-
From: A V Mahesh [mailto:mahesh.va...@oracle.com] 
Sent: Tuesday, August 16, 2016 11:29 AM
To: Vo Minh Hoang <hoang.m...@dektech.com.au>
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 6 of 8] cpsv: Apply new messages supporting extended
SaNameT to CPD, CPND, and CPA v1 [#1574]

Hi Hoang,

Please find my responses as [AVM].

-AVM

On 8/15/2016 8:16 AM, Vo Minh Hoang wrote:
> Dear Mahesh,
>
> Thank you very much for your review.
> I would like to answer to your concern.
>
> New messages were added instead of updating the old ones mostly
> because the Long DN function is already working in the current system
> (other modules already support Long DN).
[AVM] Long DN is being introduced in all services in this release
(opensaf-5.1.x). The OpenSAF rule is that any newly introduced enhancement
or feature can be used only once the cluster is completely upgraded to the
new version (all nodes on opensaf-5.1.x). During an in-service upgrade
SA_ENABLE_EXTENDED_NAMES will therefore be false, so there is no way that
other modules already support Long DN.
Consequently there is no possibility of creating a new Long DN checkpoint
while upgrading.

> So if we do not handle this, there are cases where CKPT is upgraded
> in-service on nodes with Long DN enabled and runs with the Long DN
> function immediately (old checkpoints still use short DNs, since the
> previous CKPT version, unlike the other services, does not support Long
> DN), and the newly updated CKPT will send the same message in a different
> format to the old CKPT, causing strange behavior.
[AVM] Accommodating existing Short DN checkpoints in the checkpoint code of
an upgrading or upgraded cluster should be the aim of this patch (please
remember that SA_ENABLE_EXTENDED_NAMES will be false while upgrading). The
new Long DN variables are larger and will always accommodate the information
of existing Short DN checkpoints.

In this patch we do not need to handle messages that carry a Long DN; the
patch only needs to accommodate the existing Short DN checkpoint names in the
new data structures locally.

General comment: if we added a new event for every case of supporting
in-service upgrade, OpenSAF would have thousands of events by now. You can
find some old patches that handle in-service upgrade without adding a new
event.

-AVM
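
A rough sketch of the gating described above, under stated assumptions: the 256-byte limit is a local stand-in for the unextended SaNameT limit, long_dns_allowed stands in for the cluster-wide longDnsAllowed setting, and sketch_classify_name() is a hypothetical helper, not an OpenSAF function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumed stand-in for the unextended SaNameT length limit. */
#define SKETCH_SHORT_DN_LIMIT 256

enum sketch_name_kind {
	SKETCH_NAME_SHORT,   /* fits the old format; always acceptable */
	SKETCH_NAME_LONG,    /* needs the extended format and is allowed */
	SKETCH_NAME_REJECTED /* long name while Long DN is still disabled */
};

/* Hypothetical helper: one code path serves both formats. A long name is
 * accepted only when the cluster-wide switch has been turned on, which by
 * the rule above happens only after every node has been upgraded. */
enum sketch_name_kind sketch_classify_name(const char *name,
					   bool long_dns_allowed)
{
	assert(name != NULL);
	size_t len = strlen(name);

	if (len < SKETCH_SHORT_DN_LIMIT)
		return SKETCH_NAME_SHORT;
	return long_dns_allowed ? SKETCH_NAME_LONG : SKETCH_NAME_REJECTED;
}
```

Because a long name can never be accepted while the switch is off, a mixed-version cluster only ever exchanges names that fit the old message layout, so no second event type is needed for the upgrade window.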

> Please consider it.
>
> Thank you and best regards,
> Hoang
>
>
> -Original Message-
> From: A V Mahesh [mailto:mahesh.va...@oracle.com]
> Sent: Friday, August 12, 2016 12:49 PM
> To: Hoang Vo <hoang.m...@dektech.com.au>
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: Re: [PATCH 6 of 8] cpsv: Apply new messages supporting 
> extended SaNameT to CPD, CPND, and CPA v1 [#1574]
>
> Hi Hoang,
>
> We don't need any new event like CPND_EVT_D2ND_CKPT_CREATE_2, even
> though a NEW CPND might be sending events to an OLD CPD.
>
> In general, the OpenSAF rule is that any newly introduced
> enhancement or feature can be used only once the cluster is
> completely upgraded to the new version. So during an in-service upgrade
> SA_ENABLE_EXTENDED_NAMES will be false (longDnsAllowed=0 of
> opensafImm=opensafImm,safApp=safImmService). Only once the cluster is
> upgraded to the new version is longDnsAllowed set to 1
> ("immcfg -o safImmService -m
> opensafImm=opensafImm,safApp=safImmService -a longDnsAllowed=1").
> So remove the new event, merge its content into the single
> existing event, and do the additional functionality based on whether
> longDnsAllowed is 1 or 0.
>
> -AVM
>
> On 7/21/2016 3:04 PM, Hoang Vo wrote:
>>osaf/libs/agents/saf/cpa/cpa_api.c  |  12 +---
>>osaf/libs/agents/saf/cpa/cpa_mds.c  |   2 +-
>>osaf/libs/common/cpsv/cpsv_evt.c|   1 +
>>osaf/services/saf/cpsv/cpd/cpd_evt.c|  10 +++---
>>osaf/services/saf/cpsv/cpd/cpd_mds.c|   3 ++-
>>osaf/services/saf/cpsv/cpd/cpd_proc.c   |   2 +-
>>osaf/services/saf/cpsv/cpnd/cpnd_evt.c  |  26
--
>>osaf/services/saf/cpsv/cpnd/cpnd_mds.c  |   4 ++--
>>osaf/services/saf/cpsv/cpnd/cpnd_proc.c |   6 +++---
>>9 files changed, 42 insertions(+), 24 deletions(-)
>>
>>
>> diff --git a/osaf/libs/agents/saf/cpa/cpa_api.c
> b/osaf/libs/agents/saf/cpa/cpa_api.c
>> --- a/osaf/libs/agents/saf/cpa/cpa_api.c
>> +++ b/osaf/libs/agents/saf/cpa/cpa_api.c
>> @@ -880,6 +880,8 @@ SaAisErrorT saCkptCheckpointOpen(SaCkptH
>>  }
>>
>>  ckpt_name = osaf_extended_name_borrow(checkpoint

Re: [devel] [PATCH 1 of 1] cpsv: To update checkpoint user number for each node [#1669] V4

2016-08-12 Thread Vo Minh Hoang
Dear Mahesh,

Thank you very much for your help.
I am sending the attached patch, which fixes the attribute missing from the
encode/decode functions.

Thank you and best regards,
Hoang

-Original Message-
From: A V Mahesh [mailto:mahesh.va...@oracle.com] 
Sent: Friday, August 12, 2016 10:15 AM
To: Vo Minh Hoang <hoang.m...@dektech.com.au>
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1 of 1] cpsv: To update checkpoint user number for each
node [#1669] V4

I will test it for you; please send the patch.

-AVM


On 8/11/2016 3:38 PM, Vo Minh Hoang wrote:
> Dear Mahesh,
>
> Could you please tell me the case that produces this error?
> I reviewed the source code and found that the encode/decode function was
> missing one attribute.
> But running the tests in our environment could not reproduce this problem.
>
> Thank you and best regards,
> Hoang
>
> -Original Message-
> From: A V Mahesh [mailto:mahesh.va...@oracle.com]
> Sent: Tuesday, August 9, 2016 4:35 PM
> To: Hoang Vo <hoang.m...@dektech.com.au>
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: Re: [PATCH 1 of 1] cpsv: To update checkpoint user number for 
> each node [#1669] V4
>
> Hi Hoang ,
>
> Please hold off on pushing.
>
> On a new node we have seen an issue; please check the CPD encode and
> decode once (node with the new patch).
>
> Aug  9 14:23:01 SC-1 osafckptd[20478]: hj_enc.c:311:
> decode_flatten_space: Assertion 'p8' failed.
> Aug  9 14:23:01 SC-1 osafamfnd[20439]: NO 
> 'safComp=CPD,safSu=SC-1,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown'
:
> Recovery is 'nodeFailfast'
>
> -AVM
>
>
>
> On 8/9/2016 2:09 PM, A V Mahesh wrote:
>> ACK,
>>
>> -AVM
>>
>>
>> On 8/3/2016 4:02 PM, Hoang Vo wrote:
>>>osaf/libs/common/cpsv/include/cpd_cb.h |2 +
>>>osaf/libs/common/cpsv/include/cpd_proc.h |3 +
>>>osaf/libs/common/cpsv/include/cpd_red.h  |   13 ++
>>>osaf/libs/common/cpsv/include/cpsv_evt.h |8 +
>>>osaf/services/saf/cpsv/cpd/cpd_db.c  |   14 ++-
>>>osaf/services/saf/cpsv/cpd/cpd_evt.c |8 +
>>>osaf/services/saf/cpsv/cpd/cpd_mbcsv.c   |   96 ---
>>>osaf/services/saf/cpsv/cpd/cpd_proc.c|  148
>>> +++
>>>osaf/services/saf/cpsv/cpd/cpd_red.c |   30 -
>>>osaf/services/saf/cpsv/cpd/cpd_sbevt.c   |   57 +--
>>>10 files changed, 344 insertions(+), 35 deletions(-)
>>>
>>>
>>> Problem:
>>> ---
>>> The saCkptCheckpointNumOpeners is not updated when a node which has 
>>> a checkpoint client restarts.
>>>
>>> Solution:
>>> 
>>> Currently CPD doesn't store the number of users on each node. This patch
>>> makes CPD track the users on each node for each
>>> checkpoint. When a node restarts, CPD updates the total number of
>>> users for the checkpoint accordingly. This is reflected correctly in the
>>> saCkptCheckpointNumOpeners attribute.
>>>
>>> diff --git a/osaf/libs/common/cpsv/include/cpd_cb.h
>>> b/osaf/libs/common/cpsv/include/cpd_cb.h
>>> --- a/osaf/libs/common/cpsv/include/cpd_cb.h
>>> +++ b/osaf/libs/common/cpsv/include/cpd_cb.h
>>> @@ -92,6 +92,8 @@ typedef struct cpd_ckpt_info_node {
>>>uint32_t num_users;
>>>uint32_t num_readers;
>>>uint32_t num_writers;
>>> +uint32_t node_users_cnt;
>>> +CPD_NODE_USER_INFO *node_users;
>>>  /* for imm */
>>>SaUint32T ckpt_used_size;
>>> diff --git a/osaf/libs/common/cpsv/include/cpd_proc.h
>>> b/osaf/libs/common/cpsv/include/cpd_proc.h
>>> --- a/osaf/libs/common/cpsv/include/cpd_proc.h
>>> +++ b/osaf/libs/common/cpsv/include/cpd_proc.h
>>> @@ -108,5 +108,8 @@ uint32_t cpd_mbcsv_enc_async_update(CPD_
>>>uint32_t cpd_mbcsv_close(CPD_CB *cb);
>>>bool cpd_is_noncollocated_replica_present_on_payload(CPD_CB *cb, 
>>> CPD_CKPT_INFO_NODE *ckpt_node);
>>>uint32_t cpd_ckpt_reploc_imm_object_delete(CPD_CB *cb, 
>>> CPD_CKPT_REPLOC_INFO *ckpt_reploc_node ,bool is_unlink_set);
>>> +void cpd_proc_increase_node_user_info(CPD_CKPT_INFO_NODE 
>>> +*ckpt_node,
>>> MDS_DEST cpnd_dest, SaCkptCheckpointOpenFlagsT open_flags);
>>> +void cpd_proc_decrease_node_user_info(CPD_CKPT_INFO_NODE 
>>> +*ckpt_node,
>>> MDS_DEST cpnd_dest, SaCkptCheckpointOpenFlagsT open_flags);
>>> +void cpd_proc_update_user_info_when_node_down(CPD_CB *cb, NODE_ID
>>> node_id);
>>>uint32_t cpd_proc_ckpt_update_po

Re: [devel] [PATCH 1 of 1] cpsv: To update checkpoint user number for each node [#1669] V4

2016-08-11 Thread Vo Minh Hoang
Dear Mahesh,

Could you please tell me the case that produces this error?
I reviewed the source code and found that the encode/decode function was
missing one attribute.
But running the tests in our environment could not reproduce this problem.

Thank you and best regards,
Hoang

-Original Message-
From: A V Mahesh [mailto:mahesh.va...@oracle.com] 
Sent: Tuesday, August 9, 2016 4:35 PM
To: Hoang Vo 
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1 of 1] cpsv: To update checkpoint user number for each
node [#1669] V4

Hi Hoang ,

Please hold off on pushing.

On a new node we have seen an issue; please check the CPD encode and decode
once (node with the new patch).

Aug  9 14:23:01 SC-1 osafckptd[20478]: hj_enc.c:311: 
decode_flatten_space: Assertion 'p8' failed.
Aug  9 14:23:01 SC-1 osafamfnd[20439]: NO
'safComp=CPD,safSu=SC-1,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' :
Recovery is 'nodeFailfast'

-AVM



On 8/9/2016 2:09 PM, A V Mahesh wrote:
> ACK,
>
> -AVM
>
>
> On 8/3/2016 4:02 PM, Hoang Vo wrote:
>>   osaf/libs/common/cpsv/include/cpd_cb.h |2 +
>>   osaf/libs/common/cpsv/include/cpd_proc.h |3 +
>>   osaf/libs/common/cpsv/include/cpd_red.h  |   13 ++
>>   osaf/libs/common/cpsv/include/cpsv_evt.h |8 +
>>   osaf/services/saf/cpsv/cpd/cpd_db.c  |   14 ++-
>>   osaf/services/saf/cpsv/cpd/cpd_evt.c |8 +
>>   osaf/services/saf/cpsv/cpd/cpd_mbcsv.c   |   96 ---
>>   osaf/services/saf/cpsv/cpd/cpd_proc.c|  148 
>> +++
>>   osaf/services/saf/cpsv/cpd/cpd_red.c |   30 -
>>   osaf/services/saf/cpsv/cpd/cpd_sbevt.c   |   57 +--
>>   10 files changed, 344 insertions(+), 35 deletions(-)
>>
>>
>> Problem:
>> ---
>> The saCkptCheckpointNumOpeners is not updated when a node which has a 
>> checkpoint client restarts.
>>
>> Solution:
>> 
>> Currently CPD doesn't store the number of users on each node. This patch
>> makes CPD track the users on each node for each
>> checkpoint. When a node restarts, CPD updates the total number of
>> users for the checkpoint accordingly. This is reflected correctly in the
>> saCkptCheckpointNumOpeners attribute.
>>
>> diff --git a/osaf/libs/common/cpsv/include/cpd_cb.h
>> b/osaf/libs/common/cpsv/include/cpd_cb.h
>> --- a/osaf/libs/common/cpsv/include/cpd_cb.h
>> +++ b/osaf/libs/common/cpsv/include/cpd_cb.h
>> @@ -92,6 +92,8 @@ typedef struct cpd_ckpt_info_node {
>>   uint32_t num_users;
>>   uint32_t num_readers;
>>   uint32_t num_writers;
>> +uint32_t node_users_cnt;
>> +CPD_NODE_USER_INFO *node_users;
>> /* for imm */
>>   SaUint32T ckpt_used_size;
>> diff --git a/osaf/libs/common/cpsv/include/cpd_proc.h
>> b/osaf/libs/common/cpsv/include/cpd_proc.h
>> --- a/osaf/libs/common/cpsv/include/cpd_proc.h
>> +++ b/osaf/libs/common/cpsv/include/cpd_proc.h
>> @@ -108,5 +108,8 @@ uint32_t cpd_mbcsv_enc_async_update(CPD_
>>   uint32_t cpd_mbcsv_close(CPD_CB *cb);
>>   bool cpd_is_noncollocated_replica_present_on_payload(CPD_CB *cb, 
>> CPD_CKPT_INFO_NODE *ckpt_node);
>>   uint32_t cpd_ckpt_reploc_imm_object_delete(CPD_CB *cb, 
>> CPD_CKPT_REPLOC_INFO *ckpt_reploc_node ,bool is_unlink_set);
>> +void cpd_proc_increase_node_user_info(CPD_CKPT_INFO_NODE *ckpt_node,
>> MDS_DEST cpnd_dest, SaCkptCheckpointOpenFlagsT open_flags);
>> +void cpd_proc_decrease_node_user_info(CPD_CKPT_INFO_NODE *ckpt_node,
>> MDS_DEST cpnd_dest, SaCkptCheckpointOpenFlagsT open_flags);
>> +void cpd_proc_update_user_info_when_node_down(CPD_CB *cb, NODE_ID
>> node_id);
>>   uint32_t cpd_proc_ckpt_update_post(CPD_CB *cb);
>>   #endif
>> diff --git a/osaf/libs/common/cpsv/include/cpd_red.h
>> b/osaf/libs/common/cpsv/include/cpd_red.h
>> --- a/osaf/libs/common/cpsv/include/cpd_red.h
>> +++ b/osaf/libs/common/cpsv/include/cpd_red.h
>> @@ -64,6 +64,18 @@ typedef struct cpd_a2s_ckpt_usr_info {
>> } CPD_A2S_CKPT_USR_INFO;
>>   +typedef struct cpd_a2s_ckpt_usr_info_2 {
>> +SaCkptCheckpointHandleT ckpt_id;
>> +uint32_t num_user;
>> +uint32_t num_writer;
>> +uint32_t num_reader;
>> +uint32_t num_sections;
>> +uint32_t ckpt_on_scxb1;
>> +uint32_t ckpt_on_scxb2;
>> +uint32_t node_users_cnt;
>> +CPD_NODE_USER_INFO *node_list;
>> +} CPD_A2S_CKPT_USR_INFO_2;
>> +
>>   typedef struct cpd_mbcsv_msg {
>>   CPD_MBCSV_MSG_TYPE type;
>>   union {
>> @@ -76,6 +88,7 @@ typedef struct cpd_mbcsv_msg {
>>   CPD_A2S_CKPT_UNLINK ckpt_ulink;
>>   CPD_A2S_CKPT_USR_INFO usr_info;
>>   CPSV_CKPT_DEST_INFO dest_down;
>> +CPD_A2S_CKPT_USR_INFO_2 usr_info_2;
>>   } info;
>>   } CPD_MBCSV_MSG;
>>   diff --git a/osaf/libs/common/cpsv/include/cpsv_evt.h
>> b/osaf/libs/common/cpsv/include/cpsv_evt.h
>> --- a/osaf/libs/common/cpsv/include/cpsv_evt.h
>> +++ b/osaf/libs/common/cpsv/include/cpsv_evt.h
>> @@ -840,6 +840,14 @@ typedef struct cpd_tmr_info {
>>   } info;
>>   } CPD_TMR_INFO;
>>   +typedef struct 

Re: [devel] [PATCH 4 of 8] cpsv: Add new message to support extended SaNameT v1 [#1574]

2016-08-10 Thread Vo Minh Hoang
Dear Mahesh,

The encode and decode routines for the new messages are implemented in
separate functions such as *_encode() and *_decode(), so they do not appear
in the *_edu.c file.

I found only MDS_CLIENT_MSG_FORMAT_VER, and I thought that I should not
update it each time a new message is added. I am sorry if that is wrong.

Thank you very much for your review.
Best regards,
Hoang

-Original Message-
From: A V Mahesh [mailto:mahesh.va...@oracle.com] 
Sent: Wednesday, August 10, 2016 6:37 PM
To: Hoang Vo 
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 4 of 8] cpsv: Add new message to support extended
SaNameT v1 [#1574]

Hi Hoang,

I see new encodes and decodes between D and ND in this patch (old cpd to new
cpnd and old cpnd to new cpd) but am not able to find the EDU rules; please
confirm.

Otherwise it will take unnecessary testing effort.

If this is not handled, please refer to the `osaf/libs/common/cpsv/cpsv_edu.c`
file for MSG_FORMAT_VER handling, or to any old patches.

-AVM
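
The kind of format-version handling referred to above can be sketched roughly as follows. This is an assumption-laden illustration: the version numbers and all sketch_* names are hypothetical, and the real constants and negotiation live in the cpsv and MDS code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical message format versions, for illustration only. */
#define SKETCH_FMT_VER_BASE 1
#define SKETCH_FMT_VER_EXT_NAME 2

enum sketch_name_encoding {
	SKETCH_ENC_SHORT_NAME,   /* layout an old peer can decode */
	SKETCH_ENC_EXTENDED_NAME /* layout that carries an extended name */
};

/* Pick the wire layout from the format version the peer advertised, so
 * that an old peer never receives a layout it cannot decode. */
enum sketch_name_encoding sketch_pick_encoding(uint16_t peer_fmt_ver)
{
	assert(peer_fmt_ver >= SKETCH_FMT_VER_BASE);
	return (peer_fmt_ver >= SKETCH_FMT_VER_EXT_NAME)
		   ? SKETCH_ENC_EXTENDED_NAME
		   : SKETCH_ENC_SHORT_NAME;
}
```

The same decision has to be mirrored on the decode side, keyed on the format version of the received message rather than on the local version.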


On 8/2/2016 2:14 PM, Hoang Vo wrote:
>   osaf/libs/common/cpsv/cpsv_evt.c |  504
++-
>   osaf/libs/common/cpsv/include/cpsv_evt.h |   24 +
>   osaf/services/saf/cpsv/cpd/cpd_mds.c |   84 -
>   osaf/services/saf/cpsv/cpnd/cpnd_mds.c   |   84 -
>   4 files changed, 668 insertions(+), 28 deletions(-)
>
>
> New messages supporting extended SaNameT are introduced. Encoding and
> decoding functions for them are also included.
>
> diff --git a/osaf/libs/common/cpsv/cpsv_evt.c 
> b/osaf/libs/common/cpsv/cpsv_evt.c
> --- a/osaf/libs/common/cpsv/cpsv_evt.c
> +++ b/osaf/libs/common/cpsv/cpsv_evt.c
> @@ -30,11 +30,14 @@
>   
>   #include "cpsv.h"
>   #include "cpa_tmr.h"
> +#include "osaf_extended_name.h"
>   
>   FUNC_DECLARATION(CPSV_CKPT_DATA);
>   static SaCkptSectionIdT *cpsv_evt_dec_sec_id(NCS_UBAID *i_ub, uint32_t
svc_id);
>   static uint32_t cpsv_evt_enc_sec_id(NCS_UBAID *o_ub, SaCkptSectionIdT
*sec_id);
>   static void cpsv_convert_sec_id_to_string(char *sec_id_str, 
> SaCkptSectionIdT *section_id);
> +static uint32_t cpsv_encode_extended_name_flat(NCS_UBAID *uba, 
> +SaNameT *name); static uint32_t 
> +cpsv_decode_extended_name_flat(NCS_UBAID *uba, SaNameT *name);
>   
>   const char *cpa_evt_str[] = {
>   "STRING_0",
> @@ -258,6 +261,13 @@ char* cpsv_evt_str(CPSV_EVT *evt, char *
>   info->client_hdl, info->ckpt_name.value);
>   break;
>   }
> + case CPND_EVT_A2ND_CKPT_OPEN_2:
> + {
> + CPSV_A2ND_OPEN_REQ *info =
>info.cpnd.info.openReq;
> + snprintf(o_evt_str, len,
"CPND_EVT_A2ND_CKPT_OPEN_2(hdl=%llu, %s)",
> + info->client_hdl,
osaf_extended_name_borrow(>ckpt_name));
> + break;
> + }
>   case CPND_EVT_A2ND_CKPT_CLOSE:
>   {
>   CPSV_A2ND_CKPT_CLOSE *info =
>info.cpnd.info.closeReq; @@ 
> -271,6 +281,12 @@ char* cpsv_evt_str(CPSV_EVT *evt, char *
>   snprintf(o_evt_str, len,
"CPND_EVT_A2ND_CKPT_UNLINK(%s)", info->ckpt_name.value);
>   break;
>   }
> + case CPND_EVT_A2ND_CKPT_UNLINK_2:
> + {
> + CPSV_A2ND_CKPT_UNLINK *info =
>info.cpnd.info.ulinkReq;
> + snprintf(o_evt_str, len,
"CPND_EVT_A2ND_CKPT_UNLINK_2(%s)",
osaf_extended_name_borrow(>ckpt_name));
> + break;
> + }
>   case CPND_EVT_A2ND_CKPT_RDSET:
>   {
>   CPSV_A2ND_RDSET *info =
>info.cpnd.info.rdsetReq; @@ -513,12 
> +529,33 @@ char* cpsv_evt_str(CPSV_EVT *evt, char *
>   case CPND_EVT_D2ND_CKPT_CREATE:
>   {
>   CPSV_D2ND_CKPT_CREATE *info =
>info.cpnd.info.ckpt_create;
> - snprintf(o_evt_str, len, "[%llu]
CPND_EVT_D2ND_CKPT_CREATE(%s, create_rep=%s, active=0x%X)",
> - info->ckpt_info.ckpt_id,
info->ckpt_name.value,
> + snprintf(o_evt_str, len, "[%llu]
CPND_EVT_D2ND_CKPT_CREATE(%s, create_rep=%s, is_act=%s, active=0x%X,
dest_cnt=%d)",
> + info->ckpt_info.ckpt_id, 
> +osaf_extended_name_borrow(>ckpt_name),
>   info->ckpt_info.ckpt_rep_create ? "true" :
"false",
> -
m_NCS_NODE_ID_FROM_MDS_DEST(info->ckpt_info.active_dest));
> + info->ckpt_info.is_active_exists ? "true" :
"false",
> +
m_NCS_NODE_ID_FROM_MDS_DEST(info->ckpt_info.active_dest),
> + info->ckpt_info.dest_cnt);
>   break;
>   }
> + case CPND_EVT_D2ND_CKPT_CREATE_2:
> + {
> + CPSV_D2ND_CKPT_CREATE *info =
>info.cpnd.info.ckpt_create;
> + snprintf(o_evt_str, len, "[%llu]
CPND_EVT_D2ND_CKPT_CREATE_2(%s, create_rep=%s, is_act=%s, active=0x%X,

Re: [devel] [PATCH 1 of 1] cpsv: To update checkpoint user number for each node [#1669] V3

2016-08-03 Thread Vo Minh Hoang
Dear Mahesh,

I have just submitted a V4 patch that tries to eliminate the possible errors in
communication between the old and new versions.

My testing shows OK results, but since I cannot reproduce the problem exactly,
I do not have high confidence in it.

Would you please help me review and check the result of this patch?

Thank you and best regards,
Hoang

-Original Message-
From: A V Mahesh [mailto:mahesh.va...@oracle.com] 
Sent: Wednesday, July 27, 2016 4:53 PM
To: Vo Minh Hoang <hoang.m...@dektech.com.au>; 'Nhat Pham'
<nhat.p...@dektech.com.au>; anders.wid...@ericsson.com
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1 of 1] cpsv: To update checkpoint user number for each
node [#1669] V3

Hi Hoang,

I am still able to reproduce the problem. Sometimes the current reader count
is incremented twice,

and sometimes it is decremented to less than zero (the variables are set
to 0xfff6).

Unfortunately I don't have a specific order of steps, but this issue occurs
in a cluster setup with 1 new controller, 1 old controller and 2 old
payloads.

When two applications have opened and are holding the checkpoint on the old
payloads (without exiting), and you do fail-overs of the controllers and then
exit the applications on both payloads,

you will end up with the error.

Broadly, I suggest you check that the new messages introduced
in this patch are guarded by a version check.
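
The version check suggested above could look roughly like the following. This is a minimal sketch only: the function name, the constant and the version threshold are hypothetical and are not taken from the patch or from the cpsv sources.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical threshold: the first peer version assumed to understand
 * the newly introduced message. Not a value from the cpsv code. */
#define SKETCH_MIN_VER_FOR_NEW_MSG 3u

/* Return true only if the peer advertised a version that can decode the
 * new message; the caller skips (or falls back to the old message)
 * otherwise, so an old node never receives a type it cannot decode. */
static bool sketch_may_send_new_msg(uint32_t peer_version)
{
    return peer_version >= SKETCH_MIN_VER_FOR_NEW_MSG;
}
```

A sender would consult such a check before building the event, for example `if (sketch_may_send_new_msg(peer_ver)) { /* send new message */ }`.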


===

PL-3:~ # immlist safCkpt=checkpoint_test77
Name                                Type         Value(s)

safCkpt                             SA_STRING_T  safCkpt=checkpoint_test77
saCkptCheckpointUsedSize            SA_UINT64_T  110 (0x6e)
saCkptCheckpointSize                SA_UINT64_T  2097152 (0x20)
saCkptCheckpointRetDuration         SA_TIME_T    9223372036854775807 (0x7fff, Sat Apr 12 05:17:16 2262)
saCkptCheckpointNumWriters          SA_UINT32_T  4294967286 (0xfff6)
saCkptCheckpointNumSections         SA_UINT32_T  1 (0x1)
saCkptCheckpointNumReplicas         SA_UINT32_T  2 (0x2)
saCkptCheckpointNumReaders          SA_UINT32_T  4294967286 (0xfff6)
saCkptCheckpointNumOpeners          SA_UINT32_T  0 (0x0)
saCkptCheckpointNumCorruptSections  SA_UINT32_T  0 (0x0)
saCkptCheckpointMaxSections         SA_UINT32_T  1 (0x1)
saCkptCheckpointMaxSectionSize      SA_UINT64_T  2097152 (0x20)
saCkptCheckpointMaxSectionIdSize    SA_UINT64_T  256 (0x100)
*saCkptCheckpointCreationTimestamp  SA_TIME_T    14696097540 (0x14651a48f19e8400, Wed Jul 27 14:25:54 2016)*
saCkptCheckpointCreationFlags       SA_UINT32_T  2 (0x2)
SaImmAttrImplementerName            SA_STRING_T  safCheckPointService
SaImmAttrClassName                  SA_STRING_T  SaCkptCheckpoint
SaImmAttrAdminOwnerName             SA_STRING_T



SC-2:~ # immlist safCkpt=checkpoint_test77
Name                                Type         Value(s)

safCkpt                             SA_STRING_T  safCkpt=checkpoint_test77
saCkptCheckpointUsedSize            SA_UINT64_T  110 (0x6e)
saCkptCheckpointSize                SA_UINT64_T  2097152 (0x20)
saCkptCheckpointRetDuration         SA_TIME_T    9223372036854775807 (0x7fff, Sat Apr 12 05:17:16 2262)
saCkptCheckpointNumWriters          SA_UINT32_T  20 (0x14)
saCkptCheckpointNumSections         SA_UINT32_T  1 (0x1)
saCkptCheckpointNumReplicas         SA_UINT32_T  2 (0x2)
saCkptCheckpointNumReaders          SA_UINT32_T  20 (0x14)
saCkptCheckpointNumOpeners          SA_UINT32_T  20 (0x14)
saCkptCheckpointNumCorruptSections  SA_UINT32_T  0 (0x0)
saCkptCheckpointMaxSections         SA_UINT32_T  1 (0x1)
saCkptCheckpointMaxSectionSize      SA_UINT64_T  2097152 (0x20)
saCkptCheckpointMaxSectionIdSize    SA_UINT64_T  256 (0x100)
*saCkptCheckpointCreationTimestamp  SA_TIME_T    14696106140 (0x14651b112d9d1c00, Wed Jul 27 14:40:14 2016)*
saCkptCheckpointCreationFlags       SA_UINT32_T  2 (0x2)
SaImmAttrImplementerName            SA_STRING_T  safCheckPointService
SaImmAttrClassName                  SA_STRING_T  SaCkptCheckpoint
SaImmAttrAdminOwnerName             SA_STRING_T


===
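
The NumWriters and NumReaders values of 4294967286 in the PL-3 output above are what an unsigned 32-bit counter shows after being decremented ten times past zero (2^32 - 10). The sketch below illustrates that wrap-around and one possible guard. The helper name is hypothetical, and whether the cpsv counters are decremented in exactly this way is an assumption.

```c
#include <stdint.h>

/* Apply `closes` decrements to an unsigned 32-bit user counter.
 * With guarded == 0 the counter wraps below zero, as uint32_t arithmetic
 * is modulo 2^32. With guarded != 0 it stops at zero instead. */
static uint32_t sketch_apply_closes(uint32_t count, unsigned closes, int guarded)
{
    for (unsigned i = 0; i < closes; i++) {
        if (guarded && count == 0u)
            break;          /* refuse to go below zero */
        count -= 1u;        /* wraps to 4294967295 when count is 0 */
    }
    return count;
}
```

Ten unmatched decrements starting from zero give 4294967286 without the guard and 0 with it. A guard like this only hides the symptom; the unmatched decrement itself would still need to be found.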

-AV

Re: [devel] [PATCH 6 of 8] cpsv: Apply new messages supporting extended SaNameT to CPD, CPND, and CPA v1 [#1574]

2016-08-02 Thread Vo Minh Hoang
Dear Mahesh,

I have just submitted the V3 patch, which fixes the source code following your
comments.
I have also updated the readme file, which is attached to this email.
Please tell me if you have any further questions.

Thank you and best regards,
Hoang

-Original Message-
From: A V Mahesh [mailto:mahesh.va...@oracle.com] 
Sent: Friday, July 29, 2016 1:27 PM
To: Vo Minh Hoang <hoang.m...@dektech.com.au>
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 6 of 8] cpsv: Apply new messages supporting extended
SaNameT to CPD, CPND, and CPA v1 [#1574]

Hi ,

I have provided some minor code-related comments on 4 of 8, 6 of 8 and 8 of
8; please address them in the new patch(es) as well.

Regarding testing:

On 7/29/2016 7:49 AM, Vo Minh Hoang wrote:
> We do not have specific test case and environment for in-service update.
> So my test work is based on osaftest cases.
>
> I tested with old d - new nd and new d - old nd cases.

I started basic testing with a setup of one new controller, one old controller
and two old payloads.
I observed some encode/decode issues when running a simple `/usr/bin/ckpttest`.

I was not even able to proceed with other testing that is not related to long
DN (continuously receiving SA_AIS_ERR_TRY_AGAIN). Please revisit all the
encode and decode functions that you have introduced, and run the basic tests
with the above setup, running applications on both old nodes/applications and
new nodes/applications.

Traces on the old node (without the patch):




Jul 29 11:34:16 SC-2 osafckptnd[5233]: ER cpnd mds decode failed
Jul 29 11:34:16 SC-2 osafckptnd[5233]: ER cpnd mds decode failed
Jul 29 11:34:26 SC-2 osafckptnd[5233]: ER cpnd mds decode failed
Jul 29 11:34:27 SC-2 osafckptnd[5233]: ER cpnd mds decode failed
Jul 29 11:34:27 SC-2 osafckptnd[5233]: ER cpnd mds decode failed


==

Jul 29 11:34:04.216980 osafckptnd [5233:cpnd_mds.c:0225] >> cpnd_mds_callback
Jul 29 11:34:04.217086 osafckptnd [5233:cpnd_mds.c:0267] << cpnd_mds_callback
Jul 29 11:34:04.217100 osafckptnd [5233:cpnd_mds.c:0225] >> cpnd_mds_callback
Jul 29 11:34:04.217129 osafckptnd [5233:cpnd_mds.c:0729] << cpnd_mds_rcv
Jul 29 11:34:04.217175 osafckptnd [5233:cpnd_mds.c:0267] << cpnd_mds_callback
Jul 29 11:34:04.217240 osafckptnd [5233:cpsv_evt.c:2219] TR cpnd <<== [9] CPSV_EVT_ND2ND_CKPT_SECT_CREATE_REQ(sec_id=0x0B15) from node 0x2010F
Jul 29 11:34:04.217254 osafckptnd [5233:cpnd_evt.c:2626] >> cpnd_evt_proc_nd2nd_ckpt_sect_create
Jul 29 11:34:04.217264 osafckptnd [5233:cpnd_evt.c:2632] T4 cpnd ckpt node get failed for ckpt_id:9
Jul 29 11:34:04.217272 osafckptnd [5233:cpnd_mds.c:1004] >> cpnd_mds_send_rsp
Jul 29 11:34:04.217282 osafckptnd [5233:cpsv_evt.c:2213] TR cpnd ==>> [0] CPSV_EVT_ND2ND_CKPT_SECT_ACTIVE_CREATE_RSP(err=6) to node 0x2010F
Jul 29 11:34:04.217301 osafckptnd [5233:cpnd_mds.c:0225] >> cpnd_mds_callback
Jul 29 11:34:04.217310 osafckptnd [5233:cpnd_mds.c:0291] >> cpnd_mds_enc
Jul 29 11:34:04.217331 osafckptnd [5233:cpnd_mds.c:0267] << cpnd_mds_callback
Jul 29 11:34:04.217464 osafckptnd [5233:cpnd_mds.c:1028] << cpnd_mds_send_rsp
Jul 29 11:34:04.217483 osafckptnd [5233:cpnd_evt.c:2740] << cpnd_evt_proc_nd2nd_ckpt_sect_create
Jul 29 11:34:04.217493 osafckptnd [5233:cpnd_evt.c:4433] >> cpnd_evt_destroy
Jul 29 11:34:04.217503 osafckptnd [5233:cpnd_evt.c:4619] << cpnd_evt_destroy
Jul 29 11:34:04.719653 osafckptnd [5233:cpnd_mds.c:0225] >> cpnd_mds_callback
Jul 29 11:34:04.719758 osafckptnd [5233:cpnd_mds.c:0267] << cpnd_mds_callback
Jul 29 11:34:04.719772 osafckptnd [5233:cpnd_mds.c:0225] >> cpnd_mds_callback
Jul 29 11:34:04.719801 osafckptnd [5233:cpnd_mds.c:0729] << cpnd_mds_rcv
Jul 29 11:34:04.719812 osafckptnd [5233:cpnd_mds.c:0267] << cpnd_mds_callback


=

520|0 88 03703659 1 10|
520|0 88 03703659 1 11| SUCCESS   : ckpt with all flags on created
520|0 88 03703659 1 12| Return Value  : SA_AIS_OK
520|0 88 03703659 1 13|
520|0 88 03703659 1 14| SUCCESS   : Unlinked ckpt all replica
520|0 88 03703659 1 15| Return Value  : SA_AIS_OK
520|0 88 03703659 1 16|
520|0 88 03703659 1 17| FAILED: Section 11 created in ckpt
520|0 88 03703659 1 18| Return Value  : SA_AIS_ERR_TRY_AGAIN


=

-AVM


On 7/29/2016 7:49 AM, Vo Minh Hoang wrote:
> Dear Mahesh,
>
> We do not have specific test case and environment for in-service update.
> So my test work is based on osaftest cases.
>
> I tested with old d - new nd and new d - old nd cases.
> There are failed test cases but after investigating, they are 
> i

Re: [devel] [PATCH 6 of 8] cpsv: Apply new messages supporting extended SaNameT to CPD, CPND, and CPA v1 [#1574]

2016-07-29 Thread Vo Minh Hoang
Dear Mahesh,

Thank you very much for your comments.
I will start working on them now.

Sincerely,
Hoang

-Original Message-
From: A V Mahesh [mailto:mahesh.va...@oracle.com] 
Sent: Friday, July 29, 2016 1:27 PM
To: Vo Minh Hoang <hoang.m...@dektech.com.au>
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 6 of 8] cpsv: Apply new messages supporting extended
SaNameT to CPD, CPND, and CPA v1 [#1574]

Hi ,

I have provided some minor code-related comments on 4 of 8, 6 of 8 and 8 of
8; please address them in the new patch(es) as well.

Regarding testing:

On 7/29/2016 7:49 AM, Vo Minh Hoang wrote:
> We do not have specific test case and environment for in-service update.
> So my test work is based on osaftest cases.
>
> I tested with old d - new nd and new d - old nd cases.

I started basic testing with a setup of one new controller, one old controller
and two old payloads.
I observed some encode/decode issues when running a simple `/usr/bin/ckpttest`.

I was not even able to proceed with other testing that is not related to long
DN (continuously receiving SA_AIS_ERR_TRY_AGAIN). Please revisit all the
encode and decode functions that you have introduced, and run the basic tests
with the above setup, running applications on both old nodes/applications and
new nodes/applications.

Traces on the old node (without the patch):




Jul 29 11:34:16 SC-2 osafckptnd[5233]: ER cpnd mds decode failed
Jul 29 11:34:16 SC-2 osafckptnd[5233]: ER cpnd mds decode failed
Jul 29 11:34:26 SC-2 osafckptnd[5233]: ER cpnd mds decode failed
Jul 29 11:34:27 SC-2 osafckptnd[5233]: ER cpnd mds decode failed
Jul 29 11:34:27 SC-2 osafckptnd[5233]: ER cpnd mds decode failed


==

Jul 29 11:34:04.216980 osafckptnd [5233:cpnd_mds.c:0225] >> cpnd_mds_callback
Jul 29 11:34:04.217086 osafckptnd [5233:cpnd_mds.c:0267] << cpnd_mds_callback
Jul 29 11:34:04.217100 osafckptnd [5233:cpnd_mds.c:0225] >> cpnd_mds_callback
Jul 29 11:34:04.217129 osafckptnd [5233:cpnd_mds.c:0729] << cpnd_mds_rcv
Jul 29 11:34:04.217175 osafckptnd [5233:cpnd_mds.c:0267] << cpnd_mds_callback
Jul 29 11:34:04.217240 osafckptnd [5233:cpsv_evt.c:2219] TR cpnd <<== [9] CPSV_EVT_ND2ND_CKPT_SECT_CREATE_REQ(sec_id=0x0B15) from node 0x2010F
Jul 29 11:34:04.217254 osafckptnd [5233:cpnd_evt.c:2626] >> cpnd_evt_proc_nd2nd_ckpt_sect_create
Jul 29 11:34:04.217264 osafckptnd [5233:cpnd_evt.c:2632] T4 cpnd ckpt node get failed for ckpt_id:9
Jul 29 11:34:04.217272 osafckptnd [5233:cpnd_mds.c:1004] >> cpnd_mds_send_rsp
Jul 29 11:34:04.217282 osafckptnd [5233:cpsv_evt.c:2213] TR cpnd ==>> [0] CPSV_EVT_ND2ND_CKPT_SECT_ACTIVE_CREATE_RSP(err=6) to node 0x2010F
Jul 29 11:34:04.217301 osafckptnd [5233:cpnd_mds.c:0225] >> cpnd_mds_callback
Jul 29 11:34:04.217310 osafckptnd [5233:cpnd_mds.c:0291] >> cpnd_mds_enc
Jul 29 11:34:04.217331 osafckptnd [5233:cpnd_mds.c:0267] << cpnd_mds_callback
Jul 29 11:34:04.217464 osafckptnd [5233:cpnd_mds.c:1028] << cpnd_mds_send_rsp
Jul 29 11:34:04.217483 osafckptnd [5233:cpnd_evt.c:2740] << cpnd_evt_proc_nd2nd_ckpt_sect_create
Jul 29 11:34:04.217493 osafckptnd [5233:cpnd_evt.c:4433] >> cpnd_evt_destroy
Jul 29 11:34:04.217503 osafckptnd [5233:cpnd_evt.c:4619] << cpnd_evt_destroy
Jul 29 11:34:04.719653 osafckptnd [5233:cpnd_mds.c:0225] >> cpnd_mds_callback
Jul 29 11:34:04.719758 osafckptnd [5233:cpnd_mds.c:0267] << cpnd_mds_callback
Jul 29 11:34:04.719772 osafckptnd [5233:cpnd_mds.c:0225] >> cpnd_mds_callback
Jul 29 11:34:04.719801 osafckptnd [5233:cpnd_mds.c:0729] << cpnd_mds_rcv
Jul 29 11:34:04.719812 osafckptnd [5233:cpnd_mds.c:0267] << cpnd_mds_callback


=

520|0 88 03703659 1 10|
520|0 88 03703659 1 11| SUCCESS   : ckpt with all flags on created
520|0 88 03703659 1 12| Return Value  : SA_AIS_OK
520|0 88 03703659 1 13|
520|0 88 03703659 1 14| SUCCESS   : Unlinked ckpt all replica
520|0 88 03703659 1 15| Return Value  : SA_AIS_OK
520|0 88 03703659 1 16|
520|0 88 03703659 1 17| FAILED: Section 11 created in ckpt
520|0 88 03703659 1 18| Return Value  : SA_AIS_ERR_TRY_AGAIN


=

-AVM


On 7/29/2016 7:49 AM, Vo Minh Hoang wrote:
> Dear Mahesh,
>
> We do not have specific test case and environment for in-service update.
> So my test work is based on osaftest cases.
>
> I tested with old d - new nd and new d - old nd cases.
> There are failed test cases but after investigating, they are 
> intensional when old one cannot recognize new message.
> The only problem found is record to ticket #1922 and it is not related 
&

Re: [devel] [PATCH 6 of 8] cpsv: Apply new messages supporting extended SaNameT to CPD, CPND, and CPA v1 [#1574]

2016-07-28 Thread Vo Minh Hoang
Dear Mahesh,

We do not have a specific test case or environment for in-service upgrade,
so my testing is based on the osaftest cases.

I tested the old d - new nd and new d - old nd cases.
There are failed test cases, but after investigation they turned out to be
intentional: they occur when the old side cannot recognize a new message.
The only problem found is recorded in ticket #1922, and it is not related to
the long DN problem.
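
The "old side cannot recognize a new message" behaviour can be pictured with the sketch below, which is an illustration under assumptions rather than the actual cpnd decode path: the enum values, return codes and function name are all hypothetical stand-ins. An old decoder only knows the event types it was built with, so any newer type falls into the default branch and is reported as a decode failure.

```c
#include <stdint.h>

/* Hypothetical return codes and event types, for illustration only. */
enum { SKETCH_RC_SUCCESS = 1, SKETCH_RC_FAILURE = 2 };
enum {
    SKETCH_EVT_CKPT_OPEN = 10,    /* known to the old node */
    SKETCH_EVT_CKPT_UNLINK = 11,  /* known to the old node */
    SKETCH_EVT_CKPT_OPEN_2 = 12   /* introduced later, unknown to the old node */
};

/* Decoder as an old node might have it: only the types that existed when
 * it was built are handled; everything else is a decode failure. */
static uint32_t sketch_old_node_decode(uint32_t evt_type)
{
    switch (evt_type) {
    case SKETCH_EVT_CKPT_OPEN:
    case SKETCH_EVT_CKPT_UNLINK:
        return SKETCH_RC_SUCCESS;
    default:
        return SKETCH_RC_FAILURE;
    }
}
```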

Thank you and best regards,
Hoang

-Original Message-
From: A V Mahesh [mailto:mahesh.va...@oracle.com] 
Sent: Thursday, July 28, 2016 4:04 PM
To: Hoang Vo 
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 6 of 8] cpsv: Apply new messages supporting extended
SaNameT to CPD, CPND, and CPA v1 [#1574]

Hi Hoang,

I have just started reviewing; please share the following:

I hope this long DN change supports in-service upgrade. If so,

please share the test cases that you ran; that will help me in reviewing

and in testing the uncovered use/test cases.

-AVM


On 7/21/2016 3:04 PM, Hoang Vo wrote:
>   osaf/libs/agents/saf/cpa/cpa_api.c  |  12 +---
>   osaf/libs/agents/saf/cpa/cpa_mds.c  |   2 +-
>   osaf/libs/common/cpsv/cpsv_evt.c|   1 +
>   osaf/services/saf/cpsv/cpd/cpd_evt.c|  10 +++---
>   osaf/services/saf/cpsv/cpd/cpd_mds.c|   3 ++-
>   osaf/services/saf/cpsv/cpd/cpd_proc.c   |   2 +-
>   osaf/services/saf/cpsv/cpnd/cpnd_evt.c  |  26 --
>   osaf/services/saf/cpsv/cpnd/cpnd_mds.c  |   4 ++--
>   osaf/services/saf/cpsv/cpnd/cpnd_proc.c |   6 +++---
>   9 files changed, 42 insertions(+), 24 deletions(-)
>
>
> diff --git a/osaf/libs/agents/saf/cpa/cpa_api.c 
> b/osaf/libs/agents/saf/cpa/cpa_api.c
> --- a/osaf/libs/agents/saf/cpa/cpa_api.c
> +++ b/osaf/libs/agents/saf/cpa/cpa_api.c
> @@ -880,6 +880,8 @@ SaAisErrorT saCkptCheckpointOpen(SaCkptH
>   }
>   
>   ckpt_name = osaf_extended_name_borrow(checkpointName);
> + if (strlen(ckpt_name) >= kOsafMaxDnLength)
> + return SA_AIS_ERR_INVALID_PARAM;
>   
>   /* SA_AIS_ERR_INVALID_PARAM, bullet 4 in SAI-AIS-CKPT-B.02.02
>  Section 3.6.1 saCkptCheckpointOpen() and saCkptCheckpointOpenAsync(), Return Values */
> @@ -981,7 +983,7 @@ SaAisErrorT saCkptCheckpointOpen(SaCkptH
>   /* Populate & Send the Open Event to CPND */
>   memset(, 0, sizeof(CPSV_EVT));
>   evt.type = CPSV_EVT_TYPE_CPND;
> - evt.info.cpnd.type = CPND_EVT_A2ND_CKPT_OPEN;
> + evt.info.cpnd.type = CPND_EVT_A2ND_CKPT_OPEN_2;
>   evt.info.cpnd.info.openReq.client_hdl = ckptHandle;
>   evt.info.cpnd.info.openReq.lcl_ckpt_hdl = lc_node->lcl_ckpt_hdl;
>   
> @@ -1192,6 +1194,8 @@ SaAisErrorT saCkptCheckpointOpenAsync(Sa
>   }
>   
>   ckpt_name = osaf_extended_name_borrow(checkpointName);
> + if (strlen(ckpt_name) >= kOsafMaxDnLength)
> + return SA_AIS_ERR_INVALID_PARAM;
>   
>   /* SA_AIS_ERR_INVALID_PARAM, bullet 4 in SAI-AIS-CKPT-B.02.02
>  Section 3.6.1 saCkptCheckpointOpen() and saCkptCheckpointOpenAsync(), Return Values */
> @@ -1277,7 +1281,7 @@ SaAisErrorT saCkptCheckpointOpenAsync(Sa
>   /* Populate & Send the Open Event to CPND */
>   memset(, 0, sizeof(CPSV_EVT));
>   evt.type = CPSV_EVT_TYPE_CPND;
> - evt.info.cpnd.type = CPND_EVT_A2ND_CKPT_OPEN;
> + evt.info.cpnd.type = CPND_EVT_A2ND_CKPT_OPEN_2;
>   evt.info.cpnd.info.openReq.client_hdl = ckptHandle;
>   evt.info.cpnd.info.openReq.lcl_ckpt_hdl = lc_node->lcl_ckpt_hdl;
>   
> @@ -1597,6 +1601,8 @@ SaAisErrorT saCkptCheckpointUnlink(SaCkp
>   }
>   
>   ckpt_name = osaf_extended_name_borrow(checkpointName);
> + if (strlen(ckpt_name) >= kOsafMaxDnLength)
> + return SA_AIS_ERR_INVALID_PARAM;
>   
>   /* retrieve CPA CB */
>   m_CPA_RETRIEVE_CB(cb);
> @@ -1635,7 +1641,7 @@ SaAisErrorT saCkptCheckpointUnlink(SaCkp
>   /* Populate evt.info.cpnd.info.unlinkReq & Call MDS sync Send */
>   memset(, 0, sizeof(CPSV_EVT));
>   evt.type = CPSV_EVT_TYPE_CPND;
> - evt.info.cpnd.type = CPND_EVT_A2ND_CKPT_UNLINK;
> + evt.info.cpnd.type = CPND_EVT_A2ND_CKPT_UNLINK_2;
>   
>   osaf_extended_name_lend(ckpt_name, 
> _name);
>   
> diff --git a/osaf/libs/agents/saf/cpa/cpa_mds.c 
> b/osaf/libs/agents/saf/cpa/cpa_mds.c
> --- a/osaf/libs/agents/saf/cpa/cpa_mds.c
> +++ b/osaf/libs/agents/saf/cpa/cpa_mds.c
> @@ -515,7 +515,7 @@ static uint32_t cpa_mds_svc_evt(CPA_CB *
>  /* Populate & Send the Open Event to CPND */
>  memset(, 0, sizeof(CPSV_EVT));
>  evt.type = CPSV_EVT_TYPE_CPND;
> -evt.info.cpnd.type = CPND_EVT_A2ND_CKPT_LIST_UPDATE;
> +evt.info.cpnd.type = CPND_EVT_A2ND_CKPT_LIST_UPDATE_2;
>  evt.info.cpnd.info.ckptListUpdate.client_hdl =
lc_node->cl_hdl;
>  osaf_extended_name_lend(lc_node->ckpt_name, 
> _name);
>   
> diff --git a/osaf/libs/common/cpsv/cpsv_evt.c 
> 
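
The name-length check that the quoted patch adds to saCkptCheckpointOpen(), saCkptCheckpointOpenAsync() and saCkptCheckpointUnlink() can be shown in isolation as follows. This is a sketch under assumptions: `SKETCH_MAX_DN_LENGTH` stands in for `kOsafMaxDnLength` (its value here is assumed), the error enum stands in for the SA_AIS codes, and the NULL check is an addition that is not part of the quoted diff.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for kOsafMaxDnLength; the value is an assumption. */
#define SKETCH_MAX_DN_LENGTH 2048

/* Stand-ins for SA_AIS_OK and SA_AIS_ERR_INVALID_PARAM. */
typedef enum { SKETCH_OK, SKETCH_ERR_INVALID_PARAM } sketch_err_t;

/* Reject a checkpoint name whose length reaches the maximum DN length,
 * mirroring the `strlen(ckpt_name) >= kOsafMaxDnLength` test in the patch. */
static sketch_err_t sketch_validate_ckpt_name(const char *ckpt_name)
{
    if (ckpt_name == NULL)  /* extra guard, not in the quoted diff */
        return SKETCH_ERR_INVALID_PARAM;
    if (strlen(ckpt_name) >= SKETCH_MAX_DN_LENGTH)
        return SKETCH_ERR_INVALID_PARAM;
    return SKETCH_OK;
}
```

Because the comparison is `>=`, a name of exactly the maximum length is rejected as well, which leaves room for the terminating NUL in a buffer of that size.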

Re: [devel] [PATCH 1 of 1] cpsv: To update checkpoint user number for each node [#1669] V3

2016-07-27 Thread Vo Minh Hoang
Dear Mahesh,

Thank you very much for the information you provided.
I will continue investigating this problem.

Sincerely,
Hoang

-Original Message-
From: A V Mahesh [mailto:mahesh.va...@oracle.com] 
Sent: Wednesday, July 27, 2016 4:53 PM
To: Vo Minh Hoang <hoang.m...@dektech.com.au>; 'Nhat Pham'
<nhat.p...@dektech.com.au>; anders.wid...@ericsson.com
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1 of 1] cpsv: To update checkpoint user number for each
node [#1669] V3

Hi Hoang,

I am still able to reproduce the problem. Sometimes the current reader count
is incremented twice,

and sometimes it is decremented to less than zero (the variables are set
to 0xfff6).

Unfortunately I don't have a specific order of steps, but this issue occurs
in a cluster setup with 1 new controller, 1 old controller and 2 old
payloads.

When two applications have opened and are holding the checkpoint on the old
payloads (without exiting), and you do fail-overs of the controllers and then
exit the applications on both payloads,

you will end up with the error.

Broadly, I suggest you check that the new messages introduced
in this patch are guarded by a version check.


===

PL-3:~ # immlist safCkpt=checkpoint_test77
Name                                Type         Value(s)

safCkpt                             SA_STRING_T  safCkpt=checkpoint_test77
saCkptCheckpointUsedSize            SA_UINT64_T  110 (0x6e)
saCkptCheckpointSize                SA_UINT64_T  2097152 (0x20)
saCkptCheckpointRetDuration         SA_TIME_T    9223372036854775807 (0x7fff, Sat Apr 12 05:17:16 2262)
saCkptCheckpointNumWriters          SA_UINT32_T  4294967286 (0xfff6)
saCkptCheckpointNumSections         SA_UINT32_T  1 (0x1)
saCkptCheckpointNumReplicas         SA_UINT32_T  2 (0x2)
saCkptCheckpointNumReaders          SA_UINT32_T  4294967286 (0xfff6)
saCkptCheckpointNumOpeners          SA_UINT32_T  0 (0x0)
saCkptCheckpointNumCorruptSections  SA_UINT32_T  0 (0x0)
saCkptCheckpointMaxSections         SA_UINT32_T  1 (0x1)
saCkptCheckpointMaxSectionSize      SA_UINT64_T  2097152 (0x20)
saCkptCheckpointMaxSectionIdSize    SA_UINT64_T  256 (0x100)
*saCkptCheckpointCreationTimestamp  SA_TIME_T    14696097540 (0x14651a48f19e8400, Wed Jul 27 14:25:54 2016)*
saCkptCheckpointCreationFlags       SA_UINT32_T  2 (0x2)
SaImmAttrImplementerName            SA_STRING_T  safCheckPointService
SaImmAttrClassName                  SA_STRING_T  SaCkptCheckpoint
SaImmAttrAdminOwnerName             SA_STRING_T



SC-2:~ # immlist safCkpt=checkpoint_test77
Name                                Type         Value(s)

safCkpt                             SA_STRING_T  safCkpt=checkpoint_test77
saCkptCheckpointUsedSize            SA_UINT64_T  110 (0x6e)
saCkptCheckpointSize                SA_UINT64_T  2097152 (0x20)
saCkptCheckpointRetDuration         SA_TIME_T    9223372036854775807 (0x7fff, Sat Apr 12 05:17:16 2262)
saCkptCheckpointNumWriters          SA_UINT32_T  20 (0x14)
saCkptCheckpointNumSections         SA_UINT32_T  1 (0x1)
saCkptCheckpointNumReplicas         SA_UINT32_T  2 (0x2)
saCkptCheckpointNumReaders          SA_UINT32_T  20 (0x14)
saCkptCheckpointNumOpeners          SA_UINT32_T  20 (0x14)
saCkptCheckpointNumCorruptSections  SA_UINT32_T  0 (0x0)
saCkptCheckpointMaxSections         SA_UINT32_T  1 (0x1)
saCkptCheckpointMaxSectionSize      SA_UINT64_T  2097152 (0x20)
saCkptCheckpointMaxSectionIdSize    SA_UINT64_T  256 (0x100)
*saCkptCheckpointCreationTimestamp  SA_TIME_T    14696106140 (0x14651b112d9d1c00, Wed Jul 27 14:40:14 2016)*
saCkptCheckpointCreationFlags       SA_UINT32_T  2 (0x2)
SaImmAttrImplementerName            SA_STRING_T  safCheckPointService
SaImmAttrClassName                  SA_STRING_T  SaCkptCheckpoint
SaImmAttrAdminOwnerName             SA_STRING_T


===

-AVM

On 7/26/2016 8:41 AM, Vo Minh Hoang wrote:
> Dear Mahesh,
>
> Thank you very much for your checking.
>
> Unfortunately, I unsuccessfully reproduce this problem in our environment.
> Would you please sen

Re: [devel] [PATCH 1 of 1] cpsv: To update checkpoint user number for each node [#1669] V3

2016-07-25 Thread Vo Minh Hoang
Dear Mahesh,

Thank you very much for checking.

Unfortunately, I was unable to reproduce this problem in our environment.
Would you please send us the trace logs of d and nd on both SC-1 and SC-2
from when the error occurs, so that we can investigate?

For reference, here are my reproduction steps:
1. Prepare SC-1 with the patch and SC-2 without the patch.
2. Create a checkpoint on SC-1.
3. Open the checkpoint on SC-2.
4. Run immlist to get the checkpoint information.
5. Unlink and close the checkpoint on SC-1.
6. Run immlist again to confirm its deletion.
7. Create the checkpoint again on SC-1.
8. List all replicas in shared memory. There is a difference here: in your
error log, why is the shared memory different between SC-1 and SC-2? In my
opinion the shared memory should be the same.
9. Run immlist to check the information.

Please tell us if I have missed something.
I am sorry for any inconvenience.

Thank you and best regards.
Hoang

-Original Message-
From: A V Mahesh [mailto:mahesh.va...@oracle.com] 
Sent: Friday, July 15, 2016 10:26 AM
To: Nhat Pham ; anders.wid...@ericsson.com; Nhat
Pham ; Hoang Vo 
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1 of 1] cpsv: To update checkpoint user number for each
node [#1669] V3

Hi Hoang / Nhat Pham,


Basic testing with in-service upgrade (one old controller without the
patch and one new controller with the patch) corrupts the
Writers/Readers/Openers DB.

Please verify the in-service upgrade test with collocated and non-collocated
checkpoints, address the new issue, and publish a V4 patch.

SC-1:/avm/opensaf_app/cpsv_applications/virtualaddr # immlist safCkpt=checkpoint_test77
Name                                Type         Value(s)

safCkpt                             SA_STRING_T  safCkpt=checkpoint_test77
saCkptCheckpointUsedSize            SA_UINT64_T  110 (0x6e)
saCkptCheckpointSize                SA_UINT64_T  2097152 (0x20)
saCkptCheckpointRetDuration         SA_TIME_T    9223372036854775807 (0x7fff, Sat Apr 12 05:17:16 2262)
saCkptCheckpointNumWriters          SA_UINT32_T  4294967291 (0xfffb)
saCkptCheckpointNumSections         SA_UINT32_T  1 (0x1)
saCkptCheckpointNumReplicas         SA_UINT32_T  4 (0x4)
saCkptCheckpointNumReaders          SA_UINT32_T  4294967291 (0xfffb)
saCkptCheckpointNumOpeners          SA_UINT32_T  4294967291 (0xfffb)
saCkptCheckpointNumCorruptSections  SA_UINT32_T  0 (0x0)
saCkptCheckpointMaxSections         SA_UINT32_T  1 (0x1)
saCkptCheckpointMaxSectionSize      SA_UINT64_T  2097152 (0x20)
saCkptCheckpointMaxSectionIdSize    SA_UINT64_T  256 (0x100)
saCkptCheckpointCreationTimestamp   SA_TIME_T    14685525530 (0x146158c4278eda00, Fri Jul 15 08:45:53 2016)
saCkptCheckpointCreationFlags       SA_UINT32_T  2 (0x2)
SaImmAttrImplementerName            SA_STRING_T  safCheckPointService
SaImmAttrClassName                  SA_STRING_T  SaCkptCheckpoint
SaImmAttrAdminOwnerName             SA_STRING_T

-AVM


On 7/13/2016 12:44 PM, A V Mahesh wrote:
> Hi Hoang / Nhat Pham,
>
> I have just started testing, and the following test case is failing. I may
> report more as soon as I find some.
>
> Test case 1 :
>
> Step 1 : saCkptCheckpointOpen  on SC-1
>
> SC-1:# ./node_A
> 0 saCkptCheckpointOpen  returned checkpointHandle 626bf0
> 1 saCkptCheckpointOpen  returned checkpointHandle 626e70
> 2 saCkptCheckpointOpen  returned checkpointHandle 626ff0
> 3 saCkptCheckpointOpen  returned checkpointHandle 627170
> 4 saCkptCheckpointOpen  returned checkpointHandle 6272f0 
> saCkptCheckpointWrite Waiting to Read from Checkpoint 
> saCkptCheckpointWrite Press  key to continue...
>
> 1 saCkptCheckpointWrite  checkpointHandle 626bf0
> 2 saCkptCheckpointWrite  checkpointHandle 626bf0
> 3 saCkptCheckpointWrite  checkpointHandle 626bf0
> 4 saCkptCheckpointWrite  checkpointHandle 626bf0
> 222 saCkptCheckpointWrite  checkpointHandle 626bf0 
> saCkptCheckpointRead Waiting to Read from Checkpoint 
> saCkptCheckpointRead Press  key to continue...
>
> Step 2 : saCkptCheckpointOpen  on SC-2
>
> SC-2:/avm/opensaf_app/cpsv_applications/virtualaddr # ./node_B
> 0 saCkptCheckpointOpen  returned checkpointHandle 626bf0
> 1 saCkptCheckpointOpen  returned checkpointHandle 626e70
> 2 saCkptCheckpointOpen  returned checkpointHandle 626ff0
> 3 saCkptCheckpointOpen  returned checkpointHandle 627170
> 4 saCkptCheckpointOpen  returned checkpointHandle 6272f0 
> saCkptCheckpointWrite Waiting to Read from Checkpoint 
> saCkptCheckpointWrite Press  key to continue...
>
> 1 saCkptCheckpointWrite  checkpointHandle 626bf0
> 2 saCkptCheckpointWrite  checkpointHandle 626bf0
> 3 saCkptCheckpointWrite  checkpointHandle 626bf0