Re: [devel] [PATCH 1/1] ntf: fix ntfd remove client in standby node while not finalize in active node [#2705]

2017-12-06 Thread Minh Hon Chau

Hi Canh,

Please see my comment marked with [Minh]

Thanks,

Minh


On 06/12/17 21:21, Canh Van Truong wrote:


Hi aMinh,

Please see my comment.

Regards

Canh

*From:*Minh Hon Chau [mailto:minh.c...@dektech.com.au]
*Sent:* Wednesday, December 6, 2017 7:20 AM
*To:* Canh Van Truong 
*Cc:* opensaf-devel@lists.sourceforge.net
*Subject:* Re: [PATCH 1/1] ntf: fix ntfd remove client in standby node 
while not finalize in active node [#2705]


Hi Canh,

I got a build error with 32 bit

src/ntf/ntfd/ntfs_com.c: In function ‘sendNtfaDownUpdate’:
src/ntf/ntfd/ntfs_com.c:562:2: error: format ‘%ld’ expects argument of type 
‘long int’, but argument 6 has type ‘MDS_DEST’ [-Werror=format=]
   TRACE_ENTER2("mdsDest: %ld", mdsDest);
   ^
cc1: all warnings being treated as errors
make[2]: *** [src/ntf/ntfd/bin_osafntfd-ntfs_com.o] Error 1
And another question, with this patch the standby NTFD will no longer delete 
client in proc_ntfa_updn_mds_msg(), it will memorize the client to down_list. 
So in the situation of this ticket, the standby will have sequence of msg as 
below:
1 -> receive checkpoint of finalize
2 -> receive checkpoint of client_down
3 -> receive checkpoint of initialize
4 -> process proc_ntfa_updn_mds_msg, which comes from MDS and sent to mailbox 
before checkpoint of finalize. So we just add the client to down_list, that will 
avoid the error in this ticket, but we are adding a client that had been down?
[Canh] Yes, the agent down will be add to list and no longer removed 
from list in this case. But the client belong to agents down always 
are removed from database(clientMap…) in right way. The down_list is 
just helpful in case failover. I will increase the priority of 
MDS_DOWN to VERY_HIGH before putting in to mailbox. And I keep the 
standby ntfd remove the client at checkpoint of client_down, not in 
process proc_ntfa_updn_mds_msg

How do you think about it?
[Minh] Since you introduce the down_list, so when client instance is 
removed, its item in down_list must not existed. Let's see the above 
sequence, if next step there will be a failover, the patch will remove 
the client (this is new client from next initialize() but same mds_dest) 
because it is present in down_list. The result is that the client will 
be removed unexpectedly.
If you increase the priority of MDS_DOWN before putting in mailbox, if 
it works, proc_ntfa_updn_mds_msg should be processed before checkpoint 
of finalize(). So then with just increasing the priority, the original 
problem would be solved without this patch?

Thanks,
Minh

On 01/12/17 17:17, Canh Van Truong wrote:

The issue happen because the clients are removed in both active and standby 
node when getting

NCSMDS_DOWN event. In standby node, ntfd get NCSMDS_DOWN event is slower 
than next initialize request.

This cause the ntfd will removed all client from data base including new 
client of next initialze.

Any action relate to this client will fail.

The fixing is that when getting NCSMDS_DOWN event, ntfd remove client in 
active node but does not remove

client in standby node. At standby node, ntfd will remove client when 
process the checkpoint of NCSMDS_DOWN event.

---

  src/ntf/ntfd/NtfAdmin.cc  | 89 
+++

  src/ntf/ntfd/NtfAdmin.h   | 11 --

  src/ntf/ntfd/ntfs_com.c   | 15 

  src/ntf/ntfd/ntfs_com.h   |  4 +++

  src/ntf/ntfd/ntfs_evt.c   | 17 +++--

  src/ntf/ntfd/ntfs_mbcsv.c | 39 +++--

  6 files changed, 169 insertions(+), 6 deletions(-)

diff --git a/src/ntf/ntfd/NtfAdmin.cc b/src/ntf/ntfd/NtfAdmin.cc

index dad00383d..aa616d7ee 100644

--- a/src/ntf/ntfd/NtfAdmin.cc

+++ b/src/ntf/ntfd/NtfAdmin.cc

@@ -467,6 +467,80 @@ void NtfAdmin::clientRemoveMDS(MDS_DEST mds_dest) {

  }

  


  /**

+ * Checking if the ntf agent with MDS_DEST is valid

+ *

+ * @param agent_dest

+ */

+bool NtfAdmin::is_valid_ntf_agent(MDS_DEST agent_dest) {

+  TRACE_ENTER();

+  ClientMap::iterator it;

+  bool valid = false;

+  for (it = clientMap.begin(); it != clientMap.end(); it++) {

+    NtfClient *client = it->second;

+    if (client->getMdsDest() == agent_dest) {

+  valid = true;

+  break;

+    }

+  }

+  TRACE_LEAVE2("The validation of ntfa: %d", valid);

+  return valid;

+}

+

+/**

+ * Add the ntfa down to the list. This is helpful to remember the

+ * list of ntfa to process in case failover.

+ *

+ * @param agent_dest

+ */

+void NtfAdmin::AddNtfAgentDown(MDS_DEST agent_dest) {

+  TRACE_ENTER2(" Add ntfa down (%ld) to the list", agent_dest);

+

+  if (is_valid_ntf_agent(agent_dest)) {

+    MDS_DEST *mds_dest = new MDS_DEST;

+    *mds_dest = agent_dest;

+    ntfa_down_list.push_back(mds_dest);

+  }

+}

+

  

Re: [devel] [PATCH 1/1] mds: show extra tipc port id when install_tipc invoked [#2732]

2017-12-06 Thread Hans Nordebäck
Hi Vu,


ack, review and tested. Perhaps we should document what each field represents 
in the m_MDS_LOG_NOTIFY and remove the redundant server_inst?


/Thanks HansN


Från: Vu Minh Nguyen 
Skickat: den 6 december 2017 17:08:16
Till: Hans Nordebäck; Zoran Milinkovic
Kopia: opensaf-devel@lists.sourceforge.net; Vu Minh Nguyen
Ämne: [PATCH 1/1] mds: show extra tipc port id when install_tipc invoked [#2732]

Add extra information - tipc port id toghether with service id when install tipc
invoked.  Such information will help find out which OpenSAF services causes tipc
trouble such as TIPC_ERR_OVERLOAD.
---
 src/mds/mds_dt_tipc.c | 36 +++-
 1 file changed, 23 insertions(+), 13 deletions(-)

diff --git a/src/mds/mds_dt_tipc.c b/src/mds/mds_dt_tipc.c
index b3237a0..0466c6b 100644
--- a/src/mds/mds_dt_tipc.c
+++ b/src/mds/mds_dt_tipc.c
@@ -163,6 +163,22 @@ uint32_t mdtm_global_frag_num;

 const unsigned int MAX_RECV_THRESHOLD = 30;

+static bool get_tipc_port_id(int sock, uint32_t* port_id) {
+   struct sockaddr_tipc addr;
+   socklen_t sz = sizeof(addr);
+
+   memset(, 0, sizeof(addr));
+   *port_id = 0;
+   if (0 > getsockname(sock, (struct sockaddr *), )) {
+   syslog(LOG_ERR, "MDTM:TIPC Failed to get socket name, err: %s",
+  strerror(errno));
+   return false;
+   }
+
+   *port_id = addr.addr.id.ref;
+   return true;
+}
+
 /*

   Function NAME: mdtm_tipc_init
@@ -182,11 +198,7 @@ uint32_t mdtm_tipc_init(NODE_ID nodeid, uint32_t 
*mds_tipc_ref)

 NCS_PATRICIA_PARAMS pat_tree_params;

-   struct sockaddr_tipc addr;
-   socklen_t sz = sizeof(addr);
-
 memset(_cb, 0, sizeof(tipc_cb));
-   *mds_tipc_ref = 0;

 /* Added to assist the shutdown bug */
 mdtm_ref_hdl_list_hdr = NULL;
@@ -225,19 +237,14 @@ uint32_t mdtm_tipc_init(NODE_ID nodeid, uint32_t 
*mds_tipc_ref)
 }

 /* Code for getting the self tipc random number */
-   memset(, 0, sizeof(addr));
-   if (0 > getsockname(tipc_cb.BSRsock, (struct sockaddr *), )) {
-   syslog(LOG_ERR,
-  "MDTM:TIPC Failed to get the BSR Sockname  err :%s",
-  strerror(errno));
+   if (!get_tipc_port_id(tipc_cb.BSRsock, mds_tipc_ref)) {
 close(tipc_cb.Dsock);
 close(tipc_cb.BSRsock);
 return NCSCC_RC_FAILURE;
 }
-   *mds_tipc_ref = addr.addr.id.ref;

 tipc_cb.adest = ((uint64_t)(nodeid)) << 32;
-   tipc_cb.adest |= addr.addr.id.ref;
+   tipc_cb.adest |= *mds_tipc_ref;
 tipc_cb.node_id = nodeid;
 get_adest_details(tipc_cb.adest, tipc_cb.adest_details);

@@ -1860,8 +1867,11 @@ uint32_t mds_mdtm_svc_install_tipc(PW_ENV_ID pwe_id, 
MDS_SVC_ID svc_id,
 server_addr.addr.nameseq.lower = server_inst;
 server_addr.addr.nameseq.upper = server_inst;

-   m_MDS_LOG_INFO("MDTM: install_tipc : <%u,%u,%u>", server_type,
-  server_inst, server_inst);
+   uint32_t port_id = 0;
+   get_tipc_port_id(tipc_cb.BSRsock, _id);
+   m_MDS_LOG_NOTIFY("MDTM: install_tipc : <%u,%u,%u,%u>",
+server_type, port_id,
+server_inst, server_inst);

 if (0 != bind(tipc_cb.BSRsock, (struct sockaddr *)_addr,
   sizeof(server_addr))) {
--
1.9.1

--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


[devel] [PATCH 0/1] Review Request for clmd: add dynamically created EEs to PLM entity group on standby [#2730]

2017-12-06 Thread Alex Jones
Summary: clmd: add dynamically created EEs to PLM entity group on standby 
[#2730]
Review request for Ticket(s): 2730
Peer Reviewer(s): Anders, Hans, Mathi, Ravi
Pull request to: 
Affected branch(es): develop
Development branch: ticket-2730
Base revision: 9ab54933456632260be87c2c763bd36b1ab7e5d2
Personal repository: git://git.code.sf.net/u/trguitar/review


Impacted area   Impact y/n

 Docsn
 Build systemn
 RPM/packaging   n
 Configuration files n
 Startup scripts n
 SAF servicesy
 OpenSAF servicesn
 Core libraries  n
 Samples n
 Tests   n
 Other   n

NOTE: Patch(es) contain lines longer than 80 characers

Comments (indicate scope for each "y" above):
-

revision be1f5e166884737b1786b4eab3a47f82c54e47f8
Author: Alex Jones 
Date:   Wed, 6 Dec 2017 11:58:33 -0500

clmd: add dynamically created EEs to PLM entity group on standby [#2730]

If EEs and corresponding CLM nodes are dynamically created, after a middleware
si-swap when the former standby has become active, then one of those EEs is
rebooted, clmd has not enabled PLM readiness state tracking on the EE and will
not know when it comes back. Thus, the node will not be allowed back into the
cluster because it thinks it is not a member.

The dynamically created EE is not being added to the PLM entity group on the
standby.

Add the dynamically created EE to the PLM entity group on the standby.



Complete diffstat:
--
 src/clm/clmd/clms_evt.c   |  2 +-
 src/clm/clmd/clms_imm.c   |  2 +-
 src/clm/clmd/clms_mbcsv.c | 20 +++-
 3 files changed, 21 insertions(+), 3 deletions(-)


Testing Commands:
-
1) dynamically create a CLM node with an EE
2) middleware si-swap
3) reboot the EE node


Testing, Expected Results:
--
It should come back into the cluster


Conditions of Submission:
-
Dec 12, or ack from developer


Arch  Built StartedLinux distro
---
mipsn  n
mips64  n  n
x86 n  n
x86_64  y  y
powerpc n  n
powerpc64   n  n


Reviewer Checklist:
---
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
(i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
too much content into a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
comments and change requests that were proposed in the initial review.

___ You have a misconfigured ~/.gitconfig file (i.e. user.name, user.email etc)

___ Your computer have a badly configured date and time; confusing the
the threaded patch review.

___ Your changes affect IPC mechanism, and you don't present any results
for in-service upgradability test.

___ Your changes affect user manual and documentation, your patch series
do not contain the patch that updates the Doxygen manual.


--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot

[devel] [PATCH 1/1] clmd: add dynamically created EEs to PLM entity group on standby [#2730]

2017-12-06 Thread Alex Jones
If EEs and corresponding CLM nodes are dynamically created, after a middleware
si-swap when the former standby has become active, then one of those EEs is
rebooted, clmd has not enabled PLM readiness state tracking on the EE and will
not know when it comes back. Thus, the node will not be allowed back into the
cluster because it thinks it is not a member.

The dynamically created EE is not being added to the PLM entity group on the
standby.

Add the dynamically created EE to the PLM entity group on the standby.
---
 src/clm/clmd/clms_evt.c   |  2 +-
 src/clm/clmd/clms_imm.c   |  2 +-
 src/clm/clmd/clms_mbcsv.c | 20 +++-
 3 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/src/clm/clmd/clms_evt.c b/src/clm/clmd/clms_evt.c
index 4d7010d..b65036d 100644
--- a/src/clm/clmd/clms_evt.c
+++ b/src/clm/clmd/clms_evt.c
@@ -992,7 +992,7 @@ static uint32_t proc_mds_node_evt(CLMSV_CLMS_EVT *evt)
if (delete_existing_nodedown_records(node_id) == true) {
TRACE_LEAVE();
return rc;
-   } else if (node->member == SA_FALSE) {
+   } else if (node->member == SA_FALSE && node->admin_state != 
SA_CLM_ADMIN_UNLOCKED) {
/* One possibility is that an admin operation has made
 * this a non-member */
TRACE_LEAVE();
diff --git a/src/clm/clmd/clms_imm.c b/src/clm/clmd/clms_imm.c
index 6809ce8..c245f67 100644
--- a/src/clm/clmd/clms_imm.c
+++ b/src/clm/clmd/clms_imm.c
@@ -2291,7 +2291,7 @@ SaAisErrorT clms_node_ccb_apply_cb(CcbUtilOperationData_t 
*opdata)
rc = saPlmEntityGroupRemove(clms_cb->ent_group_hdl,
entityNames, 1);
if (rc != SA_AIS_OK) {
-   LOG_ER("saPlmEntityGroupAdd FAILED rc = %d",
+   LOG_ER("saPlmEntityGroupRemove FAILED rc = %d",
   rc);
return rc;
}
diff --git a/src/clm/clmd/clms_mbcsv.c b/src/clm/clmd/clms_mbcsv.c
index 47e4494..6976b03 100644
--- a/src/clm/clmd/clms_mbcsv.c
+++ b/src/clm/clmd/clms_mbcsv.c
@@ -282,6 +282,9 @@ static uint32_t ckpt_proc_node_csync_rec(CLMS_CB *cb, 
CLMS_CKPT_REC *data)
CLMSV_CKPT_NODE *param = >param.node_csync_rec;
CLMS_CLUSTER_NODE *node = NULL, *tmp_node = NULL;
uint32_t rc = NCSCC_RC_SUCCESS;
+#ifdef ENABLE_AIS_PLM
+   SaNameT *entityNames = NULL;
+#endif
 
TRACE_ENTER2("node_name:%s", param->node_name.value);
 
@@ -315,6 +318,21 @@ static uint32_t ckpt_proc_node_csync_rec(CLMS_CB *cb, 
CLMS_CKPT_REC *data)
LOG_ER("Patricia add failed");
}
}
+#ifdef ENABLE_AIS_PLM
+   /* Add it to the plm entity group */
+   entityNames = >ee_name;
+   if (clms_cb->reg_with_plm == SA_TRUE) {
+   SaAisErrorT aisrc = saPlmEntityGroupAdd(
+   clms_cb->ent_group_hdl,
+   entityNames,
+   1,
+   SA_PLM_GROUP_SINGLE_ENTITY);
+   if (aisrc != SA_AIS_OK) {
+   LOG_ER("saPlmEntityGroupAdd FAILED rc = %d",
+   aisrc);
+   }
+   }
+#endif
}
TRACE_LEAVE();
return NCSCC_RC_SUCCESS;
@@ -357,7 +375,7 @@ static uint32_t ckpt_proc_node_del_rec(CLMS_CB *cb, 
CLMS_CKPT_REC *data)
rc = saPlmEntityGroupRemove(clms_cb->ent_group_hdl, entityNames,
1);
if (rc != SA_AIS_OK) {
-   LOG_ER("saPlmEntityGroupAdd FAILED rc = %d", rc);
+   LOG_ER("saPlmEntityGroupRemove FAILED rc = %d", rc);
return rc;
}
}
-- 
2.9.5


--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


[devel] [PATCH 0/1] Review Request for mds: show extra tipc port id when install_tipc invoked [#2732]

2017-12-06 Thread Vu Minh Nguyen
Summary: mds: show extra tipc port id when install_tipc invoked [#2732]
Review request for Ticket(s): 2732
Peer Reviewer(s): Hans, Zoran
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-2732
Base revision: 9ab54933456632260be87c2c763bd36b1ab7e5d2
Personal repository: git://git.code.sf.net/u/winhvu/review


Impacted area   Impact y/n

 Docsn
 Build systemn
 RPM/packaging   n
 Configuration files n
 Startup scripts n
 SAF servicesn
 OpenSAF servicesn
 Core libraries  y
 Samples n
 Tests   n
 Other   n


Comments (indicate scope for each "y" above):
-
*** EXPLAIN/COMMENT THE PATCH SERIES HERE ***

revision 1b816354f4905e073038d9fb9b986d54b1d387f2
Author: Vu Minh Nguyen 
Date:   Wed, 6 Dec 2017 23:02:24 +0700

mds: show extra tipc port id when install_tipc invoked [#2732]

Add extra information - tipc port id toghether with service id when install tipc
invoked.  Such information will help find out which OpenSAF services causes tipc
trouble such as TIPC_ERR_OVERLOAD.



Complete diffstat:
--
 src/mds/mds_dt_tipc.c | 36 +++-
 1 file changed, 23 insertions(+), 13 deletions(-)


Testing Commands:
-
*** LIST THE COMMAND LINE TOOLS/STEPS TO TEST YOUR CHANGES ***


Testing, Expected Results:
--
*** PASTE COMMAND OUTPUTS / TEST RESULTS ***


Conditions of Submission:
-
*** HOW MANY DAYS BEFORE PUSHING, CONSENSUS ETC ***


Arch  Built StartedLinux distro
---
mipsn  n
mips64  n  n
x86 n  n
x86_64  n  n
powerpc n  n
powerpc64   n  n


Reviewer Checklist:
---
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
(i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
too much content into a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
comments and change requests that were proposed in the initial review.

___ You have a misconfigured ~/.gitconfig file (i.e. user.name, user.email etc)

___ Your computer have a badly configured date and time; confusing the
the threaded patch review.

___ Your changes affect IPC mechanism, and you don't present any results
for in-service upgradability test.

___ Your changes affect user manual and documentation, your patch series
do not contain the patch that updates the Doxygen manual.


--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


[devel] [PATCH 1/1] mds: show extra tipc port id when install_tipc invoked [#2732]

2017-12-06 Thread Vu Minh Nguyen
Add extra information - tipc port id toghether with service id when install tipc
invoked.  Such information will help find out which OpenSAF services causes tipc
trouble such as TIPC_ERR_OVERLOAD.
---
 src/mds/mds_dt_tipc.c | 36 +++-
 1 file changed, 23 insertions(+), 13 deletions(-)

diff --git a/src/mds/mds_dt_tipc.c b/src/mds/mds_dt_tipc.c
index b3237a0..0466c6b 100644
--- a/src/mds/mds_dt_tipc.c
+++ b/src/mds/mds_dt_tipc.c
@@ -163,6 +163,22 @@ uint32_t mdtm_global_frag_num;
 
 const unsigned int MAX_RECV_THRESHOLD = 30;
 
+static bool get_tipc_port_id(int sock, uint32_t* port_id) {
+   struct sockaddr_tipc addr;
+   socklen_t sz = sizeof(addr);
+
+   memset(, 0, sizeof(addr));
+   *port_id = 0;
+   if (0 > getsockname(sock, (struct sockaddr *), )) {
+   syslog(LOG_ERR, "MDTM:TIPC Failed to get socket name, err: %s",
+  strerror(errno));
+   return false;
+   }
+
+   *port_id = addr.addr.id.ref;
+   return true;
+}
+
 /*
 
   Function NAME: mdtm_tipc_init
@@ -182,11 +198,7 @@ uint32_t mdtm_tipc_init(NODE_ID nodeid, uint32_t 
*mds_tipc_ref)
 
NCS_PATRICIA_PARAMS pat_tree_params;
 
-   struct sockaddr_tipc addr;
-   socklen_t sz = sizeof(addr);
-
memset(_cb, 0, sizeof(tipc_cb));
-   *mds_tipc_ref = 0;
 
/* Added to assist the shutdown bug */
mdtm_ref_hdl_list_hdr = NULL;
@@ -225,19 +237,14 @@ uint32_t mdtm_tipc_init(NODE_ID nodeid, uint32_t 
*mds_tipc_ref)
}
 
/* Code for getting the self tipc random number */
-   memset(, 0, sizeof(addr));
-   if (0 > getsockname(tipc_cb.BSRsock, (struct sockaddr *), )) {
-   syslog(LOG_ERR,
-  "MDTM:TIPC Failed to get the BSR Sockname  err :%s",
-  strerror(errno));
+   if (!get_tipc_port_id(tipc_cb.BSRsock, mds_tipc_ref)) {
close(tipc_cb.Dsock);
close(tipc_cb.BSRsock);
return NCSCC_RC_FAILURE;
}
-   *mds_tipc_ref = addr.addr.id.ref;
 
tipc_cb.adest = ((uint64_t)(nodeid)) << 32;
-   tipc_cb.adest |= addr.addr.id.ref;
+   tipc_cb.adest |= *mds_tipc_ref;
tipc_cb.node_id = nodeid;
get_adest_details(tipc_cb.adest, tipc_cb.adest_details);
 
@@ -1860,8 +1867,11 @@ uint32_t mds_mdtm_svc_install_tipc(PW_ENV_ID pwe_id, 
MDS_SVC_ID svc_id,
server_addr.addr.nameseq.lower = server_inst;
server_addr.addr.nameseq.upper = server_inst;
 
-   m_MDS_LOG_INFO("MDTM: install_tipc : <%u,%u,%u>", server_type,
-  server_inst, server_inst);
+   uint32_t port_id = 0;
+   get_tipc_port_id(tipc_cb.BSRsock, _id);
+   m_MDS_LOG_NOTIFY("MDTM: install_tipc : <%u,%u,%u,%u>",
+server_type, port_id,
+server_inst, server_inst);
 
if (0 != bind(tipc_cb.BSRsock, (struct sockaddr *)_addr,
  sizeof(server_addr))) {
-- 
1.9.1


--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


Re: [devel] [PATCH 1/1] ntf: fix ntfd remove client in standby node while not finalize in active node [#2705]

2017-12-06 Thread Canh Van Truong
Hi aMinh,

 

Please see my comment.

 

Regards

Canh

 

From: Minh Hon Chau [mailto:minh.c...@dektech.com.au] 
Sent: Wednesday, December 6, 2017 7:20 AM
To: Canh Van Truong 
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1/1] ntf: fix ntfd remove client in standby node while not 
finalize in active node [#2705]

 

Hi Canh,

I got a build error with 32 bit

src/ntf/ntfd/ntfs_com.c: In function ‘sendNtfaDownUpdate’:
src/ntf/ntfd/ntfs_com.c:562:2: error: format ‘%ld’ expects argument of type 
‘long int’, but argument 6 has type ‘MDS_DEST’ [-Werror=format=]
  TRACE_ENTER2("mdsDest: %ld", mdsDest);
  ^
cc1: all warnings being treated as errors
make[2]: *** [src/ntf/ntfd/bin_osafntfd-ntfs_com.o] Error 1
 
And another question, with this patch the standby NTFD will no longer delete 
client in proc_ntfa_updn_mds_msg(), it will memorize the client to down_list. 
So in the situation of this ticket, the standby will have sequence of msg as 
below:
 
1 -> receive checkpoint of finalize
2 -> receive checkpoint of client_down
3 -> receive checkpoint of initialize
4 -> process proc_ntfa_updn_mds_msg, which comes from MDS and sent to mailbox 
before checkpoint of finalize. So we just add the client to down_list, that 
will avoid the error in this ticket, but we are adding a client that had been 
down?
 
[Canh] Yes, the agent down will be add to list and no longer removed from list 
in this case. But the client belong to agents down always are removed from 
database(clientMap…) in right way. The down_list is just helpful in case 
failover. I will increase the priority of MDS_DOWN to VERY_HIGH before putting 
in to mailbox. And I keep the standby ntfd remove the client at checkpoint of 
client_down, not in process proc_ntfa_updn_mds_msg
How do you think about it?
 
 
Thanks,
Minh 

 

On 01/12/17 17:17, Canh Van Truong wrote:

The issue happen because the clients are removed in both active and standby 
node when getting
NCSMDS_DOWN event. In standby node, ntfd get NCSMDS_DOWN event is slower than 
next initialize request.
This cause the ntfd will removed all client from data base including new client 
of next initialze.
Any action relate to this client will fail.
 
The fixing is that when getting NCSMDS_DOWN event, ntfd remove client in active 
node but does not remove
client in standby node. At standby node, ntfd will remove client when process 
the checkpoint of NCSMDS_DOWN event.
---
 src/ntf/ntfd/NtfAdmin.cc  | 89 +++
 src/ntf/ntfd/NtfAdmin.h   | 11 --
 src/ntf/ntfd/ntfs_com.c   | 15 
 src/ntf/ntfd/ntfs_com.h   |  4 +++
 src/ntf/ntfd/ntfs_evt.c   | 17 +++--
 src/ntf/ntfd/ntfs_mbcsv.c | 39 +++--
 6 files changed, 169 insertions(+), 6 deletions(-)
 
diff --git a/src/ntf/ntfd/NtfAdmin.cc b/src/ntf/ntfd/NtfAdmin.cc
index dad00383d..aa616d7ee 100644
--- a/src/ntf/ntfd/NtfAdmin.cc
+++ b/src/ntf/ntfd/NtfAdmin.cc
@@ -467,6 +467,80 @@ void NtfAdmin::clientRemoveMDS(MDS_DEST mds_dest) {
 }
 
 /**
+ * Checking if the ntf agent with MDS_DEST is valid
+ *
+ * @param agent_dest
+ */
+bool NtfAdmin::is_valid_ntf_agent(MDS_DEST agent_dest) {
+  TRACE_ENTER();
+  ClientMap::iterator it;
+  bool valid = false;
+  for (it = clientMap.begin(); it != clientMap.end(); it++) {
+NtfClient *client = it->second;
+if (client->getMdsDest() == agent_dest) {
+  valid = true;
+  break;
+}
+  }
+  TRACE_LEAVE2("The validation of ntfa: %d", valid);
+  return valid;
+}
+
+/**
+ * Add the ntfa down to the list. This is helpful to remember the
+ * list of ntfa to process in case failover.
+ *
+ * @param agent_dest
+ */
+void NtfAdmin::AddNtfAgentDown(MDS_DEST agent_dest) {
+  TRACE_ENTER2(" Add ntfa down (%ld) to the list", agent_dest);
+
+  if (is_valid_ntf_agent(agent_dest)) {
+MDS_DEST *mds_dest = new MDS_DEST;
+*mds_dest = agent_dest;
+ntfa_down_list.push_back(mds_dest);
+  }
+}
+
+/**
+ * Remove the ntfa down from the list
+ *
+ * @param agent_dest
+ */
+void NtfAdmin::RemoveNtfAgentDownFromList(MDS_DEST agent_dest) {
+  TRACE_ENTER();
+  std::list::iterator it;
+  for (it = ntfa_down_list.begin(); it != ntfa_down_list.end(); ++it) {
+MDS_DEST *mds_dest = *it;
+if (*mds_dest == agent_dest) {
+  ntfa_down_list.erase(it);
+  TRACE(" Remove ntfa down (%ld) from the list", agent_dest);
+  delete mds_dest;
+  return;
+}
+  }
+}
+
+/**
+ * Process to clear all ntfa down records in ntfd. The old client of
+ * agent down is removed from database.
+ *
+ * @param agent_dest
+ */
+void NtfAdmin::ProcessNtfAgentDownList() {
+  TRACE_ENTER();
+  std::list::iterator it = ntfa_down_list.begin();;
+  while(it != ntfa_down_list.end()) {
+MDS_DEST *mds_dest = *it;
+it = ntfa_down_list.erase(it);
+clientRemoveMDS(*mds_dest);
+delete mds_dest;
+  }
+  TRACE_LEAVE();
+  return;
+}
+
+/**
  * The node object where the client who had the subscription is notified
  * so it can