[devel] [PATCH 1/1] amfd: Tightens sync window condition to proceed headless restoration [#3271]

2021-09-22 Thread Minh Chau

In a roaming SC cluster, when both the active and standby SCs go
down and the SC Absence feature is enabled, the cluster nominates
another SC to become active. Amfnd on this nominated SC has
led_state set to true, so amfd on the nominated SC proceeds with
the restoration regardless of the sync window.

This problem only occurs in a roaming SC cluster with SC Absence;
it does not occur in a classical cluster with 2SC-xPLs.

This patch tightens the sync window condition under which the
restoration proceeds.
---
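Illustrative note, not part of the patch: the tightened condition can be modelled roughly as below. This is a minimal sketch under stated assumptions. ControlBlock, NodeUpMsg, NodeUpAction and HandleNodeUp are hypothetical stand-ins, not the real amfd types or functions. The local/remote distinction is an assumption, and the sketch ignores init_state and all other checks in avd_node_up_evh.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-ins for the amfd control block and the
// node_up message. They are not the real AVD_CL_CB or message types.
struct ControlBlock {
  uint32_t own_node_id;
  bool node_sync_timer_active;
  bool node_sync_window_closed;
};

struct NodeUpMsg {
  uint32_t node_id;
  bool leds_set;
};

enum class NodeUpAction { kIgnored, kProcessed };

// Models the tightened condition: while the NodeSync timer runs, a node_up
// from another node is ignored even when leds_set is true.
NodeUpAction HandleNodeUp(ControlBlock* cb, const NodeUpMsg& msg) {
  assert(cb != nullptr);
  const bool from_local_node = (msg.node_id == cb->own_node_id);
  if (!from_local_node && cb->node_sync_timer_active) {
    return NodeUpAction::kIgnored;
  }
  if (from_local_node) {
    // node_up from the local amfnd: stop the timer and close the window.
    cb->node_sync_timer_active = false;
    cb->node_sync_window_closed = true;
  }
  return NodeUpAction::kProcessed;
}
```

In this model a remote node_up with leds_set true is dropped until the local node_up has closed the window, which is the behaviour the commit message describes.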
 src/amf/amfd/ndfsm.cc | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/src/amf/amfd/ndfsm.cc b/src/amf/amfd/ndfsm.cc
index 05c103e12..d6cf6bc57 100644
--- a/src/amf/amfd/ndfsm.cc
+++ b/src/amf/amfd/ndfsm.cc
@@ -351,11 +351,9 @@ void avd_node_up_evh(AVD_CL_CB *cb, AVD_EVT *evt) {
 goto done;
   }
   if (cb->node_sync_tmr.is_active == true) {
-if (n2d_msg->msg_info.n2d_node_up.leds_set == false) {
   TRACE("NodeSync timer is active, ignore this node_up msg (nodeid:%x)",
 n2d_msg->msg_info.n2d_node_up.node_id);
   goto done;
-}
   }
 }
   }
@@ -378,6 +376,10 @@ void avd_node_up_evh(AVD_CL_CB *cb, AVD_EVT *evt) {
   (cb->init_state < AVD_INIT_DONE)) {
 // node up from local AVND
 avd_process_state_info_queue(cb);
+// close nodesync window
+TRACE("stop NodeSync timer");
+avd_stop_tmr(cb, &cb->node_sync_tmr);
+cb->node_sync_window_closed = true;
   }
 
   if (avnd->node_info.member != SA_TRUE) {
-- 
2.20.1



___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel

[devel] [PATCH 0/1] Review Request for amfd: Tightens sync window condition to proceed headless restoration [#3271]

2021-09-22 Thread Minh Chau

Summary: amfd: Tightens sync window condition to proceed headless restoration [#3271]
Review request for Ticket(s): 3271
Peer Reviewer(s): Thang, Hieu
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3271
Base revision: 876c9eb232f19de8ef8c5ddd1b3dcce4b1b4a8b3
Personal repository: git://git.code.sf.net/u/minh-chau/review


Impacted area           Impact y/n

 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
-
*** EXPLAIN/COMMENT THE PATCH SERIES HERE ***

revision 34bcab026a5e2d2561c00bb358ad745624996a32
Author: Minh Chau 
Date:   Thu, 23 Sep 2021 09:11:08 +1000

amfd: Tightens sync window condition to proceed headless restoration [#3271]

In a roaming SC cluster, when both the active and standby SCs go
down and the SC Absence feature is enabled, the cluster nominates
another SC to become active. Amfnd on this nominated SC has
led_state set to true, so amfd on the nominated SC proceeds with
the restoration regardless of the sync window.

This problem only occurs in a roaming SC cluster with SC Absence;
it does not occur in a classical cluster with 2SC-xPLs.

This patch tightens the sync window condition under which the
restoration proceeds.



Complete diffstat:
--
 src/amf/amfd/ndfsm.cc | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)


Testing Commands:
-
*** LIST THE COMMAND LINE TOOLS/STEPS TO TEST YOUR CHANGES ***


Testing, Expected Results:
--
*** PASTE COMMAND OUTPUTS / TEST RESULTS ***


Conditions of Submission:
-
*** HOW MANY DAYS BEFORE PUSHING, CONSENSUS ETC ***


Arch        Built   Started   Linux distro
------------------------------------------
mips        n       n
mips64      n       n
x86         n       n
x86_64      n       n
powerpc     n       n
powerpc64   n       n


Reviewer Checklist:
---
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
(i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
too much content into a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
comments and change requests that were proposed in the initial review.

___ You have a misconfigured ~/.gitconfig file (i.e. user.name, user.email etc)

___ Your computer has a badly configured date and time, confusing
the threaded patch review.

___ Your changes affect IPC mechanism, and you don't present any results
for in-service upgradability test.

___ Your changes affect user manual and documentation, your patch series
do not contain the patch that updates the Doxygen manual.




[devel] [PATCH 0/1] Review Request for rde: Add timeout of waiting for peer info [#3263]

2021-06-30 Thread Minh Chau

Summary: rde: Add timeout of waiting for peer info [#3263]
Review request for Ticket(s): 3263
Peer Reviewer(s): Surbhi, Thang, Hieu
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3263
Base revision: 854a8e03042d6a53a45b903262f5197a52a87525
Personal repository: git://git.code.sf.net/u/minh-chau/review


Impacted area           Impact y/n

 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
-
*** EXPLAIN/COMMENT THE PATCH SERIES HERE ***

revision 6b4028c2d5c405a80fcf7ab6297678ae99900b6e
Author: Minh Chau 
Date:   Mon, 28 Jun 2021 13:41:45 +1000

rde: Add timeout of waiting for peer info [#3263]

This ticket revisits the wait for peer info and fixes the
problem of out-of-order peer_up and peer info messages
in commit d1593b03b3c9bec292b14dde65264c261760bf46



Complete diffstat:
--
 src/rde/rded/rde_main.cc |  1 +
 src/rde/rded/role.cc | 63 +++-
 src/rde/rded/role.h  |  7 ++
 3 files changed, 70 insertions(+), 1 deletion(-)


Testing Commands:
-
*** LIST THE COMMAND LINE TOOLS/STEPS TO TEST YOUR CHANGES ***


Testing, Expected Results:
--
*** PASTE COMMAND OUTPUTS / TEST RESULTS ***


Conditions of Submission:
-
*** HOW MANY DAYS BEFORE PUSHING, CONSENSUS ETC ***


Arch        Built   Started   Linux distro
------------------------------------------
mips        n       n
mips64      n       n
x86         n       n
x86_64      n       n
powerpc     n       n
powerpc64   n       n



[devel] [PATCH 1/1] rde: Add timeout of waiting for peer info [#3263]

2021-06-30 Thread Minh Chau

This ticket revisits the wait for peer info and fixes the
problem of out-of-order peer_up and peer info messages
in commit d1593b03b3c9bec292b14dde65264c261760bf46
---
 src/rde/rded/rde_main.cc |  1 +
 src/rde/rded/role.cc | 63 +++-
 src/rde/rded/role.h  |  7 +
 3 files changed, 70 insertions(+), 1 deletion(-)

diff --git a/src/rde/rded/rde_main.cc b/src/rde/rded/rde_main.cc
index 8ed6b046e..33dd645e2 100644
--- a/src/rde/rded/rde_main.cc
+++ b/src/rde/rded/rde_main.cc
@@ -125,6 +125,7 @@ static void handle_mbx_event() {
 }
 case RDE_MSG_PEER_DOWN:
   LOG_NO("Peer down on node 0x%x", msg->fr_node_id);
+  role->RemovePeer(msg->fr_node_id);
   break;
 case RDE_MSG_NEW_ACTIVE_CALLBACK: {
   const std::string my_node = base::Conf::NodeName();
diff --git a/src/rde/rded/role.cc b/src/rde/rded/role.cc
index 3732be449..344702e63 100644
--- a/src/rde/rded/role.cc
+++ b/src/rde/rded/role.cc
@@ -196,9 +196,13 @@ Role::Role(NODE_ID own_node_id)
   discover_peer_timeout_{base::GetEnv("RDE_DISCOVER_PEER_TIMEOUT",
   kDefaultDiscoverPeerTimeout)},
   pre_active_script_timeout_{base::GetEnv(
-  "RDE_PRE_ACTIVE_SCRIPT_TIMEOUT", kDefaultPreActiveScriptTimeout)} {}
+  "RDE_PRE_ACTIVE_SCRIPT_TIMEOUT", kDefaultPreActiveScriptTimeout)},
+  received_peer_info_{true},
+  peer_info_wait_time_{},
+  peer_info_wait_timeout_ {kDefaultWaitPeerInfoTimeout} {}
 
 timespec* Role::Poll(timespec* ts) {
+  TRACE_ENTER();
   timespec* timeout = nullptr;
   if (role_ == PCS_RDA_UNDEFINED) {
 timespec now = base::ReadMonotonicClock();
@@ -238,6 +242,25 @@ timespec* Role::Poll(timespec* ts) {
 cb->state_refresh_thread_started = true;
 std::thread(::RefreshConsensusState, this, cb).detach();
   }
+  if (consensus_service.IsEnabled() == false) {
+// We are already ACTIVE, and has just discovered a new node
+// which makes the election_end_time_ reset
+if (received_peer_info_ == false) {
+  timespec now = base::ReadMonotonicClock();
+  if (peer_info_wait_time_ >= now) {
+*ts = peer_info_wait_time_ - now;
+timeout = ts;
+  } else {
+// Timeout but haven't received peer info
+// The peer RDE could be in ACTIVE
+// thus self-fence to avoid split-brain risk
+LOG_ER("Discovery peer up without peer info. Risk in split-brain,"
+"rebooting this node");
+opensaf_quick_reboot("Probable split-brain due to "
+"unknown RDE peer info");
+  }
+}
+  }
 }
   }
   return timeout;
@@ -251,12 +274,25 @@ void Role::ExecutePreActiveScript() {
 }
 
 void Role::AddPeer(NODE_ID node_id) {
+  TRACE_ENTER();
   auto result = known_nodes_.insert(node_id);
   if (result.second) {
 ResetElectionTimer();
+if (role_ == PCS_RDA_ACTIVE) {
+  ResetPeerInfoWaitTimer();
+  received_peer_info_ = false;
+}
   }
 }
 
+void Role::RemovePeer(NODE_ID node_id) {
+  TRACE_ENTER();
+  if (received_peer_info_ == false && role_ != PCS_RDA_ACTIVE) {
+StopPeerInfoWaitTimer();
+  }
+  known_nodes_.erase(node_id);
+}
+
 // call from main thread only
 bool Role::IsCandidate() {
   TRACE_ENTER();
@@ -330,10 +366,24 @@ uint32_t Role::SetRole(PCS_RDA_ROLE new_role) {
 }
 
 void Role::ResetElectionTimer() {
+  TRACE_ENTER();
   election_end_time_ = base::ReadMonotonicClock() +
base::MillisToTimespec(discover_peer_timeout_);
 }
 
+void Role::ResetPeerInfoWaitTimer() {
+  TRACE_ENTER();
+  LOG_NO("Start/restart waiting peer info timer");
+  peer_info_wait_time_ = base::ReadMonotonicClock() +
+   base::MillisToTimespec(peer_info_wait_timeout_);
+}
+
+void Role::StopPeerInfoWaitTimer() {
+  TRACE_ENTER();
+  // Turn off peer_info_timer
+  received_peer_info_ = true;
+}
+
 uint32_t Role::UpdateMdsRegistration(PCS_RDA_ROLE new_role,
  PCS_RDA_ROLE old_role) {
   uint32_t rc = NCSCC_RC_SUCCESS;
@@ -357,6 +407,7 @@ uint32_t Role::UpdateMdsRegistration(PCS_RDA_ROLE new_role,
 
 void Role::SetPeerState(PCS_RDA_ROLE node_role, NODE_ID node_id,
 uint64_t peer_promote_pending) {
+  TRACE_ENTER();
   if (role() == PCS_RDA_UNDEFINED) {
 bool give_up = false;
 RDE_CONTROL_BLOCK *cb = rde_get_control_block();
@@ -372,6 +423,14 @@ void Role::SetPeerState(PCS_RDA_ROLE node_role, NODE_ID node_id,
 }
 if (node_role == PCS_RDA_ACTIVE || node_role == PCS_RDA_STANDBY ||
 give_up) {
+  // broadcast QUIESCED role to all peers to stop their waiting peer
+  // info timer
+  rde_msg peer_info_req;
+  peer_info_req.type = RDE_MSG_PEER_INFO_RESP;
+  peer_info_req.info.peer_info.ha_role = PCS_RDA_QUIESCED;
+  peer_info_req.info.peer_info.promote_pending = 0;
+  rde_mds_broadcast(&peer_info_req);
+

[devel] [PATCH 2/2] rde: Use broadcast for peer info message [#3263]

2021-05-25 Thread Minh Chau

RDE sends a peer info message to each node it detects through a
peer up message. In a roaming SC cluster, when all SCs rejoin after
a network split, every RDE is active. Duplicate active detection
relies on the peer info message, which amounts to one-to-one
detection. With this mechanism the last SC may go undetected if all
other SCs have been detected as duplicate active and have rebooted.

This patch broadcasts the peer info message instead, to increase
the likelihood of receiving peer info messages from all other SCs.
---
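Illustrative note, not part of the patch: the sketch below models why a broadcast peer info message needs a sender-side filter and how it lets every active node see every other active node. It is a toy model under stated assumptions. Node, PeerInfo, HaRole and ExchangePeerInfo are hypothetical names, and the assumption that a broadcast is also delivered to its sender is inferred from the own_node_id check added in handle_mbx_event, not from MDS documentation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class HaRole { kUndefined, kActive, kStandby, kQuiesced };

struct PeerInfo {
  uint32_t from_node_id;
  HaRole role;
};

// Hypothetical receiver standing in for the peer info handling in rded.
class Node {
 public:
  Node(uint32_t id, HaRole role) : id_(id), role_(role) {}

  PeerInfo MakePeerInfo() const { return PeerInfo{id_, role_}; }

  void OnPeerInfo(const PeerInfo& msg) {
    // A broadcast is assumed to reach its sender too, so skip our own message.
    if (msg.from_node_id == id_) return;
    ++peer_infos_seen_;
    if (role_ == HaRole::kActive && msg.role == HaRole::kActive) {
      duplicate_active_seen_ = true;
    }
  }

  std::size_t peer_infos_seen() const { return peer_infos_seen_; }
  bool duplicate_active_seen() const { return duplicate_active_seen_; }

 private:
  uint32_t id_;
  HaRole role_;
  std::size_t peer_infos_seen_ = 0;
  bool duplicate_active_seen_ = false;
};

// Every node broadcasts its peer info once; each message is delivered to all
// nodes, the sender included.
void ExchangePeerInfo(std::vector<Node>* nodes) {
  assert(nodes != nullptr);
  for (const Node& sender : *nodes) {
    const PeerInfo msg = sender.MakePeerInfo();
    for (Node& receiver : *nodes) receiver.OnPeerInfo(msg);
  }
}
```

With three nodes, two of them active, each node sees two peer info messages and both active nodes detect a duplicate active, without relying on which peer_up each one happened to observe.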
 src/rde/rded/rde_cb.h|  2 +-
 src/rde/rded/rde_main.cc | 22 --
 src/rde/rded/rde_mds.cc  | 20 +++-
 3 files changed, 24 insertions(+), 20 deletions(-)

diff --git a/src/rde/rded/rde_cb.h b/src/rde/rded/rde_cb.h
index 50a0a0d26..b744b7c72 100644
--- a/src/rde/rded/rde_cb.h
+++ b/src/rde/rded/rde_cb.h
@@ -101,7 +101,7 @@ extern uint32_t rde_mds_register();
 extern uint32_t rde_discovery_mds_register();
 extern uint32_t rde_mds_unregister();
 extern uint32_t rde_discovery_mds_unregister();
-extern uint32_t rde_mds_send(rde_msg *msg, MDS_DEST to_dest);
+extern uint32_t rde_mds_broadcast(rde_msg *msg);
 extern uint32_t rde_set_role(PCS_RDA_ROLE role);
 
 #endif  // RDE_RDED_RDE_CB_H_
diff --git a/src/rde/rded/rde_main.cc b/src/rde/rded/rde_main.cc
index e6bd759ec..8ed6b046e 100644
--- a/src/rde/rded/rde_main.cc
+++ b/src/rde/rded/rde_main.cc
@@ -46,7 +46,7 @@
 enum { FD_TERM = 0, FD_AMF = 1, FD_MBX, FD_RDA_SERVER,
FD_SIGHUP, FD_CLIENT_START };
 
-static void SendPeerInfoResp(MDS_DEST mds_dest);
+static void BroadcastPeerInfoResp();
 static void CheckForSplitBrain(const rde_msg *msg);
 
 const char *rde_msg_name[] = {"-",
@@ -105,18 +105,20 @@ static void handle_mbx_event() {
   switch (msg->type) {
 case RDE_MSG_PEER_INFO_REQ:
 case RDE_MSG_PEER_INFO_RESP: {
-  LOG_NO("Got peer info %s from node 0x%x with role %s",
- msg->type == RDE_MSG_PEER_INFO_RESP ? "response" : "request",
- msg->fr_node_id, Role::to_string(msg->info.peer_info.ha_role));
-  CheckForSplitBrain(msg);
-  role->SetPeerState(msg->info.peer_info.ha_role, msg->fr_node_id,
- msg->info.peer_info.promote_pending);
+  if (msg->fr_node_id != own_node_id) {
+LOG_NO("Got peer info %s from node 0x%x with role %s",
+msg->type == RDE_MSG_PEER_INFO_RESP ? "response" : "request",
+msg->fr_node_id, Role::to_string(msg->info.peer_info.ha_role));
+CheckForSplitBrain(msg);
+role->SetPeerState(msg->info.peer_info.ha_role, msg->fr_node_id,
+msg->info.peer_info.promote_pending);
+  }
   break;
 }
 case RDE_MSG_PEER_UP: {
   if (msg->fr_node_id != own_node_id) {
 LOG_NO("Peer up on node 0x%x", msg->fr_node_id);
-SendPeerInfoResp(msg->fr_dest);
+BroadcastPeerInfoResp();
 role->AddPeer(msg->fr_node_id);
   }
   break;
@@ -284,7 +286,7 @@ static void CheckForSplitBrain(const rde_msg *msg) {
   }
 }
 
-static void SendPeerInfoResp(MDS_DEST mds_dest) {
+static void BroadcastPeerInfoResp() {
   RDE_CONTROL_BLOCK *cb = rde_get_control_block();
   rde_msg peer_info_req;
   peer_info_req.type = RDE_MSG_PEER_INFO_RESP;
@@ -294,7 +296,7 @@ static void SendPeerInfoResp(MDS_DEST mds_dest) {
 cb->promote_pending = base::TimespecToMillis(now - cb->promote_start);
   }
   peer_info_req.info.peer_info.promote_pending = cb->promote_pending;
-  rde_mds_send(&peer_info_req, mds_dest);
+  rde_mds_broadcast(&peer_info_req);
 }
 
 /**
diff --git a/src/rde/rded/rde_mds.cc b/src/rde/rded/rde_mds.cc
index a32f54082..4591d1996 100644
--- a/src/rde/rded/rde_mds.cc
+++ b/src/rde/rded/rde_mds.cc
@@ -209,6 +209,8 @@ static uint32_t mds_callback(struct ncsmds_callback_info *info) {
   msg = (struct rde_msg *)info->info.receive.i_msg;
   msg->fr_dest = info->info.receive.i_fr_dest;
   msg->fr_node_id = info->info.receive.i_node_id;
+  TRACE("MDS RECEIVE dest: %" PRIx64 ", node ID: %x, msg_type: %d",
+  msg->fr_dest, msg->fr_node_id, msg->type);
   if (ncs_ipc_send(&cb->mbx, reinterpret_cast<NCS_IPC_MSG *>(
  info->info.receive.i_msg),
NCS_IPC_PRIORITY_NORMAL) != NCSCC_RC_SUCCESS) {
@@ -385,11 +387,11 @@ uint32_t rde_discovery_mds_unregister() {
   return rc;
 }
 
-uint32_t rde_mds_send(struct rde_msg *msg, MDS_DEST to_dest) {
+uint32_t rde_mds_broadcast(struct rde_msg *msg) {
   NCSMDS_INFO info;
   uint32_t rc;
 
-  TRACE("Sending %s to %" PRIx64, rde_msg_name[msg->type], to_dest);
+  TRACE("Sending %s to all rded instances", rde_msg_name[msg->type]);
   memset(&info, 0, sizeof(info));
 
   info.i_mds_hdl = mds_hdl;
@@ -397,21 +399,21 @@ uint32_t rde_mds_send(struct rde_msg *msg, MDS_DEST to_dest) {
   info.i_svc_id = NCSMDS_SVC_ID_RDE;
 
   info.info.svc_send.i_msg = msg;
-  info.info.svc_send.i_priority = MDS_SEND_PRIORITY_MEDIUM;
-  info.info.svc_send.i_sendtype = MDS_SENDTYPE_SND;
+

[devel] [PATCH 1/2] rde: Add timeout waiting for peer info [#3263]

2021-05-25 Thread Minh Chau

RDE detects the peer_up message and assumes that the peer_info
message will follow. However, in a roaming SC cluster, when all SCs
rejoin after a network split, the last active SC may miss the peer
info message because the other SCs have already rebooted.

This patch adds a timeout for waiting on the peer info message, to
avoid the risk of missing the peer info message needed to detect a
duplicate active SC. The new timeout is shared by all peers: it is
reset on each peer up message and waits for the last peer info message.
---
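Illustrative note, not part of the patch: the deadline arithmetic behind the new timeout can be sketched as follows. This is a minimal model, assuming std::chrono in place of timespec and base::ReadMonotonicClock. PeerInfoWait and its methods are hypothetical names that loosely mirror received_peer_info_, peer_info_wait_time_, ResetPeerInfoWaitTimer and StopPeerInfoWaitTimer in the diff.

```cpp
#include <cassert>
#include <chrono>

using Clock = std::chrono::steady_clock;
using Millis = std::chrono::milliseconds;

// Hypothetical model of the wait-for-peer-info timer; the patch keeps the
// equivalent state inside Role.
class PeerInfoWait {
 public:
  enum class State { kIdle, kWaiting, kExpired };

  explicit PeerInfoWait(Millis timeout) : timeout_(timeout) {}

  // A new peer was discovered while ACTIVE: start or restart the wait.
  void Reset(Clock::time_point now) {
    deadline_ = now + timeout_;
    received_ = false;
  }

  // Peer info arrived: stop waiting.
  void Stop() { received_ = true; }

  // Reports the state at `now`. While waiting, *remaining holds the time
  // left, which a poll loop could use as its next timeout.
  State Poll(Clock::time_point now, Millis* remaining) const {
    assert(remaining != nullptr);
    if (received_) return State::kIdle;
    if (deadline_ >= now) {
      *remaining = std::chrono::duration_cast<Millis>(deadline_ - now);
      return State::kWaiting;
    }
    return State::kExpired;  // the patch reboots the node at this point
  }

 private:
  Millis timeout_;
  Clock::time_point deadline_{};
  bool received_ = true;  // nothing to wait for until a peer shows up
};
```

Because Reset is called on every newly discovered peer, the deadline always reflects the most recent peer up message, matching the "reset on each peer up" behaviour described in the commit message.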
 src/rde/rded/role.cc | 46 +++-
 src/rde/rded/role.h  |  6 ++
 2 files changed, 51 insertions(+), 1 deletion(-)

diff --git a/src/rde/rded/role.cc b/src/rde/rded/role.cc
index 3732be449..464813482 100644
--- a/src/rde/rded/role.cc
+++ b/src/rde/rded/role.cc
@@ -196,9 +196,13 @@ Role::Role(NODE_ID own_node_id)
   discover_peer_timeout_{base::GetEnv("RDE_DISCOVER_PEER_TIMEOUT",
   kDefaultDiscoverPeerTimeout)},
   pre_active_script_timeout_{base::GetEnv(
-  "RDE_PRE_ACTIVE_SCRIPT_TIMEOUT", kDefaultPreActiveScriptTimeout)} {}
+  "RDE_PRE_ACTIVE_SCRIPT_TIMEOUT", kDefaultPreActiveScriptTimeout)},
+  received_peer_info_{true},
+  peer_info_wait_time_{},
+  peer_info_wait_timeout_ {kDefaultWaitPeerInfoTimeout} {}
 
 timespec* Role::Poll(timespec* ts) {
+  TRACE_ENTER();
   timespec* timeout = nullptr;
   if (role_ == PCS_RDA_UNDEFINED) {
 timespec now = base::ReadMonotonicClock();
@@ -238,6 +242,25 @@ timespec* Role::Poll(timespec* ts) {
 cb->state_refresh_thread_started = true;
 std::thread(::RefreshConsensusState, this, cb).detach();
   }
+  if (consensus_service.IsEnabled() == false) {
+// We are already ACTIVE, and has just discovered a new node
+// which makes the election_end_time_ reset
+if (received_peer_info_ == false) {
+  timespec now = base::ReadMonotonicClock();
+  if (peer_info_wait_time_ >= now) {
+*ts = peer_info_wait_time_ - now;
+timeout = ts;
+  } else {
+// Timeout but haven't received peer info
+// The peer RDE could be in ACTIVE
+// thus self-fence to avoid split-brain risk
+LOG_ER("Discovery peer up without peer info. Risk in split-brain,"
+"rebooting this node");
+opensaf_quick_reboot("Probable split-brain due to "
+"unknown RDE peer info");
+  }
+}
+  }
 }
   }
   return timeout;
@@ -251,9 +274,14 @@ void Role::ExecutePreActiveScript() {
 }
 
 void Role::AddPeer(NODE_ID node_id) {
+  TRACE_ENTER();
   auto result = known_nodes_.insert(node_id);
   if (result.second) {
 ResetElectionTimer();
+if (role_ == PCS_RDA_ACTIVE) {
+  ResetPeerInfoWaitTimer();
+  received_peer_info_ = false;
+}
   }
 }
 
@@ -330,10 +358,24 @@ uint32_t Role::SetRole(PCS_RDA_ROLE new_role) {
 }
 
 void Role::ResetElectionTimer() {
+  TRACE_ENTER();
   election_end_time_ = base::ReadMonotonicClock() +
base::MillisToTimespec(discover_peer_timeout_);
 }
 
+void Role::ResetPeerInfoWaitTimer() {
+  TRACE_ENTER();
+  // Reuse peer discovery timeout
+  peer_info_wait_time_ = base::ReadMonotonicClock() +
+   base::MillisToTimespec(peer_info_wait_timeout_);
+}
+
+void Role::StopPeerInfoWaitTimer() {
+  TRACE_ENTER();
+  // Turn off peer_info_timer
+  received_peer_info_ = true;
+}
+
 uint32_t Role::UpdateMdsRegistration(PCS_RDA_ROLE new_role,
  PCS_RDA_ROLE old_role) {
   uint32_t rc = NCSCC_RC_SUCCESS;
@@ -357,6 +399,7 @@ uint32_t Role::UpdateMdsRegistration(PCS_RDA_ROLE new_role,
 
 void Role::SetPeerState(PCS_RDA_ROLE node_role, NODE_ID node_id,
 uint64_t peer_promote_pending) {
+  TRACE_ENTER();
   if (role() == PCS_RDA_UNDEFINED) {
 bool give_up = false;
 RDE_CONTROL_BLOCK *cb = rde_get_control_block();
@@ -379,6 +422,7 @@ void Role::SetPeerState(PCS_RDA_ROLE node_role, NODE_ID node_id,
  node_id, to_string(node_role), to_string(role()));
 }
   }
+  StopPeerInfoWaitTimer();
 }
 
 void Role::PromoteNodeLate() {
diff --git a/src/rde/rded/role.h b/src/rde/rded/role.h
index 2d24361c5..1fb84d1a8 100644
--- a/src/rde/rded/role.h
+++ b/src/rde/rded/role.h
@@ -53,9 +53,12 @@ class Role {
 
  private:
   static const uint64_t kDefaultDiscoverPeerTimeout = 2000;
+  static const uint64_t kDefaultWaitPeerInfoTimeout = 5000;
   static const uint64_t kDefaultPreActiveScriptTimeout = 5000;
   void ExecutePreActiveScript();
   void ResetElectionTimer();
+  void ResetPeerInfoWaitTimer();
+  void StopPeerInfoWaitTimer();
   uint32_t UpdateMdsRegistration(PCS_RDA_ROLE new_role, PCS_RDA_ROLE old_role);
   void PromoteNode(const uint64_t cluster_size, const bool relaxed_mode);
 
@@ -68,6 +71,9 @@ class Role {
   uint64_t pre_active_script_timeout_;

[devel] [PATCH 0/2] Review Request for rde: Fix problem of all active SCs rejoin from network split [#3263] V2

2021-05-25 Thread Minh Chau

Summary: rde: Fix problem of all active SCs rejoin from network split [#3263] V2
Review request for Ticket(s): 3263
Peer Reviewer(s): Surbhi, Thang, Hieu
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3263
Base revision: f938c0c375bbd77c4343d4bf3bed57abd45b58aa
Personal repository: git://git.code.sf.net/u/minh-chau/review


Impacted area           Impact y/n

 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            n
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
-
*** EXPLAIN/COMMENT THE PATCH SERIES HERE ***

revision 61a6bc81c530aee4fdd4dcf0425fb6f55e39d505
Author: Minh Chau 
Date:   Tue, 25 May 2021 17:40:28 +1000

rde: Use broadcast for peer info message [#3263]

RDE sends a peer info message to each node it detects through a
peer up message. In a roaming SC cluster, when all SCs rejoin after
a network split, every RDE is active. Duplicate active detection
relies on the peer info message, which amounts to one-to-one
detection. With this mechanism the last SC may go undetected if all
other SCs have been detected as duplicate active and have rebooted.

This patch broadcasts the peer info message instead, to increase
the likelihood of receiving peer info messages from all other SCs.



revision e1aeef67ca87d091c0da9994cbf074015801139b
Author: Minh Chau 
Date:   Tue, 25 May 2021 17:40:09 +1000

rde: Add timeout waiting for peer info [#3263]

RDE detects the peer_up message and assumes that the peer_info
message will follow. However, in a roaming SC cluster, when all SCs
rejoin after a network split, the last active SC may miss the peer
info message because the other SCs have already rebooted.

This patch adds a timeout for waiting on the peer info message, to
avoid the risk of missing the peer info message needed to detect a
duplicate active SC. The new timeout is shared by all peers: it is
reset on each peer up message and waits for the last peer info message.



Complete diffstat:
--
 src/rde/rded/rde_cb.h|  2 +-
 src/rde/rded/rde_main.cc | 22 --
 src/rde/rded/rde_mds.cc  | 20 +++-
 src/rde/rded/role.cc | 46 +-
 src/rde/rded/role.h  |  6 ++
 5 files changed, 75 insertions(+), 21 deletions(-)


Testing Commands:
-
*** LIST THE COMMAND LINE TOOLS/STEPS TO TEST YOUR CHANGES ***


Testing, Expected Results:
--
*** PASTE COMMAND OUTPUTS / TEST RESULTS ***


Conditions of Submission:
-
*** HOW MANY DAYS BEFORE PUSHING, CONSENSUS ETC ***


Arch        Built   Started   Linux distro
------------------------------------------
mips        n       n
mips64      n       n
x86         n       n
x86_64      n       n
powerpc     n       n
powerpc64   n       n



[devel] [PATCH 2/2] rde: Use broadcast for peer info message [#3263]

2021-05-24 Thread Minh Chau

RDE sends a peer info message to each node it detects through a
peer up message. In a roaming SC cluster, when all SCs rejoin after
a network split, every RDE is active. Duplicate active detection
relies on the peer info message, which amounts to one-to-one
detection. With this mechanism the last SC may go undetected if all
other SCs have been detected as duplicate active and have rebooted.

This patch broadcasts the peer info message instead, to increase
the likelihood of receiving peer info messages from all other SCs.
---
 src/rde/rded/rde_cb.h|  2 +-
 src/rde/rded/rde_main.cc | 22 --
 src/rde/rded/rde_mds.cc  | 20 +++-
 3 files changed, 24 insertions(+), 20 deletions(-)

diff --git a/src/rde/rded/rde_cb.h b/src/rde/rded/rde_cb.h
index 50a0a0d26..b744b7c72 100644
--- a/src/rde/rded/rde_cb.h
+++ b/src/rde/rded/rde_cb.h
@@ -101,7 +101,7 @@ extern uint32_t rde_mds_register();
 extern uint32_t rde_discovery_mds_register();
 extern uint32_t rde_mds_unregister();
 extern uint32_t rde_discovery_mds_unregister();
-extern uint32_t rde_mds_send(rde_msg *msg, MDS_DEST to_dest);
+extern uint32_t rde_mds_broadcast(rde_msg *msg);
 extern uint32_t rde_set_role(PCS_RDA_ROLE role);
 
 #endif  // RDE_RDED_RDE_CB_H_
diff --git a/src/rde/rded/rde_main.cc b/src/rde/rded/rde_main.cc
index e6bd759ec..8ed6b046e 100644
--- a/src/rde/rded/rde_main.cc
+++ b/src/rde/rded/rde_main.cc
@@ -46,7 +46,7 @@
 enum { FD_TERM = 0, FD_AMF = 1, FD_MBX, FD_RDA_SERVER,
FD_SIGHUP, FD_CLIENT_START };
 
-static void SendPeerInfoResp(MDS_DEST mds_dest);
+static void BroadcastPeerInfoResp();
 static void CheckForSplitBrain(const rde_msg *msg);
 
 const char *rde_msg_name[] = {"-",
@@ -105,18 +105,20 @@ static void handle_mbx_event() {
   switch (msg->type) {
 case RDE_MSG_PEER_INFO_REQ:
 case RDE_MSG_PEER_INFO_RESP: {
-  LOG_NO("Got peer info %s from node 0x%x with role %s",
- msg->type == RDE_MSG_PEER_INFO_RESP ? "response" : "request",
- msg->fr_node_id, Role::to_string(msg->info.peer_info.ha_role));
-  CheckForSplitBrain(msg);
-  role->SetPeerState(msg->info.peer_info.ha_role, msg->fr_node_id,
- msg->info.peer_info.promote_pending);
+  if (msg->fr_node_id != own_node_id) {
+LOG_NO("Got peer info %s from node 0x%x with role %s",
+msg->type == RDE_MSG_PEER_INFO_RESP ? "response" : "request",
+msg->fr_node_id, Role::to_string(msg->info.peer_info.ha_role));
+CheckForSplitBrain(msg);
+role->SetPeerState(msg->info.peer_info.ha_role, msg->fr_node_id,
+msg->info.peer_info.promote_pending);
+  }
   break;
 }
 case RDE_MSG_PEER_UP: {
   if (msg->fr_node_id != own_node_id) {
 LOG_NO("Peer up on node 0x%x", msg->fr_node_id);
-SendPeerInfoResp(msg->fr_dest);
+BroadcastPeerInfoResp();
 role->AddPeer(msg->fr_node_id);
   }
   break;
@@ -284,7 +286,7 @@ static void CheckForSplitBrain(const rde_msg *msg) {
   }
 }
 
-static void SendPeerInfoResp(MDS_DEST mds_dest) {
+static void BroadcastPeerInfoResp() {
   RDE_CONTROL_BLOCK *cb = rde_get_control_block();
   rde_msg peer_info_req;
   peer_info_req.type = RDE_MSG_PEER_INFO_RESP;
@@ -294,7 +296,7 @@ static void SendPeerInfoResp(MDS_DEST mds_dest) {
 cb->promote_pending = base::TimespecToMillis(now - cb->promote_start);
   }
   peer_info_req.info.peer_info.promote_pending = cb->promote_pending;
-  rde_mds_send(&peer_info_req, mds_dest);
+  rde_mds_broadcast(&peer_info_req);
 }
 
 /**
diff --git a/src/rde/rded/rde_mds.cc b/src/rde/rded/rde_mds.cc
index a32f54082..4591d1996 100644
--- a/src/rde/rded/rde_mds.cc
+++ b/src/rde/rded/rde_mds.cc
@@ -209,6 +209,8 @@ static uint32_t mds_callback(struct ncsmds_callback_info *info) {
   msg = (struct rde_msg *)info->info.receive.i_msg;
   msg->fr_dest = info->info.receive.i_fr_dest;
   msg->fr_node_id = info->info.receive.i_node_id;
+  TRACE("MDS RECEIVE dest: %" PRIx64 ", node ID: %x, msg_type: %d",
+  msg->fr_dest, msg->fr_node_id, msg->type);
   if (ncs_ipc_send(>mbx, reinterpret_cast(
  info->info.receive.i_msg),
NCS_IPC_PRIORITY_NORMAL) != NCSCC_RC_SUCCESS) {
@@ -385,11 +387,11 @@ uint32_t rde_discovery_mds_unregister() {
   return rc;
 }
 
-uint32_t rde_mds_send(struct rde_msg *msg, MDS_DEST to_dest) {
+uint32_t rde_mds_broadcast(struct rde_msg *msg) {
   NCSMDS_INFO info;
   uint32_t rc;
 
-  TRACE("Sending %s to %" PRIx64, rde_msg_name[msg->type], to_dest);
+  TRACE("Sending %s to all rded instances", rde_msg_name[msg->type]);
   memset(&info, 0, sizeof(info));
 
   info.i_mds_hdl = mds_hdl;
@@ -397,21 +399,21 @@ uint32_t rde_mds_send(struct rde_msg *msg, MDS_DEST to_dest) {
   info.i_svc_id = NCSMDS_SVC_ID_RDE;
 
   info.info.svc_send.i_msg = msg;
-  info.info.svc_send.i_priority = MDS_SEND_PRIORITY_MEDIUM;
-  info.info.svc_send.i_sendtype = MDS_SENDTYPE_SND;
+

[devel] [PATCH 1/2] rde: Add timeout waiting for peer info [#3263]

2021-05-24 Thread Minh Chau

RDE receives a peer_up message and assumes that the peer_info message
will follow. However, in a roaming SC cluster, when all SCs rejoin
after a network split, the last active SC may never receive the peer
info message because the other SCs have already rebooted.

This patch adds a timeout for the peer info message, so that a
missing peer info message cannot hide a duplicate active SC. The
new timeout is shared by all peers: it is reset on each peer up
message and waits for the last peer info message.
---
 src/rde/rded/role.cc | 45 +++-
 src/rde/rded/role.h  |  6 ++
 2 files changed, 50 insertions(+), 1 deletion(-)

diff --git a/src/rde/rded/role.cc b/src/rde/rded/role.cc
index 3732be449..8ec253b99 100644
--- a/src/rde/rded/role.cc
+++ b/src/rde/rded/role.cc
@@ -196,9 +196,13 @@ Role::Role(NODE_ID own_node_id)
   discover_peer_timeout_{base::GetEnv("RDE_DISCOVER_PEER_TIMEOUT",
   kDefaultDiscoverPeerTimeout)},
   pre_active_script_timeout_{base::GetEnv(
-  "RDE_PRE_ACTIVE_SCRIPT_TIMEOUT", kDefaultPreActiveScriptTimeout)} {}
+  "RDE_PRE_ACTIVE_SCRIPT_TIMEOUT", kDefaultPreActiveScriptTimeout)},
+  received_peer_info_{true},
+  peer_info_wait_time_{},
+  peer_info_wait_timeout_ {kDefaultWaitPeerInfoTimeout} {}
 
 timespec* Role::Poll(timespec* ts) {
+  TRACE_ENTER();
   timespec* timeout = nullptr;
   if (role_ == PCS_RDA_UNDEFINED) {
 timespec now = base::ReadMonotonicClock();
@@ -237,6 +241,24 @@ timespec* Role::Poll(timespec* ts) {
 cb->state_refresh_thread_started == false) {
 cb->state_refresh_thread_started = true;
 std::thread(&Role::RefreshConsensusState, this, cb).detach();
+  } else {
+// We are already ACTIVE, and has just discovered a new node
+// which makes the election_end_time_ reset
+if (received_peer_info_ == false) {
+  timespec now = base::ReadMonotonicClock();
+  if (peer_info_wait_time_ >= now) {
+*ts = peer_info_wait_time_ - now;
+timeout = ts;
+  } else {
+// Timeout but haven't received peer info
+// The peer RDE could be in ACTIVE
+// thus self-fence to avoid split-brain risk
+LOG_ER("Discovery peer up without peer info. Risk in split-brain,"
+"rebooting this node");
+opensaf_quick_reboot("Probable split-brain due to "
+"unknown RDE peer info");
+  }
+}
   }
 }
   }
@@ -251,9 +273,14 @@ void Role::ExecutePreActiveScript() {
 }
 
 void Role::AddPeer(NODE_ID node_id) {
+  TRACE_ENTER();
   auto result = known_nodes_.insert(node_id);
   if (result.second) {
 ResetElectionTimer();
+if (role_ == PCS_RDA_ACTIVE) {
+  ResetPeerInfoWaitTimer();
+  received_peer_info_ = false;
+}
   }
 }
 
@@ -330,10 +357,24 @@ uint32_t Role::SetRole(PCS_RDA_ROLE new_role) {
 }
 
 void Role::ResetElectionTimer() {
+  TRACE_ENTER();
   election_end_time_ = base::ReadMonotonicClock() +
base::MillisToTimespec(discover_peer_timeout_);
 }
 
+void Role::ResetPeerInfoWaitTimer() {
+  TRACE_ENTER();
+  // Reuse peer discovery timeout
+  peer_info_wait_time_ = base::ReadMonotonicClock() +
+   base::MillisToTimespec(peer_info_wait_timeout_);
+}
+
+void Role::StopPeerInfoWaitTimer() {
+  TRACE_ENTER();
+  // Turn off peer_info_timer
+  received_peer_info_ = true;
+}
+
 uint32_t Role::UpdateMdsRegistration(PCS_RDA_ROLE new_role,
  PCS_RDA_ROLE old_role) {
   uint32_t rc = NCSCC_RC_SUCCESS;
@@ -357,6 +398,7 @@ uint32_t Role::UpdateMdsRegistration(PCS_RDA_ROLE new_role,
 
 void Role::SetPeerState(PCS_RDA_ROLE node_role, NODE_ID node_id,
 uint64_t peer_promote_pending) {
+  TRACE_ENTER();
   if (role() == PCS_RDA_UNDEFINED) {
 bool give_up = false;
 RDE_CONTROL_BLOCK *cb = rde_get_control_block();
@@ -379,6 +421,7 @@ void Role::SetPeerState(PCS_RDA_ROLE node_role, NODE_ID node_id,
  node_id, to_string(node_role), to_string(role()));
 }
   }
+  StopPeerInfoWaitTimer();
 }
 
 void Role::PromoteNodeLate() {
diff --git a/src/rde/rded/role.h b/src/rde/rded/role.h
index 2d24361c5..1fb84d1a8 100644
--- a/src/rde/rded/role.h
+++ b/src/rde/rded/role.h
@@ -53,9 +53,12 @@ class Role {
 
  private:
   static const uint64_t kDefaultDiscoverPeerTimeout = 2000;
+  static const uint64_t kDefaultWaitPeerInfoTimeout = 5000;
   static const uint64_t kDefaultPreActiveScriptTimeout = 5000;
   void ExecutePreActiveScript();
   void ResetElectionTimer();
+  void ResetPeerInfoWaitTimer();
+  void StopPeerInfoWaitTimer();
   uint32_t UpdateMdsRegistration(PCS_RDA_ROLE new_role, PCS_RDA_ROLE old_role);
   void PromoteNode(const uint64_t cluster_size, const bool relaxed_mode);
 
@@ -68,6 +71,9 @@ class Role {
   uint64_t pre_active_script_timeout_;
   static const

[devel] [PATCH 0/2] Review Request for rde: Fix problem of all active SCs rejoin from network split [#3263]

2021-05-24 Thread Minh Chau

Summary: rde: Fix problem of all active SCs rejoin from network split [#3263]
Review request for Ticket(s): 3263
Peer Reviewer(s): Surbhi, Hieu, Thang
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3263
Base revision: f938c0c375bbd77c4343d4bf3bed57abd45b58aa
Personal repository: git://git.code.sf.net/u/minh-chau/review


Impacted area       Impact y/n

 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            n
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
-
*** EXPLAIN/COMMENT THE PATCH SERIES HERE ***

revision 1d44beebd7228191e007b2159ebc97c8e26638a8
Author: Minh Chau 
Date:   Mon, 24 May 2021 16:57:27 +1000

rde: Use broadcast for peer info message [#3263]

RDE sends the peer info message only to the node it detects through a
peer up message. In a roaming SC cluster, when all SCs rejoin after a
network split, every RDE is active. Duplicate-active detection relies
on the peer info message, so it works one-to-one. As a result, the last
SC may go undetected if all other SCs have already been detected as
duplicate active and rebooted.

This patch broadcasts the peer info message instead, which increases
the chance of receiving a peer info message from every other SC.
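
The effect of broadcasting can be sketched roughly as below. This is
only an illustration under assumptions: Node, PeerInfoMsg and Broadcast
are hypothetical stand-ins, not the rded or MDS API. The sketch also
shows why a receiver of a broadcast has to ignore its own message, which
is the purpose of the fr_node_id != own_node_id check in rde_main.cc.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for illustration; these are not the rded types.
using NodeId = uint32_t;

struct PeerInfoMsg {
  NodeId from;     // node id of the sender
  bool is_active;  // sender claims the ACTIVE role
};

struct Node {
  Node(NodeId node_id, bool active)
      : id(node_id), is_active(active), active_peers_seen(0) {}

  // A broadcast is delivered to the sender as well, so drop our own message
  // before counting peers that claim to be active.
  void OnPeerInfo(const PeerInfoMsg& msg) {
    if (msg.from == id) return;
    if (msg.is_active) ++active_peers_seen;
  }

  NodeId id;
  bool is_active;
  int active_peers_seen;
};

// Deliver one peer info message from `sender` to every node, sender included.
inline void Broadcast(std::vector<Node>& nodes, const Node& sender) {
  const PeerInfoMsg msg{sender.id, sender.is_active};
  for (Node& node : nodes) node.OnPeerInfo(msg);
}
```

With three nodes that each call Broadcast once, every node sees the role
of the two others and never counts itself, regardless of the order in
which the peer up messages were exchanged.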



revision f84d1d54f1c0889e88af55c9ace14d05d52aa134
Author: Minh Chau 
Date:   Mon, 24 May 2021 16:57:27 +1000

rde: Add timeout waiting for peer info [#3263]

RDE receives a peer_up message and assumes that the peer_info message
will follow. However, in a roaming SC cluster, when all SCs rejoin
after a network split, the last active SC may never receive the peer
info message because the other SCs have already rebooted.

This patch adds a timeout for the peer info message, so that a
missing peer info message cannot hide a duplicate active SC. The
new timeout is shared by all peers: it is reset on each peer up
message and waits for the last peer info message.
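
The timer behaviour described above can be sketched roughly as below.
This is a simplified illustration, not the code from role.cc: the class
PeerInfoWait and its method names are hypothetical, and the current time
is passed in explicitly so the logic can be checked without sleeping.
The 5000 ms value used in the usage note is the default from role.h.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper for illustration; the patch keeps this state in Role.
class PeerInfoWait {
 public:
  using Clock = std::chrono::steady_clock;
  using Millis = std::chrono::milliseconds;

  explicit PeerInfoWait(Millis timeout)
      : timeout_(timeout), deadline_(), waiting_(false) {}

  // A new peer was discovered: (re)start the wait, counting from `now`.
  void OnPeerUp(Clock::time_point now) {
    deadline_ = now + timeout_;
    waiting_ = true;
  }

  // A peer info message arrived: stop waiting.
  void OnPeerInfo() { waiting_ = false; }

  // True while a wait is armed and its deadline has passed.
  bool Expired(Clock::time_point now) const {
    return waiting_ && now > deadline_;
  }

  // Time left until the deadline; zero when not waiting or already past it.
  Millis Remaining(Clock::time_point now) const {
    if (!waiting_ || now >= deadline_) return Millis(0);
    return std::chrono::duration_cast<Millis>(deadline_ - now);
  }

 private:
  Millis timeout_;
  Clock::time_point deadline_;
  bool waiting_;
};
```

A poll loop would call Remaining() to choose its next wake-up and treat
Expired() as the point where the node should fence itself. With a
5000 ms timeout, a second OnPeerUp() at 3000 ms moves the deadline to
8000 ms, so the wait only expires after the last discovered peer has had
the full timeout to answer.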



Complete diffstat:
--
 src/rde/rded/rde_cb.h|  2 +-
 src/rde/rded/rde_main.cc | 22 --
 src/rde/rded/rde_mds.cc  | 20 +++-
 src/rde/rded/role.cc | 45 -
 src/rde/rded/role.h  |  6 ++
 5 files changed, 74 insertions(+), 21 deletions(-)


Testing Commands:
-
*** LIST THE COMMAND LINE TOOLS/STEPS TO TEST YOUR CHANGES ***


Testing, Expected Results:
--
*** PASTE COMMAND OUTPUTS / TEST RESULTS ***


Conditions of Submission:
-
*** HOW MANY DAYS BEFORE PUSHING, CONSENSUS ETC ***


Arch        Built   Started   Linux distro
-------------------------------------------
mips          n        n
mips64        n        n
x86           n        n
x86_64        n        n
powerpc       n        n
powerpc64     n        n


Reviewer Checklist:
---
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
(i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
too much content into a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
of what has changed between each re-send.

___ You h

[devel] [PATCH 2/2] imma: Correctly use IMMA_SYNCR_TIMEOUT [#3260]

2021-05-17 Thread Minh Chau

---
 src/imm/agent/imma_oi_api.cc | 2 +-
 src/imm/agent/imma_om_api.cc | 2 +-
 src/imm/agent/imma_proc.cc   | 5 +++--
 src/imm/immd/immd_cb.h   | 1 -
 4 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/src/imm/agent/imma_oi_api.cc b/src/imm/agent/imma_oi_api.cc
index 8e9a1f2b0..c79d0e315 100644
--- a/src/imm/agent/imma_oi_api.cc
+++ b/src/imm/agent/imma_oi_api.cc
@@ -3510,7 +3510,7 @@ int imma_oi_resurrect(IMMA_CB *cb, IMMA_CLIENT_NODE *cl_node, bool *locked,
   osafassert(locked && *locked);
   osafassert(cl_node && cl_node->stale);
   SaImmOiHandleT immOiHandle = cl_node->handle;
-  SaTimeT timeout = 0;
+  SaTimeT timeout = IMMSV_WAIT_TIME;
 
   m_NCS_UNLOCK(&cb->cb_lock, NCS_LOCK_WRITE);
   *locked = false;
diff --git a/src/imm/agent/imma_om_api.cc b/src/imm/agent/imma_om_api.cc
index 55fdcc2d8..f34e7645c 100644
--- a/src/imm/agent/imma_om_api.cc
+++ b/src/imm/agent/imma_om_api.cc
@@ -9761,7 +9761,7 @@ int imma_om_resurrect(IMMA_CB *cb, IMMA_CLIENT_NODE *cl_node, bool *locked) {
   osafassert(locked && *locked);
   osafassert(cl_node && cl_node->stale);
   SaImmHandleT immHandle = cl_node->handle;
-  SaTimeT timeout = 0;
+  SaTimeT timeout = IMMSV_WAIT_TIME;
   SaAisErrorT err_resurrect = SA_AIS_OK;
 
   m_NCS_UNLOCK(&cb->cb_lock, NCS_LOCK_WRITE);
diff --git a/src/imm/agent/imma_proc.cc b/src/imm/agent/imma_proc.cc
index ea035827f..723550904 100644
--- a/src/imm/agent/imma_proc.cc
+++ b/src/imm/agent/imma_proc.cc
@@ -1570,6 +1570,7 @@ uint32_t imma_proc_resurrect_client(IMMA_CB *cb, SaImmHandleT immHandle,
   unsigned int sleep_delay_ms = 500;
   unsigned int max_waiting_time_ms = 2 * 1000; /* 2 secs */
   unsigned int msecs_waited = 0;
+  SaTimeT timeout = IMMSV_WAIT_TIME;
 
   if (m_NCS_LOCK(&cb->cb_lock, NCS_LOCK_WRITE) != NCSCC_RC_SUCCESS) {
 TRACE_3("Lock failure");
@@ -1583,7 +1584,7 @@ uint32_t imma_proc_resurrect_client(IMMA_CB *cb, SaImmHandleT immHandle,
 cl_node, cl_node ? cl_node->exposed : 0);
 goto failure;
   }
-
+  timeout = cl_node->syncr_timeout;
   if (!cl_node->stale) {
 TRACE_3(
 "imma_proc_resurrect_client: Handle %llx was not stale, "
@@ -1623,7 +1624,7 @@ uint32_t imma_proc_resurrect_client(IMMA_CB *cb, SaImmHandleT immHandle,
 /* send the request to the IMMND */
 if (imma_mds_msg_sync_send(cb->imma_mds_hdl, &(cb->immnd_mds_dest),
_evt, _evt,
-   IMMSV_WAIT_TIME) != NCSCC_RC_SUCCESS) {
+   timeout) != NCSCC_RC_SUCCESS) {
   TRACE_3("Failure in MDS send");
   goto exposed;
 }
diff --git a/src/imm/immd/immd_cb.h b/src/imm/immd/immd_cb.h
index e9710ff71..51f52aba8 100644
--- a/src/imm/immd/immd_cb.h
+++ b/src/imm/immd/immd_cb.h
@@ -23,7 +23,6 @@
 #include 
 
 #define IMMD_EVT_TIME_OUT 100
-#define IMMSV_WAIT_TIME 100
 
 #define m_IMMND_IS_ON_SCXB(m, n) ((m == n) ? 1 : 0)
 #define m_IMMD_IS_LOCAL_NODE(m, n) (m == n) ? 1 : 0
-- 
2.20.1




[devel] [PATCH 1/2] immnd: Make IMMSV_FEVS_MAX_PENDING environment variable [#3260]

2021-05-17 Thread Minh Chau

Immnd reads IMMSV_FEVS_MAX_PENDING from an environment
variable and falls back to the default value (16) when it is unset.
---
 src/imm/common/immsv_api.h |  4 
 src/imm/immloadd/imm_loader.cc |  2 +-
 src/imm/immnd/ImmModel.cc  |  2 +-
 src/imm/immnd/immnd.conf   |  4 
 src/imm/immnd/immnd_cb.h   |  1 +
 src/imm/immnd/immnd_evt.c  | 41 --
 src/imm/immnd/immnd_main.c | 15 +
 7 files changed, 41 insertions(+), 28 deletions(-)

diff --git a/src/imm/common/immsv_api.h b/src/imm/common/immsv_api.h
index c37a624ed..f809c8168 100644
--- a/src/imm/common/immsv_api.h
+++ b/src/imm/common/immsv_api.h
@@ -77,10 +77,6 @@ typedef enum {
   ACCESS_CONTROL_ENFORCING = 2
 } OsafImmAccessControlModeT;
 
-/*Max # of outstanding fevs messages towards director.*/
-/*Note max-max is 255. cb->fevs_replies_pending is an uint8_t*/
-#define IMMSV_DEFAULT_FEVS_MAX_PENDING 16
-
 #define IMMSV_MAX_OBJECTS 1
 #define IMMSV_MAX_ATTRIBUTES 128
 #define IMMSV_MAX_ADMO_NAME_LENGTH 256
diff --git a/src/imm/immloadd/imm_loader.cc b/src/imm/immloadd/imm_loader.cc
index 516fb24ec..fadb0da51 100644
--- a/src/imm/immloadd/imm_loader.cc
+++ b/src/imm/immloadd/imm_loader.cc
@@ -2318,7 +2318,7 @@ int syncObjectsOfClass(std::string className, 
SaImmHandleT ,
 do {
   if (retries) {
 /* TRY_AGAIN while sync is in progress means *this* IMMND most likely
-   has reached IMMSV_DEFAULT_FEVS_MAX_PENDING. This means that *this*
+   has reached max pending fevs messages. This means that *this*
IMMND has sent its quota of fevs messages to IMMD without having
received them back via broadcast from IMMD.
 
diff --git a/src/imm/immnd/ImmModel.cc b/src/imm/immnd/ImmModel.cc
index 2d750040e..8631dc21f 100644
--- a/src/imm/immnd/ImmModel.cc
+++ b/src/imm/immnd/ImmModel.cc
@@ -1614,7 +1614,7 @@ SaAisErrorT immModel_nextResult(IMMND_CB* cb, void* searchOp,
   TRACE_2(
   "ERR_TRY_AGAIN: Too many pending incoming fevs "
   "messages (> %u) rejecting sync iteration next request",
-  IMMSV_DEFAULT_FEVS_MAX_PENDING);
+  cb->mFevsMaxPending);
   return SA_AIS_ERR_TRY_AGAIN;
 }
 err = ImmModel::instance(&cb->immModel)->nextSyncResult(rsp, *op);
diff --git a/src/imm/immnd/immnd.conf b/src/imm/immnd/immnd.conf
index e62367654..f9a809e16 100644
--- a/src/imm/immnd/immnd.conf
+++ b/src/imm/immnd/immnd.conf
@@ -95,3 +95,7 @@ export IMMSV_ENV_HEALTHCHECK_KEY="Default"
 #objects, objects_blob_multi,
 #objects_int_multi, objects_real_multi,
 #objects_text_multi, pbe_rep_version";
+
+# Max outstanding fevs messages towards director without having received
+# them back via director's broadcast message. Default value is 16.
+# export IMMSV_FEVS_MAX_PENDING=64
diff --git a/src/imm/immnd/immnd_cb.h b/src/imm/immnd/immnd_cb.h
index bb3bb8493..bc2d75b05 100644
--- a/src/imm/immnd/immnd_cb.h
+++ b/src/imm/immnd/immnd_cb.h
@@ -209,6 +209,7 @@ typedef struct immnd_cb_tag {
   NCS_PATRICIA_TREE immnd_clm_list; /* IMMND_IMM_CLIENT_NODE - node */
   tmr_t splitbrain_tmr;
   bool splitbrain_tmr_run;
+  uint8_t mFevsMaxPending; /* Max pending fevs messages towards director */
 } IMMND_CB;
 
 /* CB prototypes */
diff --git a/src/imm/immnd/immnd_evt.c b/src/imm/immnd/immnd_evt.c
index 0e801a942..64c5223c4 100644
--- a/src/imm/immnd/immnd_evt.c
+++ b/src/imm/immnd/immnd_evt.c
@@ -1787,7 +1787,7 @@ static uint32_t immnd_evt_proc_search_next(IMMND_CB *cb, IMMND_EVT *evt,
IMMSV_OM_RSP_SEARCH_NEXT **rspList = NULL;
MDS_DEST implDest = 0LL;
bool retardSync =
-   ((cb->fevs_replies_pending >= IMMSV_DEFAULT_FEVS_MAX_PENDING) &&
+   ((cb->fevs_replies_pending >= cb->mFevsMaxPending) &&
 cb->mIsCoord && (cb->syncPid > 0));
SaUint32T resultSize = 0;
IMMSV_OM_RSP_SEARCH_BUNDLE_NEXT bundleSearch = {0, NULL};
@@ -2767,10 +2767,10 @@ static uint32_t immnd_evt_proc_admowner_init(IMMND_CB *cb, IMMND_EVT *evt,
goto agent_rsp;
}
 
-   if (cb->fevs_replies_pending >= IMMSV_DEFAULT_FEVS_MAX_PENDING) {
+   if (cb->fevs_replies_pending >= cb->mFevsMaxPending) {
TRACE_2(
"ERR_TRY_AGAIN: Too many pending incoming fevs messages (> 
%u) rejecting admo_init request",
-   IMMSV_DEFAULT_FEVS_MAX_PENDING);
+   cb->mFevsMaxPending);
send_evt.info.imma.info.admInitRsp.error = SA_AIS_ERR_TRY_AGAIN;
goto agent_rsp;
}
@@ -2895,10 +2895,10 @@ static uint32_t immnd_evt_proc_impl_set(IMMND_CB *cb, IMMND_EVT *evt,
goto agent_rsp;
}
 
-   if (cb->fevs_replies_pending >= IMMSV_DEFAULT_FEVS_MAX_PENDING) {
+   if (cb->fevs_replies_pending >= cb->mFevsMaxPending) {
TRACE_2(
"ERR_TRY_AGAIN: Too

[devel] [PATCH 0/2] Review Request for immnd: Make IMMSV_FEVS_MAX_PENDING environment variable [#3260] V2

2021-05-17 Thread Minh Chau

Summary: immnd: Make IMMSV_FEVS_MAX_PENDING environment variable [#3260] V2
Review request for Ticket(s): 3260
Peer Reviewer(s): Thien, Surbhi, Thang
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3260
Base revision: 8259816e20e83658f04f0264af19cafa0cdd2755
Personal repository: git://git.code.sf.net/u/minh-chau/review


Impacted area       Impact y/n

 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            n
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
-
*** EXPLAIN/COMMENT THE PATCH SERIES HERE ***

revision a1c9418e8ad5dbdf66bc360d570add918e227f07
Author: Minh Chau 
Date:   Tue, 18 May 2021 09:45:47 +1000

imma: Correctly use IMMA_SYNCR_TIMEOUT [#3260]



revision 7b5acab32796727802b95dc9c846539347e7130d
Author: Minh Chau 
Date:   Tue, 18 May 2021 09:45:38 +1000

immnd: Make IMMSV_FEVS_MAX_PENDING environment variable [#3260]

Immnd reads IMMSV_FEVS_MAX_PENDING from an environment
variable and falls back to the default value (16) when it is unset.
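
Reading such a limit from the environment can be sketched roughly as
below. This is not the code added to immnd_main.c, which is not shown in
this thread: the helper name FevsMaxPendingFromEnv and the policy of
falling back to the default on an empty, zero, non-numeric or
out-of-range value are assumptions. The 255 ceiling reflects the removed
comment noting that cb->fevs_replies_pending is a uint8_t.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <cstdlib>

// Hypothetical helper; the name and the fallback policy are assumptions.
inline uint8_t FevsMaxPendingFromEnv(const char* name, uint8_t default_value) {
  const char* text = std::getenv(name);
  if (text == nullptr || *text == '\0') return default_value;

  errno = 0;
  char* end = nullptr;
  const unsigned long value = std::strtoul(text, &end, 10);
  // Reject trailing junk, overflow, zero, and anything that does not fit
  // in the uint8_t counter it is compared against.
  if (errno != 0 || *end != '\0' || value == 0 || value > UINT8_MAX) {
    return default_value;
  }
  return static_cast<uint8_t>(value);
}
```

With "export IMMSV_FEVS_MAX_PENDING=64" in immnd.conf this sketch would
return 64, while an unset variable or a value such as 300 would yield
the default passed by the caller.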



Complete diffstat:
--
 src/imm/agent/imma_oi_api.cc   |  2 +-
 src/imm/agent/imma_om_api.cc   |  2 +-
 src/imm/agent/imma_proc.cc |  5 +++--
 src/imm/common/immsv_api.h |  4 
 src/imm/immd/immd_cb.h |  1 -
 src/imm/immloadd/imm_loader.cc |  2 +-
 src/imm/immnd/ImmModel.cc  |  2 +-
 src/imm/immnd/immnd.conf   |  4 
 src/imm/immnd/immnd_cb.h   |  1 +
 src/imm/immnd/immnd_evt.c  | 41 +++--
 src/imm/immnd/immnd_main.c | 15 +++
 11 files changed, 46 insertions(+), 33 deletions(-)


Testing Commands:
-
*** LIST THE COMMAND LINE TOOLS/STEPS TO TEST YOUR CHANGES ***


Testing, Expected Results:
--
*** PASTE COMMAND OUTPUTS / TEST RESULTS ***


Conditions of Submission:
-
*** HOW MANY DAYS BEFORE PUSHING, CONSENSUS ETC ***


Arch        Built   Started   Linux distro
-------------------------------------------
mips          n        n
mips64        n        n
x86           n        n
x86_64        n        n
powerpc       n        n
powerpc64     n        n


Reviewer Checklist:
---
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
(i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
too much content into a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
comments and change requests that were proposed in the initial review.

___ You have a misconfigured ~/.gitconfig file (i.e. user.name, user.email etc)

___ Your computer has a badly configured date and time, confusing the
threaded patch review.

___ Your changes affect IPC mechanism, and you don't present any results
for in-service upgradability test.

___ Your changes affect user manual and documentation, your patch series
do not contain the patch that updates the Doxygen manual.




[devel] [PATCH 1/2] immnd: Make IMMSV_FEVS_MAX_PENDING environment variable [#3260]

2021-05-11 Thread Minh Chau

Immnd reads IMMSV_FEVS_MAX_PENDING from an environment
variable and falls back to the default value (16) when it is unset.
---
 src/imm/common/immsv_api.h |  4 
 src/imm/immnd/ImmModel.cc  |  2 +-
 src/imm/immnd/immnd_cb.h   |  1 +
 src/imm/immnd/immnd_evt.c  | 41 ++
 src/imm/immnd/immnd_main.c | 15 ++
 5 files changed, 36 insertions(+), 27 deletions(-)

diff --git a/src/imm/common/immsv_api.h b/src/imm/common/immsv_api.h
index c37a624ed..f809c8168 100644
--- a/src/imm/common/immsv_api.h
+++ b/src/imm/common/immsv_api.h
@@ -77,10 +77,6 @@ typedef enum {
   ACCESS_CONTROL_ENFORCING = 2
 } OsafImmAccessControlModeT;
 
-/*Max # of outstanding fevs messages towards director.*/
-/*Note max-max is 255. cb->fevs_replies_pending is an uint8_t*/
-#define IMMSV_DEFAULT_FEVS_MAX_PENDING 16
-
 #define IMMSV_MAX_OBJECTS 1
 #define IMMSV_MAX_ATTRIBUTES 128
 #define IMMSV_MAX_ADMO_NAME_LENGTH 256
diff --git a/src/imm/immnd/ImmModel.cc b/src/imm/immnd/ImmModel.cc
index 2d750040e..8631dc21f 100644
--- a/src/imm/immnd/ImmModel.cc
+++ b/src/imm/immnd/ImmModel.cc
@@ -1614,7 +1614,7 @@ SaAisErrorT immModel_nextResult(IMMND_CB* cb, void* searchOp,
   TRACE_2(
   "ERR_TRY_AGAIN: Too many pending incoming fevs "
   "messages (> %u) rejecting sync iteration next request",
-  IMMSV_DEFAULT_FEVS_MAX_PENDING);
+  cb->mFevsMaxPending);
   return SA_AIS_ERR_TRY_AGAIN;
 }
 err = ImmModel::instance(&cb->immModel)->nextSyncResult(rsp, *op);
diff --git a/src/imm/immnd/immnd_cb.h b/src/imm/immnd/immnd_cb.h
index bb3bb8493..bc2d75b05 100644
--- a/src/imm/immnd/immnd_cb.h
+++ b/src/imm/immnd/immnd_cb.h
@@ -209,6 +209,7 @@ typedef struct immnd_cb_tag {
   NCS_PATRICIA_TREE immnd_clm_list; /* IMMND_IMM_CLIENT_NODE - node */
   tmr_t splitbrain_tmr;
   bool splitbrain_tmr_run;
+  uint8_t mFevsMaxPending; /* Max pending fevs messages towards director */
 } IMMND_CB;
 
 /* CB prototypes */
diff --git a/src/imm/immnd/immnd_evt.c b/src/imm/immnd/immnd_evt.c
index 0e801a942..64c5223c4 100644
--- a/src/imm/immnd/immnd_evt.c
+++ b/src/imm/immnd/immnd_evt.c
@@ -1787,7 +1787,7 @@ static uint32_t immnd_evt_proc_search_next(IMMND_CB *cb, IMMND_EVT *evt,
IMMSV_OM_RSP_SEARCH_NEXT **rspList = NULL;
MDS_DEST implDest = 0LL;
bool retardSync =
-   ((cb->fevs_replies_pending >= IMMSV_DEFAULT_FEVS_MAX_PENDING) &&
+   ((cb->fevs_replies_pending >= cb->mFevsMaxPending) &&
 cb->mIsCoord && (cb->syncPid > 0));
SaUint32T resultSize = 0;
IMMSV_OM_RSP_SEARCH_BUNDLE_NEXT bundleSearch = {0, NULL};
@@ -2767,10 +2767,10 @@ static uint32_t immnd_evt_proc_admowner_init(IMMND_CB *cb, IMMND_EVT *evt,
goto agent_rsp;
}
 
-   if (cb->fevs_replies_pending >= IMMSV_DEFAULT_FEVS_MAX_PENDING) {
+   if (cb->fevs_replies_pending >= cb->mFevsMaxPending) {
TRACE_2(
"ERR_TRY_AGAIN: Too many pending incoming fevs messages (> 
%u) rejecting admo_init request",
-   IMMSV_DEFAULT_FEVS_MAX_PENDING);
+   cb->mFevsMaxPending);
send_evt.info.imma.info.admInitRsp.error = SA_AIS_ERR_TRY_AGAIN;
goto agent_rsp;
}
@@ -2895,10 +2895,10 @@ static uint32_t immnd_evt_proc_impl_set(IMMND_CB *cb, IMMND_EVT *evt,
goto agent_rsp;
}
 
-   if (cb->fevs_replies_pending >= IMMSV_DEFAULT_FEVS_MAX_PENDING) {
+   if (cb->fevs_replies_pending >= cb->mFevsMaxPending) {
TRACE_2(
"ERR_TRY_AGAIN: Too many pending incoming fevs messages (> 
%u) rejecting impl_set request",
-   IMMSV_DEFAULT_FEVS_MAX_PENDING);
+   cb->mFevsMaxPending);
send_evt.info.imma.info.implSetRsp.error = SA_AIS_ERR_TRY_AGAIN;
goto agent_rsp;
}
@@ -3061,10 +3061,10 @@ static uint32_t immnd_evt_proc_ccb_init(IMMND_CB *cb, IMMND_EVT *evt,
goto agent_rsp;
}
 
-   if (cb->fevs_replies_pending >= IMMSV_DEFAULT_FEVS_MAX_PENDING) {
+   if (cb->fevs_replies_pending >= cb->mFevsMaxPending) {
TRACE_2(
"ERR_TRY_AGAIN: Too many pending incoming fevs messages (> 
%u) rejecting ccb_init request",
-   IMMSV_DEFAULT_FEVS_MAX_PENDING);
+   cb->mFevsMaxPending);
send_evt.info.imma.info.ccbInitRsp.error = SA_AIS_ERR_TRY_AGAIN;
goto agent_rsp;
}
@@ -3220,11 +3220,10 @@ static uint32_t immnd_evt_proc_rt_update(IMMND_CB *cb, IMMND_EVT *evt,
   writbale.
 */
 
-   if (cb->fevs_replies_pending >=
-   IMMSV_DEFAULT_FEVS_MAX_PENDING) {
+   if (cb->fevs_replies_pending >= cb->mFevsMaxPending) {
TRACE_2(
"ERR_TRY_AGAIN: Too many pending incoming fevs 
messages (> %u) rejecting rt_update

[devel] [PATCH 2/2] imma: Correctly use IMMA_SYNCR_TIMEOUT [#3260]

2021-05-11 Thread Minh Chau

---
 src/imm/agent/imma_oi_api.cc | 2 +-
 src/imm/agent/imma_om_api.cc | 2 +-
 src/imm/agent/imma_proc.cc   | 5 +++--
 src/imm/immd/immd_cb.h   | 1 -
 4 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/src/imm/agent/imma_oi_api.cc b/src/imm/agent/imma_oi_api.cc
index 8e9a1f2b0..c79d0e315 100644
--- a/src/imm/agent/imma_oi_api.cc
+++ b/src/imm/agent/imma_oi_api.cc
@@ -3510,7 +3510,7 @@ int imma_oi_resurrect(IMMA_CB *cb, IMMA_CLIENT_NODE *cl_node, bool *locked,
   osafassert(locked && *locked);
   osafassert(cl_node && cl_node->stale);
   SaImmOiHandleT immOiHandle = cl_node->handle;
-  SaTimeT timeout = 0;
+  SaTimeT timeout = IMMSV_WAIT_TIME;
 
   m_NCS_UNLOCK(&cb->cb_lock, NCS_LOCK_WRITE);
   *locked = false;
diff --git a/src/imm/agent/imma_om_api.cc b/src/imm/agent/imma_om_api.cc
index 55fdcc2d8..f34e7645c 100644
--- a/src/imm/agent/imma_om_api.cc
+++ b/src/imm/agent/imma_om_api.cc
@@ -9761,7 +9761,7 @@ int imma_om_resurrect(IMMA_CB *cb, IMMA_CLIENT_NODE *cl_node, bool *locked) {
   osafassert(locked && *locked);
   osafassert(cl_node && cl_node->stale);
   SaImmHandleT immHandle = cl_node->handle;
-  SaTimeT timeout = 0;
+  SaTimeT timeout = IMMSV_WAIT_TIME;
   SaAisErrorT err_resurrect = SA_AIS_OK;
 
   m_NCS_UNLOCK(&cb->cb_lock, NCS_LOCK_WRITE);
diff --git a/src/imm/agent/imma_proc.cc b/src/imm/agent/imma_proc.cc
index ea035827f..723550904 100644
--- a/src/imm/agent/imma_proc.cc
+++ b/src/imm/agent/imma_proc.cc
@@ -1570,6 +1570,7 @@ uint32_t imma_proc_resurrect_client(IMMA_CB *cb, SaImmHandleT immHandle,
   unsigned int sleep_delay_ms = 500;
   unsigned int max_waiting_time_ms = 2 * 1000; /* 2 secs */
   unsigned int msecs_waited = 0;
+  SaTimeT timeout = IMMSV_WAIT_TIME;
 
   if (m_NCS_LOCK(&cb->cb_lock, NCS_LOCK_WRITE) != NCSCC_RC_SUCCESS) {
 TRACE_3("Lock failure");
@@ -1583,7 +1584,7 @@ uint32_t imma_proc_resurrect_client(IMMA_CB *cb, SaImmHandleT immHandle,
 cl_node, cl_node ? cl_node->exposed : 0);
 goto failure;
   }
-
+  timeout = cl_node->syncr_timeout;
   if (!cl_node->stale) {
 TRACE_3(
 "imma_proc_resurrect_client: Handle %llx was not stale, "
@@ -1623,7 +1624,7 @@ uint32_t imma_proc_resurrect_client(IMMA_CB *cb, SaImmHandleT immHandle,
 /* send the request to the IMMND */
 if (imma_mds_msg_sync_send(cb->imma_mds_hdl, &(cb->immnd_mds_dest),
_evt, _evt,
-   IMMSV_WAIT_TIME) != NCSCC_RC_SUCCESS) {
+   timeout) != NCSCC_RC_SUCCESS) {
   TRACE_3("Failure in MDS send");
   goto exposed;
 }
diff --git a/src/imm/immd/immd_cb.h b/src/imm/immd/immd_cb.h
index e9710ff71..51f52aba8 100644
--- a/src/imm/immd/immd_cb.h
+++ b/src/imm/immd/immd_cb.h
@@ -23,7 +23,6 @@
 #include 
 
 #define IMMD_EVT_TIME_OUT 100
-#define IMMSV_WAIT_TIME 100
 
 #define m_IMMND_IS_ON_SCXB(m, n) ((m == n) ? 1 : 0)
 #define m_IMMD_IS_LOCAL_NODE(m, n) (m == n) ? 1 : 0
-- 
2.20.1




[devel] [PATCH 0/2] Review Request for immnd: Make IMMSV_FEVS_MAX_PENDING environment variable [#3260]

2021-05-11 Thread Minh Chau

Summary: immnd: Make IMMSV_FEVS_MAX_PENDING environment variable [#3260]
Review request for Ticket(s): 3260
Peer Reviewer(s): Thien, Surbhi, Thang
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3260
Base revision: 8259816e20e83658f04f0264af19cafa0cdd2755
Personal repository: git://git.code.sf.net/u/minh-chau/review


Impacted area       Impact y/n

 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            n
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
-
*** EXPLAIN/COMMENT THE PATCH SERIES HERE ***

revision 9e93db5df437d64d674f6b96c0ddba2e72677950
Author: Minh Chau 
Date:   Wed, 12 May 2021 10:52:10 +1000

imma: Correctly use IMMA_SYNCR_TIMEOUT [#3260]



revision 52ed58179ea6781c9fb69a6cb057c180cec47dee
Author: Minh Chau 
Date:   Wed, 12 May 2021 10:51:53 +1000

immnd: Make IMMSV_FEVS_MAX_PENDING environment variable [#3260]

Immnd reads IMMSV_FEVS_MAX_PENDING from an environment
variable and falls back to the default value (16) when it is unset.



Complete diffstat:
--
 src/imm/agent/imma_oi_api.cc |  2 +-
 src/imm/agent/imma_om_api.cc |  2 +-
 src/imm/agent/imma_proc.cc   |  5 +++--
 src/imm/common/immsv_api.h   |  4 
 src/imm/immd/immd_cb.h   |  1 -
 src/imm/immnd/ImmModel.cc|  2 +-
 src/imm/immnd/immnd_cb.h |  1 +
 src/imm/immnd/immnd_evt.c| 41 +++--
 src/imm/immnd/immnd_main.c   | 15 +++
 9 files changed, 41 insertions(+), 32 deletions(-)


Testing Commands:
-
*** LIST THE COMMAND LINE TOOLS/STEPS TO TEST YOUR CHANGES ***


Testing, Expected Results:
--
*** PASTE COMMAND OUTPUTS / TEST RESULTS ***


Conditions of Submission:
-
*** HOW MANY DAYS BEFORE PUSHING, CONSENSUS ETC ***


Arch        Built   Started   Linux distro
-------------------------------------------
mips          n        n
mips64        n        n
x86           n        n
x86_64        n        n
powerpc       n        n
powerpc64     n        n


Reviewer Checklist:
---
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
(i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
too much content into a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
comments and change requests that were proposed in the initial review.

___ You have a misconfigured ~/.gitconfig file (i.e. user.name, user.email etc)

___ Your computer has a badly configured date and time, confusing the
threaded patch review.

___ Your changes affect IPC mechanism, and you don't present any results
for in-service upgradability test.

___ Your changes affect user manual and documentation, but your patch series
does not contain the patch that updates the Doxygen manual.



___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel

[devel] [PATCH 2/2] imma: Correctly use IMMA_SYNCR_TIMEOUT [#3260]

2021-05-11 Thread Minh Chau

---
 src/imm/agent/imma_oi_api.cc | 2 +-
 src/imm/agent/imma_om_api.cc | 2 +-
 src/imm/agent/imma_proc.cc   | 5 +++--
 src/imm/immd/immd_cb.h   | 1 -
 4 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/src/imm/agent/imma_oi_api.cc b/src/imm/agent/imma_oi_api.cc
index 8e9a1f2b0..c79d0e315 100644
--- a/src/imm/agent/imma_oi_api.cc
+++ b/src/imm/agent/imma_oi_api.cc
@@ -3510,7 +3510,7 @@ int imma_oi_resurrect(IMMA_CB *cb, IMMA_CLIENT_NODE *cl_node, bool *locked,
   osafassert(locked && *locked);
   osafassert(cl_node && cl_node->stale);
   SaImmOiHandleT immOiHandle = cl_node->handle;
-  SaTimeT timeout = 0;
+  SaTimeT timeout = IMMSV_WAIT_TIME;
 
   m_NCS_UNLOCK(&cb->cb_lock, NCS_LOCK_WRITE);
   *locked = false;
diff --git a/src/imm/agent/imma_om_api.cc b/src/imm/agent/imma_om_api.cc
index 55fdcc2d8..f34e7645c 100644
--- a/src/imm/agent/imma_om_api.cc
+++ b/src/imm/agent/imma_om_api.cc
@@ -9761,7 +9761,7 @@ int imma_om_resurrect(IMMA_CB *cb, IMMA_CLIENT_NODE *cl_node, bool *locked) {
   osafassert(locked && *locked);
   osafassert(cl_node && cl_node->stale);
   SaImmHandleT immHandle = cl_node->handle;
-  SaTimeT timeout = 0;
+  SaTimeT timeout = IMMSV_WAIT_TIME;
   SaAisErrorT err_resurrect = SA_AIS_OK;
 
   m_NCS_UNLOCK(&cb->cb_lock, NCS_LOCK_WRITE);
diff --git a/src/imm/agent/imma_proc.cc b/src/imm/agent/imma_proc.cc
index ea035827f..723550904 100644
--- a/src/imm/agent/imma_proc.cc
+++ b/src/imm/agent/imma_proc.cc
@@ -1570,6 +1570,7 @@ uint32_t imma_proc_resurrect_client(IMMA_CB *cb, SaImmHandleT immHandle,
   unsigned int sleep_delay_ms = 500;
   unsigned int max_waiting_time_ms = 2 * 1000; /* 2 secs */
   unsigned int msecs_waited = 0;
+  SaTimeT timeout = IMMSV_WAIT_TIME;
 
   if (m_NCS_LOCK(&cb->cb_lock, NCS_LOCK_WRITE) != NCSCC_RC_SUCCESS) {
 TRACE_3("Lock failure");
@@ -1583,7 +1584,7 @@ uint32_t imma_proc_resurrect_client(IMMA_CB *cb, SaImmHandleT immHandle,
 cl_node, cl_node ? cl_node->exposed : 0);
 goto failure;
   }
-
+  timeout = cl_node->syncr_timeout;
   if (!cl_node->stale) {
 TRACE_3(
 "imma_proc_resurrect_client: Handle %llx was not stale, "
@@ -1623,7 +1624,7 @@ uint32_t imma_proc_resurrect_client(IMMA_CB *cb, SaImmHandleT immHandle,
 /* send the request to the IMMND */
 if (imma_mds_msg_sync_send(cb->imma_mds_hdl, &(cb->immnd_mds_dest),
_evt, _evt,
-   IMMSV_WAIT_TIME) != NCSCC_RC_SUCCESS) {
+   timeout) != NCSCC_RC_SUCCESS) {
   TRACE_3("Failure in MDS send");
   goto exposed;
 }
diff --git a/src/imm/immd/immd_cb.h b/src/imm/immd/immd_cb.h
index e9710ff71..51f52aba8 100644
--- a/src/imm/immd/immd_cb.h
+++ b/src/imm/immd/immd_cb.h
@@ -23,7 +23,6 @@
 #include 
 
 #define IMMD_EVT_TIME_OUT 100
-#define IMMSV_WAIT_TIME 100
 
 #define m_IMMND_IS_ON_SCXB(m, n) ((m == n) ? 1 : 0)
 #define m_IMMD_IS_LOCAL_NODE(m, n) (m == n) ? 1 : 0
-- 
2.20.1
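
The change in imma_proc.cc amounts to preferring the per-client timeout over the fixed constant once the client node has been found. A reduced sketch of that selection follows; the struct and function names are hypothetical, and DEFAULT_WAIT_TIME only stands in for IMMSV_WAIT_TIME, with its units left abstract.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEFAULT_WAIT_TIME 100 /* stand-in for IMMSV_WAIT_TIME */

/* Hypothetical, reduced client node: only the field used here. */
struct client_node {
	int64_t syncr_timeout; /* per-handle timeout chosen at initialisation */
};

/* Pick the timeout for a synchronous send: the client's own value when
 * the handle was found, the global default otherwise. */
static int64_t pick_sync_timeout(const struct client_node *cl_node)
{
	if (cl_node == NULL)
		return DEFAULT_WAIT_TIME;
	assert(cl_node->syncr_timeout >= 0);
	return cl_node->syncr_timeout;
}
```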




[devel] [PATCH 1/2] immnd: Make IMMSV_FEVS_MAX_PENDING environment variable [#3260]

2021-05-11 Thread Minh Chau

Immnd now reads IMMSV_FEVS_MAX_PENDING from an environment
variable, and uses the default value (16) if it is not set.
---
 src/imm/common/immsv_api.h |  4 
 src/imm/immnd/ImmModel.cc  |  2 +-
 src/imm/immnd/immnd_cb.h   |  1 +
 src/imm/immnd/immnd_evt.c  | 41 ++
 src/imm/immnd/immnd_main.c | 13 
 5 files changed, 34 insertions(+), 27 deletions(-)

diff --git a/src/imm/common/immsv_api.h b/src/imm/common/immsv_api.h
index c37a624ed..f809c8168 100644
--- a/src/imm/common/immsv_api.h
+++ b/src/imm/common/immsv_api.h
@@ -77,10 +77,6 @@ typedef enum {
   ACCESS_CONTROL_ENFORCING = 2
 } OsafImmAccessControlModeT;
 
-/*Max # of outstanding fevs messages towards director.*/
-/*Note max-max is 255. cb->fevs_replies_pending is an uint8_t*/
-#define IMMSV_DEFAULT_FEVS_MAX_PENDING 16
-
 #define IMMSV_MAX_OBJECTS 1
 #define IMMSV_MAX_ATTRIBUTES 128
 #define IMMSV_MAX_ADMO_NAME_LENGTH 256
diff --git a/src/imm/immnd/ImmModel.cc b/src/imm/immnd/ImmModel.cc
index 2d750040e..8631dc21f 100644
--- a/src/imm/immnd/ImmModel.cc
+++ b/src/imm/immnd/ImmModel.cc
@@ -1614,7 +1614,7 @@ SaAisErrorT immModel_nextResult(IMMND_CB* cb, void* searchOp,
   TRACE_2(
   "ERR_TRY_AGAIN: Too many pending incoming fevs "
   "messages (> %u) rejecting sync iteration next request",
-  IMMSV_DEFAULT_FEVS_MAX_PENDING);
+  cb->mFevsMaxPending);
   return SA_AIS_ERR_TRY_AGAIN;
 }
 err = ImmModel::instance(&cb->immModel)->nextSyncResult(rsp, *op);
diff --git a/src/imm/immnd/immnd_cb.h b/src/imm/immnd/immnd_cb.h
index bb3bb8493..bc2d75b05 100644
--- a/src/imm/immnd/immnd_cb.h
+++ b/src/imm/immnd/immnd_cb.h
@@ -209,6 +209,7 @@ typedef struct immnd_cb_tag {
   NCS_PATRICIA_TREE immnd_clm_list; /* IMMND_IMM_CLIENT_NODE - node */
   tmr_t splitbrain_tmr;
   bool splitbrain_tmr_run;
+  uint8_t mFevsMaxPending; /* Max pending fevs messages towards director */
 } IMMND_CB;
 
 /* CB prototypes */
diff --git a/src/imm/immnd/immnd_evt.c b/src/imm/immnd/immnd_evt.c
index 0e801a942..64c5223c4 100644
--- a/src/imm/immnd/immnd_evt.c
+++ b/src/imm/immnd/immnd_evt.c
@@ -1787,7 +1787,7 @@ static uint32_t immnd_evt_proc_search_next(IMMND_CB *cb, IMMND_EVT *evt,
IMMSV_OM_RSP_SEARCH_NEXT **rspList = NULL;
MDS_DEST implDest = 0LL;
bool retardSync =
-   ((cb->fevs_replies_pending >= IMMSV_DEFAULT_FEVS_MAX_PENDING) &&
+   ((cb->fevs_replies_pending >= cb->mFevsMaxPending) &&
 cb->mIsCoord && (cb->syncPid > 0));
SaUint32T resultSize = 0;
IMMSV_OM_RSP_SEARCH_BUNDLE_NEXT bundleSearch = {0, NULL};
@@ -2767,10 +2767,10 @@ static uint32_t immnd_evt_proc_admowner_init(IMMND_CB *cb, IMMND_EVT *evt,
goto agent_rsp;
}
 
-   if (cb->fevs_replies_pending >= IMMSV_DEFAULT_FEVS_MAX_PENDING) {
+   if (cb->fevs_replies_pending >= cb->mFevsMaxPending) {
TRACE_2(
"ERR_TRY_AGAIN: Too many pending incoming fevs messages (> %u) rejecting admo_init request",
-   IMMSV_DEFAULT_FEVS_MAX_PENDING);
+   cb->mFevsMaxPending);
send_evt.info.imma.info.admInitRsp.error = SA_AIS_ERR_TRY_AGAIN;
goto agent_rsp;
}
@@ -2895,10 +2895,10 @@ static uint32_t immnd_evt_proc_impl_set(IMMND_CB *cb, IMMND_EVT *evt,
goto agent_rsp;
}
 
-   if (cb->fevs_replies_pending >= IMMSV_DEFAULT_FEVS_MAX_PENDING) {
+   if (cb->fevs_replies_pending >= cb->mFevsMaxPending) {
TRACE_2(
"ERR_TRY_AGAIN: Too many pending incoming fevs messages (> %u) rejecting impl_set request",
-   IMMSV_DEFAULT_FEVS_MAX_PENDING);
+   cb->mFevsMaxPending);
send_evt.info.imma.info.implSetRsp.error = SA_AIS_ERR_TRY_AGAIN;
goto agent_rsp;
}
@@ -3061,10 +3061,10 @@ static uint32_t immnd_evt_proc_ccb_init(IMMND_CB *cb, IMMND_EVT *evt,
goto agent_rsp;
}
 
-   if (cb->fevs_replies_pending >= IMMSV_DEFAULT_FEVS_MAX_PENDING) {
+   if (cb->fevs_replies_pending >= cb->mFevsMaxPending) {
TRACE_2(
"ERR_TRY_AGAIN: Too many pending incoming fevs messages (> %u) rejecting ccb_init request",
-   IMMSV_DEFAULT_FEVS_MAX_PENDING);
+   cb->mFevsMaxPending);
send_evt.info.imma.info.ccbInitRsp.error = SA_AIS_ERR_TRY_AGAIN;
goto agent_rsp;
}
@@ -3220,11 +3220,10 @@ static uint32_t immnd_evt_proc_rt_update(IMMND_CB *cb, IMMND_EVT *evt,
   writbale.
 */
 
-   if (cb->fevs_replies_pending >=
-   IMMSV_DEFAULT_FEVS_MAX_PENDING) {
+   if (cb->fevs_replies_pending >= cb->mFevsMaxPending) {
TRACE_2(
"ERR_TRY_AGAIN: Too many pending incoming fevs 
messages (> %u) rejecting rt_update

[devel] [PATCH 0/2] Review Request for Make IMMSV_FEVS_MAX_PENDING environment variable [#3260]

2021-05-11 Thread Minh Chau

Summary: immnd: Make IMMSV_FEVS_MAX_PENDING environment variable [#3260]
Review request for Ticket(s): 3260
Peer Reviewer(s): Thien, Thang
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3260
Base revision: 8259816e20e83658f04f0264af19cafa0cdd2755
Personal repository: git://git.code.sf.net/u/minh-chau/review


Impacted area       Impact y/n

 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
-
*** EXPLAIN/COMMENT THE PATCH SERIES HERE ***

revision 68f3afd33cf629950ec8388b29ddfca4f3970cb3
Author: Minh Chau 
Date:   Wed, 12 May 2021 10:23:13 +1000

imma: Correctly use IMMA_SYNCR_TIMEOUT [#3260]



revision 36f43df693bf8461b8be28d721efcad1f1864fe4
Author: Minh Chau 
Date:   Wed, 12 May 2021 10:23:05 +1000

immnd: Make IMMSV_FEVS_MAX_PENDING environment variable [#3260]

Immnd now reads IMMSV_FEVS_MAX_PENDING from an environment
variable, and uses the default value (16) if it is not set.



Complete diffstat:
--
 src/imm/agent/imma_oi_api.cc |  2 +-
 src/imm/agent/imma_om_api.cc |  2 +-
 src/imm/agent/imma_proc.cc   |  5 +++--
 src/imm/common/immsv_api.h   |  4 
 src/imm/immd/immd_cb.h   |  1 -
 src/imm/immnd/ImmModel.cc|  2 +-
 src/imm/immnd/immnd_cb.h |  1 +
 src/imm/immnd/immnd_evt.c| 41 +++--
 src/imm/immnd/immnd_main.c   | 13 +
 9 files changed, 39 insertions(+), 32 deletions(-)


Testing Commands:
-
*** LIST THE COMMAND LINE TOOLS/STEPS TO TEST YOUR CHANGES ***


Testing, Expected Results:
--
*** PASTE COMMAND OUTPUTS / TEST RESULTS ***


Conditions of Submission:
-
*** HOW MANY DAYS BEFORE PUSHING, CONSENSUS ETC ***


Arch      Built     Started    Linux distro
-------------------------------------------
mips        n          n
mips64      n          n
x86         n          n
x86_64      n          n
powerpc     n          n
powerpc64   n          n



[devel] [PATCH 0/1] Review Request for osaf: Improve etcd plugin to be tolerant of new etcd leader election [#3226]

2021-01-19 Thread Minh Chau

Summary: osaf: Improve etcd plugin to be tolerant of new etcd leader election [#3226]
Review request for Ticket(s): 3226
Peer Reviewer(s): Thang
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3226
Base revision: 2c13c9ea579dc064b1d6adcce98d62efb3d0032d
Personal repository: git://git.code.sf.net/u/minh-chau/review


Impacted area       Impact y/n

 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n

NOTE: Patch(es) contain lines longer than 80 characters

Comments (indicate scope for each "y" above):
-
*** EXPLAIN/COMMENT THE PATCH SERIES HERE ***

revision 91d67f10553661b9af59d62649f89a4f14e47efd
Author: Minh Chau 
Date:   Wed, 20 Jan 2021 07:47:19 +1100

osaf: Improve etcd plugin to be tolerant of new etcd leader election [#3226]

When network partitioning forces a new etcd leader election, the
'get' API in the bigger partition is unavailable for a few seconds.
As a result, the SC in the bigger partition cannot promote itself
and self-fences instead.

This patch adds etcd_tolerance_timeout so that the SC in the bigger
partition can retry the promotion. However, the SC in the smaller
partition shares the same etcd_tolerance_timeout retries, which
would delay its self-fencing. The patch therefore checks the health
of the SC's own etcd endpoint, and the SC applies the
etcd_tolerance_timeout retries only while that endpoint is healthy.



Complete diffstat:
--
 src/osaf/consensus/plugins/etcd3.plugin | 44 +++--
 1 file changed, 26 insertions(+), 18 deletions(-)


Testing Commands:
-
*** LIST THE COMMAND LINE TOOLS/STEPS TO TEST YOUR CHANGES ***


Testing, Expected Results:
--
*** PASTE COMMAND OUTPUTS / TEST RESULTS ***


Conditions of Submission:
-
*** HOW MANY DAYS BEFORE PUSHING, CONSENSUS ETC ***


Arch      Built     Started    Linux distro
-------------------------------------------
mips        n          n
mips64      n          n
x86         n          n
x86_64      n          n
powerpc     n          n
powerpc64   n          n



[devel] [PATCH 1/1] osaf: Improve etcd plugin to be tolerant of new etcd leader election [#3226]

2021-01-19 Thread Minh Chau

When network partitioning forces a new etcd leader election, the
'get' API in the bigger partition is unavailable for a few seconds.
As a result, the SC in the bigger partition cannot promote itself
and self-fences instead.

This patch adds etcd_tolerance_timeout so that the SC in the bigger
partition can retry the promotion. However, the SC in the smaller
partition shares the same etcd_tolerance_timeout retries, which
would delay its self-fencing. The patch therefore checks the health
of the SC's own etcd endpoint, and the SC applies the
etcd_tolerance_timeout retries only while that endpoint is healthy.
---
 src/osaf/consensus/plugins/etcd3.plugin | 44 +++--
 1 file changed, 26 insertions(+), 18 deletions(-)

diff --git a/src/osaf/consensus/plugins/etcd3.plugin b/src/osaf/consensus/plugins/etcd3.plugin
index 6252eedcb..34a975e05 100644
--- a/src/osaf/consensus/plugins/etcd3.plugin
+++ b/src/osaf/consensus/plugins/etcd3.plugin
@@ -23,6 +23,7 @@ readonly directory="/opensaf/"
 readonly etcd_options=""
 readonly etcd_timeout="3s"
 readonly heartbeat_interval=2
+readonly etcd_tolerance_timeout=6
 
 export ETCDCTL_API=3
 
@@ -332,11 +333,10 @@ unlock() {
 #   non-zero - failure
 watch() {
   readonly watch_key="$1"
-
   # get baseline
   orig_value=$(get "$watch_key")
   result=$?
-
+  tol_counter=0
   if [ "$result" -le 1 ]; then
   if [ "$result" -eq 0 ] && [ "$watch_key" == "$takeover_request" ]; then
 state=$(echo $orig_value | awk '{print $4}')
@@ -353,25 +353,33 @@ watch() {
   current_value=$(get "$watch_key")
   result=$?
   if [ "$result" -gt 1 ]; then
-# etcd down?
-if [ "$watch_key" == "$takeover_request" ]; then
-  hostname=`cat $node_name_file`
-  echo "$hostname SC-0 1000 UNDEFINED"
-  return 0
-else
-  return 1
+# etcd down?, check the healthiness of self endpoint
+$(etcdctl endpoint health >/dev/null 2>&1)
+is_healthy=$?
+((tol_counter=tol_counter+heartbeat_interval))
+if [ $tol_counter -ge $etcd_tolerance_timeout ] || [ $is_healthy -ne 0 ]; then
+  if [ "$watch_key" == "$takeover_request" ]; then
+hostname=`cat $node_name_file`
+echo "$hostname SC-0 1000 UNDEFINED"
+return 0
+  else
+return 1
+  fi
 fi
-  elif [ "$orig_value" != "$current_value" ]; then
-if [ "$watch_key" == "$takeover_request" ]; then
-  state=$(echo $orig_value | awk '{print $4}')
-  if [ "$state" == "REJECTED" ] && [ -z "$current_value" ]; then
-# value is cleared after lease time, keep watching
-orig_value=""
-continue
+  else
+tol_counter=0
+if [ "$orig_value" != "$current_value" ]; then
+  if [ "$watch_key" == "$takeover_request" ]; then
+state=$(echo $orig_value | awk '{print $4}')
+if [ "$state" == "REJECTED" ] && [ -z "$current_value" ]; then
+  # value is cleared after lease time, keep watching
+  orig_value=""
+  continue
+fi
   fi
+  echo $current_value
+  return 0
 fi
-echo $current_value
-return 0
   fi
 done
   else
-- 
2.20.1




[devel] [PATCH 1/1] base: Use non-blocking socketpair in sysf_exc module V3 [#3222]

2020-10-27 Thread Minh Chau

When amfnd terminates a huge number of components at once (around
800 components), its signal handler catches SIGCHLD from the
components' processes and calls write() to notify amfnd's threads
to proceed with the component termination. As a result, multiple
blocking write() calls are observed to block, because the thread
that calls read() is busy in waitpid(), even though waitpid() is
called with WNOHANG.

The read() thread is slow because it scans through all pids while
many child processes are terminating at the same time.

This patch makes the socketpair non-blocking so that write() cannot
block. It also waits on a poll event to avoid hogging the CPU in
the read() thread.
---
 src/base/sysf_exc_scr.c | 121 
 1 file changed, 60 insertions(+), 61 deletions(-)

diff --git a/src/base/sysf_exc_scr.c b/src/base/sysf_exc_scr.c
index 378b1eeab..119f72478 100644
--- a/src/base/sysf_exc_scr.c
+++ b/src/base/sysf_exc_scr.c
@@ -33,10 +33,11 @@
 #include "base/sysf_exc_scr.h"
 #include "base/ncssysf_def.h"
 
+#include 
 #include 
 
 SYSF_EXECUTE_MODULE_CB module_cb;
-
+static struct pollfd fds[1];
 /*
 
   PROCEDURE: ncs_exc_mdl_start_timer
@@ -108,8 +109,19 @@ void ncs_exec_module_signal_hdlr(int signal)
 
/*  printf("\n In  SIGCHLD Handler \n"); */
 
-   if (-1 == write(module_cb.write_fd, (const void *)&info,
+   while (-1 == write(module_cb.write_fd, (const void *)&info,
sizeof(EXEC_MOD_INFO))) {
+   /* Only continue if the error is EINTR which may be
+* caused by the signal interrupt, and do not try again
+* with EAGAIN and EWOULDBLOCK since that will become
+* the reason to cause the threads hanging with
+* BLOCKING socketpair and the ncs_exec_mod_hdlr scans
+* all child pid for each read()
+*/
+   if (errno == EINTR)
+   continue;
+
+   break;
perror("ncs_exec_module_signal_hdlr: write");
}
}
@@ -137,11 +149,7 @@ void ncs_exec_module_timer_hdlr(void *uarg)
EXEC_MOD_INFO info = {.pid = NCS_PTR_TO_INT32_CAST(uarg),
  .status = 0,
  .type = SYSF_EXEC_INFO_TIME_OUT};
-
-   if (-1 == write(module_cb.write_fd, (const void *)&info,
-   sizeof(EXEC_MOD_INFO))) {
-   perror("ncs_exec_module_timer_hdlr: write");
-   }
+   give_exec_mod_cb(info.pid, info.status, info.type);
 }
 
 /**\
@@ -169,8 +177,25 @@ void ncs_exec_mod_hdlr(void)
SYSF_PID_LIST *exec_pid = NULL;
int status = -1;
int pid = -1;
+   int polltmo = -1;
+
+   fds[0].fd = module_cb.read_fd;
+   fds[0].events = POLLIN;
 
while (1) {
+   int pollretval = poll(fds, 1, polltmo);
+
+   if (pollretval == -1) {
+   if (errno == EINTR)
+   continue;
+
+   LOG_ER("ncs_exec_mod_hdlr: poll FAILED - %s",
+   strerror(errno));
+   break;
+   }
+   if ((fds[0].revents & POLLIN) == false)
+   continue;
+
while ((ret_val = read(
module_cb.read_fd, (((uint8_t *)&info) + count),
(maxsize - count))) != (maxsize - count)) {
@@ -178,66 +203,40 @@ void ncs_exec_mod_hdlr(void)
if (errno == EBADF)
return;
 
-   perror("ncs_exec_mod_hdlr: read fail:");
continue;
}
count += ret_val;
} /* while */
 
-   if (info.type == SYSF_EXEC_INFO_TIME_OUT) {
-   /* printf("Time out signal \n"); */
-   pid = info.pid;
-   give_exec_mod_cb(info.pid, info.status, info.type);
-
-   } /* if */
-   else {
repeat_srch_from_beginning:
-   m_NCS_LOCK(&module_cb.tree_lock, NCS_LOCK_WRITE);
-
-   for (exec_pid =
-(SYSF_PID_LIST *)ncs_patricia_tree_getnext(
-&module_cb.pid_list, NULL);
-exec_pid != NULL;
-exec_pid =
-(SYSF_PID_LIST *)ncs_patricia_tree_getnext(
-&module_cb.pid_list,
-(const uint8_t *)&exec_pid->pid)) {

[devel] [PATCH 0/1] Review Request for base: Use non-blocking socketpair in sysf_exc module V3 [#3222]

2020-10-27 Thread Minh Chau

Summary: base: Use non-blocking socketpair in sysf_exc module V3 [#3222]
Review request for Ticket(s): 3222
Peer Reviewer(s): Thuan, Thang
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3222
Base revision: 17038f9f9bbbde98b68fccb5b65413e14fe46418
Personal repository: git://git.code.sf.net/u/minh-chau/review


Impacted area       Impact y/n

 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            n
 OpenSAF services        n
 Core libraries          y
 Samples                 n
 Tests                   n
 Other                   n

NOTE: Patch(es) contain lines longer than 80 characters

Comments (indicate scope for each "y" above):
-
*** EXPLAIN/COMMENT THE PATCH SERIES HERE ***

revision 8758c96eaf3d62ec99b99a7ae8d3ebf6884793c1
Author: Minh Chau 
Date:   Wed, 28 Oct 2020 07:36:38 +1100

base: Use non-blocking socketpair in sysf_exc module V3 [#3222]

When amfnd terminates a huge number of components at once (around
800 components), its signal handler catches SIGCHLD from the
components' processes and calls write() to notify amfnd's threads
to proceed with the component termination. As a result, multiple
blocking write() calls are observed to block, because the thread
that calls read() is busy in waitpid(), even though waitpid() is
called with WNOHANG.

The read() thread is slow because it scans through all pids while
many child processes are terminating at the same time.

This patch makes the socketpair non-blocking so that write() cannot
block. It also waits on a poll event to avoid hogging the CPU in
the read() thread.
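
The writer side of this change can be sketched as follows, assuming a Unix-domain stream socketpair. The function names are hypothetical and this is not the code in sysf_exc_scr.c. The notify function retries only on EINTR and gives up on EAGAIN/EWOULDBLOCK, matching the policy stated in the patch comment; treating a short write as failure is a simplification made here.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Create a socketpair and switch both ends to non-blocking mode.
 * Returns 0 on success, -1 on failure. */
static int make_nonblocking_pair(int *fds)
{
	assert(fds != NULL);
	if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) == -1)
		return -1;
	for (int i = 0; i < 2; i++) {
		int flags = fcntl(fds[i], F_GETFL, 0);
		if (flags == -1 ||
		    fcntl(fds[i], F_SETFL, flags | O_NONBLOCK) == -1) {
			close(fds[0]);
			close(fds[1]);
			return -1;
		}
	}
	return 0;
}

/* Send one message without ever blocking. Only write() is called, so
 * the function is usable from a signal handler. Returns true if the
 * whole message was queued. */
static bool notify(int fd, const void *msg, size_t len)
{
	for (;;) {
		ssize_t n = write(fd, msg, len);
		if (n == (ssize_t)len)
			return true;
		if (n == -1 && errno == EINTR)
			continue; /* interrupted: try again */
		return false;     /* full buffer or other error: drop it */
	}
}
```

Dropping a notification when the buffer is full is tolerable only if the reader can discover the pending work some other way, for example by scanning all child pids on each wakeup.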



Complete diffstat:
--
 src/base/sysf_exc_scr.c | 121 
 1 file changed, 60 insertions(+), 61 deletions(-)


Testing Commands:
-
*** LIST THE COMMAND LINE TOOLS/STEPS TO TEST YOUR CHANGES ***


Testing, Expected Results:
--
*** PASTE COMMAND OUTPUTS / TEST RESULTS ***


Conditions of Submission:
-
*** HOW MANY DAYS BEFORE PUSHING, CONSENSUS ETC ***


Arch      Built     Started    Linux distro
-------------------------------------------
mips        n          n
mips64      n          n
x86         n          n
x86_64      n          n
powerpc     n          n
powerpc64   n          n



[devel] [PATCH 0/1] Review Request for base: Use non-blocking socketpair in sysf_exc module V2 [#3222]

2020-10-25 Thread Minh Chau

Summary: base: Use non-blocking socketpair in sysf_exc module V2 [#3222]
Review request for Ticket(s): 3222
Peer Reviewer(s): Thuan, Thang 
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3222
Base revision: 17038f9f9bbbde98b68fccb5b65413e14fe46418
Personal repository: git://git.code.sf.net/u/minh-chau/review


Impacted area       Impact y/n

 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            n
 OpenSAF services        n
 Core libraries          y
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
-
*** EXPLAIN/COMMENT THE PATCH SERIES HERE ***

revision 5c00f24a7c904287fc2ae96a7070d35d362a5516
Author: Minh Chau 
Date:   Mon, 26 Oct 2020 13:20:23 +1100

base: Use non-blocking socketpair in sysf_exc module V2 [#3222]

When amfnd terminates a huge number of components at once (around
800 components), its signal handler catches SIGCHLD from the
components' processes and calls write() to notify amfnd's threads
to proceed with the component termination. As a result, multiple
blocking write() calls are observed to block, because the thread
that calls read() is busy in waitpid(), even though waitpid() is
called with WNOHANG.

The read() thread is slow because it scans through all pids while
many child processes are terminating at the same time.

This patch makes the socketpair non-blocking so that write() cannot
block. It also waits on a poll event to avoid hogging the CPU in
the read() thread.
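
On the reader side, a non-blocking descriptor must not be read in a tight loop, which is where poll() comes in. A minimal sketch of reading one fixed-size record this way is shown here; the function name and the timeout parameter are hypothetical, and the real handler loops forever with an infinite poll timeout.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/* Wait until fd is readable, then read exactly len bytes.
 * Returns 0 on success, -1 on error, end of file or timeout. */
static int read_record(int fd, void *buf, size_t len, int timeout_ms)
{
	assert(buf != NULL && len > 0);
	size_t count = 0;
	struct pollfd pfd = {.fd = fd, .events = POLLIN};

	while (count < len) {
		int rc = poll(&pfd, 1, timeout_ms);
		if (rc == -1) {
			if (errno == EINTR)
				continue; /* interrupted by a signal */
			return -1;
		}
		if (rc == 0)
			return -1; /* timed out */
		if (!(pfd.revents & (POLLIN | POLLHUP)))
			return -1; /* POLLERR or POLLNVAL */

		ssize_t n = read(fd, (char *)buf + count, len - count);
		if (n > 0)
			count += (size_t)n;
		else if (n == 0)
			return -1; /* peer closed */
		else if (errno != EINTR && errno != EAGAIN &&
			 errno != EWOULDBLOCK)
			return -1;
	}
	return 0;
}
```

The thread sleeps inside poll() until data arrives, so an empty non-blocking socket costs no CPU time.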



Complete diffstat:
--
 src/base/sysf_exc_scr.c | 51 ++---
 1 file changed, 44 insertions(+), 7 deletions(-)


Testing Commands:
-
*** LIST THE COMMAND LINE TOOLS/STEPS TO TEST YOUR CHANGES ***


Testing, Expected Results:
--
*** PASTE COMMAND OUTPUTS / TEST RESULTS ***


Conditions of Submission:
-
*** HOW MANY DAYS BEFORE PUSHING, CONSENSUS ETC ***


Arch        Built     Started    Linux distro
-----------------------------------------------
mips          n          n
mips64        n          n
x86           n          n
x86_64        n          n
powerpc       n          n
powerpc64     n          n


Reviewer Checklist:
---
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
(i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
too much content in a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
comments and change requests that were proposed in the initial review.

___ You have a misconfigured ~/.gitconfig file (i.e. user.name, user.email etc)

___ Your computer has a badly configured date and time, confusing the
threaded patch review.

___ Your changes affect IPC mechanism, and you don't present any results
for in-service upgradability test.

___ Your changes affect user manual and documentation, your patch series
do not contain the patch that updates the Doxygen manual.




[devel] [PATCH 1/1] base: Use non-blocking socketpair in sysf_exc module V2 [#3222]

2020-10-25 Thread Minh Chau

When amfnd terminates a very large number of components at once
(around 800), its signal handler catches SIGCHLD from each component
process and calls write() to notify the amfnd thread that handles
component termination. Many of these blocking write() calls are then
observed to block, because the thread that calls read() is busy in
waitpid(), even though waitpid() is called with WNOHANG.

The read() thread is slow because it scans through all pids while
many child processes are terminating at the same time.

This patch makes the socketpair non-blocking so that write() cannot
block. It also uses poll() so that the read() thread does not hog
the CPU.
---
 src/base/sysf_exc_scr.c | 51 +++--
 1 file changed, 44 insertions(+), 7 deletions(-)

diff --git a/src/base/sysf_exc_scr.c b/src/base/sysf_exc_scr.c
index 378b1eeab..ed4123a8c 100644
--- a/src/base/sysf_exc_scr.c
+++ b/src/base/sysf_exc_scr.c
@@ -33,10 +33,11 @@
 #include "base/sysf_exc_scr.h"
 #include "base/ncssysf_def.h"
 
+#include 
 #include 
 
 SYSF_EXECUTE_MODULE_CB module_cb;
-
+static struct pollfd fds[1];
 /*
 
   PROCEDURE: ncs_exc_mdl_start_timer
@@ -108,8 +109,18 @@ void ncs_exec_module_signal_hdlr(int signal)
 
/*  printf("\n In  SIGCHLD Handler \n"); */
 
-   if (-1 == write(module_cb.write_fd, (const void *)&info,
+   while (-1 == write(module_cb.write_fd, (const void *)&info,
sizeof(EXEC_MOD_INFO))) {
+   /* Only continue if the error is EINTR which may be
+* caused by the signal interupt, and do not try again
+* with EAGAIN and EWOULDBLOCK since that will become
+* the reason to cause the threads hanging with
+* BLOCKING socketpair
+*/
+   if (errno == EINTR)
+   continue;
+
+   break;
perror("ncs_exec_module_signal_hdlr: write");
}
}
@@ -138,9 +149,19 @@ void ncs_exec_module_timer_hdlr(void *uarg)
  .status = 0,
  .type = SYSF_EXEC_INFO_TIME_OUT};
 
-   if (-1 == write(module_cb.write_fd, (const void *)&info,
-   sizeof(EXEC_MOD_INFO))) {
-   perror("ncs_exec_module_timer_hdlr: write");
+   while (-1 == write(module_cb.write_fd, (const void *)&info,
+   sizeof(EXEC_MOD_INFO))) {
+   /* Only continue if the error is EINTR which may be
+* caused by the signal interupt, and do not try again
+* with EAGAIN and EWOULDBLOCK since that will become
+* the reason to cause the threads hanging with
+* BLOCKING socketpair
+*/
+   if (errno == EINTR)
+   continue;
+
+   break;
+   perror("ncs_exec_module_timer_hdlr: write");
}
 }
 
@@ -169,8 +190,25 @@ void ncs_exec_mod_hdlr(void)
SYSF_PID_LIST *exec_pid = NULL;
int status = -1;
int pid = -1;
+   int polltmo = -1;
+
+   fds[0].fd = module_cb.read_fd;
+   fds[0].events = POLLIN;
 
while (1) {
+   int pollretval = poll(fds, 1, polltmo);
+
+   if (pollretval == -1) {
+   if (errno == EINTR)
+   continue;
+
+   LOG_ER("ncs_exec_mod_hdlr: poll FAILED - %s",
+   strerror(errno));
+   break;
+   }
+   if ((fds[0].revents & POLLIN) == false)
+   continue;
+
while ((ret_val = read(
module_cb.read_fd, (((uint8_t *)&info) + count),
(maxsize - count))) != (maxsize - count)) {
@@ -178,7 +216,6 @@ void ncs_exec_mod_hdlr(void)
if (errno == EBADF)
return;
 
-   perror("ncs_exec_mod_hdlr: read fail:");
continue;
}
count += ret_val;
@@ -430,7 +467,7 @@ uint32_t start_exec_mod_cb(void)
return m_LEAP_DBG_SINK(NCSCC_RC_FAILURE);
}
 
-   if (0 != socketpair(AF_UNIX, SOCK_DGRAM, 0, spair)) {
+   if (0 != socketpair(AF_UNIX, SOCK_DGRAM | SOCK_NONBLOCK, 0, spair)) {
perror("init_exec_mod_cb: socketpair: ");
return m_LEAP_DBG_SINK(NCSCC_RC_FAILURE);
}
-- 
2.20.1




[devel] [PATCH 1/1] base: Use non-blocking socketpair in sysf_exc module [#3222]

2020-10-21 Thread Minh Chau

When amfnd terminates a very large number of components at once
(around 800), its signal handler catches SIGCHLD from each component
process and calls write() to notify the amfnd thread that handles
component termination. Many of these blocking write() calls are then
observed to block, because the thread that calls read() is busy in
waitpid(), even though waitpid() is called with WNOHANG.

The read() thread is slow because it scans through all pids while
many child processes are terminating at the same time.

This patch makes the socketpair non-blocking so that write() cannot
block. It also uses poll() so that the read() thread does not hog
the CPU.
---
 src/base/sysf_exc_scr.c | 17 +++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/src/base/sysf_exc_scr.c b/src/base/sysf_exc_scr.c
index 378b1eeab..6348985cb 100644
--- a/src/base/sysf_exc_scr.c
+++ b/src/base/sysf_exc_scr.c
@@ -33,10 +33,11 @@
 #include "base/sysf_exc_scr.h"
 #include "base/ncssysf_def.h"
 
+#include 
 #include 
 
 SYSF_EXECUTE_MODULE_CB module_cb;
-
+static struct pollfd fds[1];
 /*
 
   PROCEDURE: ncs_exc_mdl_start_timer
@@ -169,8 +170,20 @@ void ncs_exec_mod_hdlr(void)
SYSF_PID_LIST *exec_pid = NULL;
int status = -1;
int pid = -1;
+   int polltmo = -1;
+
+   fds[0].fd = module_cb.read_fd;
+   fds[0].events = POLLIN;
 
while (1) {
+   int pollretval = poll(fds, 1, polltmo);
+   if (pollretval == -1) {
+   if (errno == EINTR) continue;
+   LOG_ER("ncs_exec_mod_hdlr: poll FAILED - %s",
+   strerror(errno));
+   break;
+   }
+   if ((fds[0].revents & POLLIN) == false) continue;
while ((ret_val = read(
module_cb.read_fd, (((uint8_t *)&info) + count),
(maxsize - count))) != (maxsize - count)) {
@@ -430,7 +443,7 @@ uint32_t start_exec_mod_cb(void)
return m_LEAP_DBG_SINK(NCSCC_RC_FAILURE);
}
 
-   if (0 != socketpair(AF_UNIX, SOCK_DGRAM, 0, spair)) {
+   if (0 != socketpair(AF_UNIX, SOCK_DGRAM | SOCK_NONBLOCK, 0, spair)) {
perror("init_exec_mod_cb: socketpair: ");
return m_LEAP_DBG_SINK(NCSCC_RC_FAILURE);
}
-- 
2.20.1




[devel] [PATCH 0/1] Review Request for base: Use non-blocking socketpair in sysf_exc module [#3222]

2020-10-21 Thread Minh Chau

Summary: base: Use non-blocking socketpair in sysf_exc module [#3222]
Review request for Ticket(s): 3222
Peer Reviewer(s): Thuan, Thang
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3222
Base revision: fa78173f280133ceb47224bfbaf9e83b96873fc5
Personal repository: git://git.code.sf.net/u/minh-chau/review


Impacted area          Impact y/n
----------------------------------
 Docs                      n
 Build system              n
 RPM/packaging             n
 Configuration files       n
 Startup scripts           n
 SAF services              n
 OpenSAF services          n
 Core libraries            y
 Samples                   n
 Tests                     n
 Other                     n


Comments (indicate scope for each "y" above):
-
*** EXPLAIN/COMMENT THE PATCH SERIES HERE ***

revision f21f2a82fe0fed3db61e7bf89a16cf84dd5121a4
Author: Minh Chau 
Date:   Wed, 21 Oct 2020 16:19:23 +1100

base: Use non-blocking socketpair in sysf_exc module [#3222]

When amfnd terminates a very large number of components at once
(around 800), its signal handler catches SIGCHLD from each component
process and calls write() to notify the amfnd thread that handles
component termination. Many of these blocking write() calls are then
observed to block, because the thread that calls read() is busy in
waitpid(), even though waitpid() is called with WNOHANG.

The read() thread is slow because it scans through all pids while
many child processes are terminating at the same time.

This patch makes the socketpair non-blocking so that write() cannot
block. It also uses poll() so that the read() thread does not hog
the CPU.
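
The reader side can be sketched along these lines. It is only an
illustration of the poll-then-read idea, not the code from the patch:
wait_and_read() and its return convention are hypothetical. Once the
descriptor is non-blocking, a bare read() loop would spin, so the thread
sleeps in poll() until data is reported.

```c
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Wait until read_fd is readable, then read one record of `size` bytes.
 * timeout_ms follows poll() semantics: -1 waits indefinitely.
 * Returns 1 when a full record was read, 0 when nothing is available
 * yet (timeout, or the read would block or was interrupted), and -1 on
 * any other error or on a short read. */
int wait_and_read(int read_fd, void *buf, size_t size, int timeout_ms)
{
	struct pollfd pfd = {.fd = read_fd, .events = POLLIN};
	int rc;

	/* Sleep in poll() instead of spinning on a non-blocking read(). */
	do {
		rc = poll(&pfd, 1, timeout_ms);
	} while (rc == -1 && errno == EINTR);

	if (rc == -1)
		return -1;
	if (rc == 0)
		return 0; /* timed out */
	if (!(pfd.revents & POLLIN))
		return -1; /* error or hang-up reported without data */

	ssize_t n = read(read_fd, buf, size);

	if (n == (ssize_t)size)
		return 1;
	if (n == -1 &&
	    (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR))
		return 0;
	return -1;
}
```

A caller would typically invoke this in a loop with a timeout of -1 and
handle one record per successful return.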



Complete diffstat:
--
 src/base/sysf_exc_scr.c | 17 +++--
 1 file changed, 15 insertions(+), 2 deletions(-)


Testing Commands:
-
legacy amf tests
test with termination of 800 amfnd comps/per su at once

Testing, Expected Results:
--
all tests shall pass


Conditions of Submission:
-
*** HOW MANY DAYS BEFORE PUSHING, CONSENSUS ETC ***


Arch        Built     Started    Linux distro
-----------------------------------------------
mips          n          n
mips64        n          n
x86           n          n
x86_64        n          n
powerpc       n          n
powerpc64     n          n






[devel] [PATCH 0/1] Review Request for fmd: Do not send RDE to set active role if opensaf_quick_reboot is executed [#3146]

2020-01-23 Thread Minh Chau

Summary: fmd: Do not send RDE to set active role if opensaf_quick_reboot is executed [#3146]
Review request for Ticket(s): 3146
Peer Reviewer(s): Gary
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3146
Base revision: f03fe23c17bd4e4e32dd4a1304d2ac8f247d05e7
Personal repository: git://git.code.sf.net/u/minh-chau/review


Impacted area          Impact y/n
----------------------------------
 Docs                      n
 Build system              n
 RPM/packaging             n
 Configuration files       n
 Startup scripts           n
 SAF services              y
 OpenSAF services          n
 Core libraries            n
 Samples                   n
 Tests                     n
 Other                     n


Comments (indicate scope for each "y" above):
-
*** EXPLAIN/COMMENT THE PATCH SERIES HERE ***

revision 6afce8c0b7f827d5186f8a64f6229a10329d1313
Author: Minh Chau 
Date:   Fri, 24 Jan 2020 10:57:03 +1100

fmd: Do not send RDE to set active role if opensaf_quick_reboot is executed [#3146]

If an SC is separated from the cluster, fmd calls opensaf_quick_reboot().
The reboot script returns before the node has actually gone down.
The code that follows opensaf_quick_reboot() then tells rde to promote
the node to active. Hence, there is a short period with two active SCs.

This patch stops fmd from asking RDE to set the active role after
opensaf_quick_reboot().

Note: There are a few other places where the function does not return
after opensaf_quick_reboot(). This patch only fixes the issue in fm;
the other places will be revisited.
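
The control flow being fixed can be sketched as below. This is a
simplified C illustration only; the real code is C++ in fm_rda.cc, and
every name here (enum promote_rc, request_quick_reboot(),
request_active_role(), promote_node_sketch()) is a hypothetical stand-in.
The point is that a reboot request returns to the caller, so each error
branch needs an explicit return to avoid falling through to the
promotion.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for the consensus service result codes. */
enum promote_rc { PROMOTE_OK, PROMOTE_EXIST, PROMOTE_FAILED };

static bool reboot_requested;
static bool active_role_requested;

/* Stand-in for opensaf_quick_reboot(): it only requests the reboot and
 * then returns while the node is still running. */
static void request_quick_reboot(const char *reason)
{
	fprintf(stderr, "reboot requested: %s\n", reason);
	reboot_requested = true;
}

/* Stand-in for the request that asks RDE to take the active role. */
static void request_active_role(void)
{
	active_role_requested = true;
}

void promote_node_sketch(enum promote_rc rc)
{
	if (rc == PROMOTE_FAILED) {
		request_quick_reboot("Unable to set active controller");
		return; /* do not fall through to the promotion below */
	} else if (rc == PROMOTE_EXIST) {
		request_quick_reboot("A controller is already active");
		return; /* same here: the node is about to go down */
	}

	request_active_role();
}
```

Without the two return statements, a node that has just requested its
own reboot would still ask for the active role during the window before
it goes down.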



Complete diffstat:
--
 src/fm/fmd/fm_rda.cc | 2 ++
 1 file changed, 2 insertions(+)


Testing Commands:
-
*** LIST THE COMMAND LINE TOOLS/STEPS TO TEST YOUR CHANGES ***


Testing, Expected Results:
--
*** PASTE COMMAND OUTPUTS / TEST RESULTS ***


Conditions of Submission:
-
*** HOW MANY DAYS BEFORE PUSHING, CONSENSUS ETC ***


Arch        Built     Started    Linux distro
-----------------------------------------------
mips          n          n
mips64        n          n
x86           n          n
x86_64        y          y
powerpc       n          n
powerpc64     n          n






[devel] [PATCH 1/1] fmd: Do not send RDE to set active role if opensaf_quick_reboot is executed [#3146]

2020-01-23 Thread Minh Chau

If an SC is separated from the cluster, fmd calls opensaf_quick_reboot().
The reboot script returns before the node has actually gone down.
The code that follows opensaf_quick_reboot() then tells rde to promote
the node to active. Hence, there is a short period with two active SCs.

This patch stops fmd from asking RDE to set the active role after
opensaf_quick_reboot().

Note: There are a few other places where the function does not return
after opensaf_quick_reboot(). This patch only fixes the issue in fm;
the other places will be revisited.
---
 src/fm/fmd/fm_rda.cc | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/fm/fmd/fm_rda.cc b/src/fm/fmd/fm_rda.cc
index fca417f79..479eb2149 100644
--- a/src/fm/fmd/fm_rda.cc
+++ b/src/fm/fmd/fm_rda.cc
@@ -86,6 +86,7 @@ void promote_node(FM_CB *fm_cb) {
 LOG_ER("Unable to set active controller in consensus service");
 opensaf_quick_reboot("Unable to set active controller "
   "in consensus service");
+return;
   } else if (rc == SA_AIS_ERR_EXIST) {
 // @todo if we don't reboot, we don't seem to recover from this. Can we
 // improve?
@@ -94,6 +95,7 @@ void promote_node(FM_CB *fm_cb) {
 "cluster?");
 opensaf_quick_reboot("A controller is already active. We were separated "
  "from the cluster?");
+return;
   }
 
   PCS_RDA_REQ rda_req;
-- 
2.20.1




[devel] [PATCH 0/1] Review Request for rde: Reboot node if another active controller is detected [#3142]

2020-01-15 Thread Minh Chau

Summary: rde: Reboot node if another active controller is detected [#3142]
Review request for Ticket(s): 3142
Peer Reviewer(s): Hans, Gary, Vu
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3142
Base revision: 740100f2ebfb5458a8052dea29b5583b3dc8df5a
Personal repository: git://git.code.sf.net/u/minh-chau/review


Impacted area          Impact y/n
----------------------------------
 Docs                      n
 Build system              n
 RPM/packaging             n
 Configuration files       n
 Startup scripts           n
 SAF services              y
 OpenSAF services          n
 Core libraries            n
 Samples                   n
 Tests                     n
 Other                     n


Comments (indicate scope for each "y" above):
-
*** EXPLAIN/COMMENT THE PATCH SERIES HERE ***

revision 6ba9d0043791bf2603b3d6a44e7af11e4011c0e4
Author: Minh Chau 
Date:   Thu, 16 Jan 2020 10:43:13 +1100

rde: Reboot node if another active controller is detected [#3142]



Complete diffstat:
--
 src/rde/rded/role.cc | 1 +
 1 file changed, 1 insertion(+)


Testing Commands:
-
*** LIST THE COMMAND LINE TOOLS/STEPS TO TEST YOUR CHANGES ***


Testing, Expected Results:
--
*** PASTE COMMAND OUTPUTS / TEST RESULTS ***


Conditions of Submission:
-
*** HOW MANY DAYS BEFORE PUSHING, CONSENSUS ETC ***


Arch        Built     Started    Linux distro
-----------------------------------------------
mips          n          n
mips64        n          n
x86           n          n
x86_64        y          y
powerpc       n          n
powerpc64     n          n






[devel] [PATCH 1/1] rde: Reboot node if another active controller is detected [#3142]

2020-01-15 Thread Minh Chau

---
 src/rde/rded/role.cc | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/rde/rded/role.cc b/src/rde/rded/role.cc
index b890117..9446ccb 100644
--- a/src/rde/rded/role.cc
+++ b/src/rde/rded/role.cc
@@ -107,6 +107,7 @@ void Role::PromoteNode(const uint64_t cluster_size,
   rc = consensus_service.PromoteThisNode(true, cluster_size);
   if (rc == SA_AIS_ERR_EXIST) {
 LOG_WA("Another controller is already active");
+opensaf_quick_reboot("Another controller is already active");
 return;
   } else if (rc != SA_AIS_OK && relaxed_mode == true) {
 LOG_WA("Unable to set active controller in consensus service");
-- 
2.7.4




[devel] [PATCH 1/2] mds: Improve readibility [#3089]

2019-11-28 Thread Minh Chau

Correct the indentation and wrap lines to fewer than 80 characters in
mds_mdtm_send_tipc() and mdtm_frag_and_send()
---
 src/mds/mds_dt_tipc.c | 490 ++
 1 file changed, 256 insertions(+), 234 deletions(-)

diff --git a/src/mds/mds_dt_tipc.c b/src/mds/mds_dt_tipc.c
index fdf0da7..722076f 100644
--- a/src/mds/mds_dt_tipc.c
+++ b/src/mds/mds_dt_tipc.c
@@ -2561,16 +2561,16 @@ uint32_t mds_mdtm_send_tipc(MDTM_SEND_REQ *req)
   send message
 */
uint32_t status = 0;
-   uint32_t sum_mds_hdr_plus_mdtm_hdr_plus_len;
+   uint32_t mds_and_mdtm_hdr_len;
uint16_t fctrl_seq_num = 0;
int version = req->msg_arch_word & 0x7;
if (version > 1) {
-   sum_mds_hdr_plus_mdtm_hdr_plus_len =
+   mds_and_mdtm_hdr_len =
(SUM_MDS_HDR_PLUS_MDTM_HDR_PLUS_LEN +
 gl_mds_mcm_cb->node_name_len);
} else {
/* sending message to Old version Node  */
-   sum_mds_hdr_plus_mdtm_hdr_plus_len =
+   mds_and_mdtm_hdr_len =
(SUM_MDS_HDR_PLUS_MDTM_HDR_PLUS_LEN - 1);
}
 
@@ -2598,13 +2598,13 @@ uint32_t mds_mdtm_send_tipc(MDTM_SEND_REQ *req)
/* This is exclusively for the Bcast ENC and ENC_FLAT case */
if (recv.msg.encoding == MDS_ENC_TYPE_FULL) {
ncs_dec_init_space(_uba,
-  recv.msg.data.fullenc_uba.start);
+   recv.msg.data.fullenc_uba.start);
recv.msg_arch_word = req->msg_arch_word;
} else if (recv.msg.encoding == MDS_ENC_TYPE_FLAT) {
/* This case will not arise, but just to be on safe side
 */
ncs_dec_init_space(_uba,
-  recv.msg.data.flat_uba.start);
+   recv.msg.data.flat_uba.start);
} else {
/* Do nothing for the DIrect buff and Copy case */
}
@@ -2620,19 +2620,18 @@ uint32_t mds_mdtm_send_tipc(MDTM_SEND_REQ *req)
uint32_t frag_seq_num = 0, node_status = 0;
 
node_status = m_MDS_CHECK_NCS_NODE_ID_RANGE(
-   m_MDS_GET_NODE_ID_FROM_ADEST(req->adest));
+   m_MDS_GET_NODE_ID_FROM_ADEST(req->adest));
 
if (NCSCC_RC_SUCCESS == node_status) {
tipc_id.node = m_MDS_GET_TIPC_NODE_ID_FROM_NCS_NODE_ID(
-   m_MDS_GET_NODE_ID_FROM_ADEST(req->adest));
+   m_MDS_GET_NODE_ID_FROM_ADEST(req->adest));
tipc_id.ref = (uint32_t)(req->adest);
} else {
-   if (req->snd_type !=
-   MDS_SENDTYPE_ACK) { /* This check is becoz in ack
-  cases we are only sending the
-  hdr and no data part is being
-  send, so no message free ,
-  fix me */
+   if (req->snd_type != MDS_SENDTYPE_ACK) {
+   /* This check is becoz in ack cases we are only
+* sending the hdr and no data part is being
+*  send, so no message free. fix me
+*/
mdtm_free_reassem_msg_mem(&req->msg);
}
return NCSCC_RC_FAILURE;
@@ -2643,43 +2642,45 @@ uint32_t mds_mdtm_send_tipc(MDTM_SEND_REQ *req)
/* Only for the ack and not for any other message */
if (req->snd_type == MDS_SENDTYPE_ACK ||
req->snd_type == MDS_SENDTYPE_RACK) {
-   uint8_t len = sum_mds_hdr_plus_mdtm_hdr_plus_len;
+   uint8_t len = mds_and_mdtm_hdr_len;
uint8_t buffer_ack[len];
 
/* Add mds_hdr */
-   if (NCSCC_RC_SUCCESS !=
-   mdtm_add_mds_hdr(buffer_ack, req)) {
+   if (mdtm_add_mds_hdr(buffer_ack, req)
+   != NCSCC_RC_SUCCESS) {
+   return NCSCC_RC_FAILURE;
+   }
+   /* if sndqueue is capable, then obtain the current
+* sending seq
+*/
+   if (mds_tipc_fctrl_sndqueue_capable(tipc_id,
+   &fctrl_seq_num) == NCSCC_RC_FAILURE){
+   m_MDS_LOG_ERR("FCTRL: Failed to send message"
+   " len :%d", len);
return NCSCC_RC_FAILURE;
}
-

[devel] [PATCH 0/2] Review Request for mds: Avoid message reallocation [#3089] V3

2019-11-28 Thread Minh Chau

Summary: mds: Avoid message reallocation [#3089]
Review request for Ticket(s): 3089
Peer Reviewer(s): Thuan, Vu, Gary
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3089
Base revision: 8e07c19aed63c249f4e7fa8470270d2de1a56046
Personal repository: git://git.code.sf.net/u/minh-chau/review


Impacted area          Impact y/n
----------------------------------
 Docs                      n
 Build system              n
 RPM/packaging             n
 Configuration files       n
 Startup scripts           n
 SAF services              n
 OpenSAF services          n
 Core libraries            y
 Samples                   n
 Tests                     n
 Other                     n

NOTE: Patch(es) contain lines longer than 80 characters

Comments (indicate scope for each "y" above):
-
*** EXPLAIN/COMMENT THE PATCH SERIES HERE ***

revision d3bdf53e99523785cdc932d62b25267ea900c643
Author: Minh Chau 
Date:   Thu, 28 Nov 2019 21:08:50 +1100

mds: Avoid message reallocation [#3089]

The patch avoids reallocating a message if the message is in the
retransmission queue.
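
One way to read this is as a buffer ownership handoff: the send routine
reports whether it kept the caller's buffer for retransmission, and the
caller frees the buffer only when it was not kept. The sketch below
illustrates that idea under stated assumptions and is not the MDS
implementation: retrans_queue, send_buffer(), send_ack_sketch() and
retrans_queue_release() are hypothetical names.

```c
#include <stdint.h>
#include <stdlib.h>

#define QUEUE_MAX 8

/* Hypothetical retransmission queue that stores buffer pointers
 * rather than copies of the message. */
static uint8_t *retrans_queue[QUEUE_MAX];
static size_t retrans_count;

/* Hypothetical send routine. When keep_for_retrans is non-zero and
 * there is room, the buffer pointer itself is stored and *is_queued is
 * set to 1: the queue now owns the buffer and the caller must not free
 * it. Otherwise *is_queued stays 0 and the caller keeps ownership. */
static int send_buffer(uint8_t *buf, size_t len, int keep_for_retrans,
		       uint8_t *is_queued)
{
	(void)len; /* a real implementation would transmit len bytes */
	*is_queued = 0;
	if (keep_for_retrans && retrans_count < QUEUE_MAX) {
		retrans_queue[retrans_count++] = buf;
		*is_queued = 1;
	}
	return 0;
}

/* Caller side: allocate once, send, and free only if ownership was not
 * handed over. Returns 1 if the buffer was queued, 0 if it was freed
 * here, and -1 if the allocation failed. */
int send_ack_sketch(size_t len, int keep_for_retrans)
{
	uint8_t is_queued = 0;
	uint8_t *buf = calloc(1, len);

	if (buf == NULL)
		return -1;
	send_buffer(buf, len, keep_for_retrans, &is_queued);
	if (is_queued == 0)
		free(buf);
	return is_queued;
}

/* Release every buffer the queue still owns, for example once the
 * peer has acknowledged the messages. */
void retrans_queue_release(void)
{
	while (retrans_count > 0)
		free(retrans_queue[--retrans_count]);
}
```

The out-parameter avoids copying the message into a second allocation
when it is queued, at the cost of making the caller check the flag on
every exit path before freeing.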



revision 7be0f5404ebb8ec5b8752813899d6aefd1ef6c33
Author: Minh Chau 
Date:   Thu, 28 Nov 2019 21:08:38 +1100

mds: Improve readibility [#3089]

Correct the indentation and wrap lines to fewer than 80 characters in
mds_mdtm_send_tipc() and mdtm_frag_and_send()



Complete diffstat:
--
 src/mds/mds_dt_tipc.c| 534 +--
 src/mds/mds_tipc_fctrl_intf.cc   |   6 +-
 src/mds/mds_tipc_fctrl_intf.h|   4 +-
 src/mds/mds_tipc_fctrl_msg.cc|   2 +-
 src/mds/mds_tipc_fctrl_portid.cc |   9 +-
 5 files changed, 294 insertions(+), 261 deletions(-)


Testing Commands:
-
*** LIST THE COMMAND LINE TOOLS/STEPS TO TEST YOUR CHANGES ***


Testing, Expected Results:
--
*** PASTE COMMAND OUTPUTS / TEST RESULTS ***


Conditions of Submission:
-
*** HOW MANY DAYS BEFORE PUSHING, CONSENSUS ETC ***


Arch        Built     Started    Linux distro
-----------------------------------------------
mips          n          n
mips64        n          n
x86           n          n
x86_64        n          n
powerpc       n          n
powerpc64     n          n






[devel] [PATCH 2/2] mds: Avoid message reallocation [#3089]

2019-11-28 Thread Minh Chau

The patch avoids reallocating a message if the message is in the
retransmission queue.
---
 src/mds/mds_dt_tipc.c| 68 +++-
 src/mds/mds_tipc_fctrl_intf.cc   |  6 ++--
 src/mds/mds_tipc_fctrl_intf.h|  4 +--
 src/mds/mds_tipc_fctrl_msg.cc|  2 +-
 src/mds/mds_tipc_fctrl_portid.cc |  9 ++
 5 files changed, 50 insertions(+), 39 deletions(-)

diff --git a/src/mds/mds_dt_tipc.c b/src/mds/mds_dt_tipc.c
index 722076f..3d4f468 100644
--- a/src/mds/mds_dt_tipc.c
+++ b/src/mds/mds_dt_tipc.c
@@ -120,7 +120,7 @@ uint32_t mds_mdtm_send_tipc(MDTM_SEND_REQ *req);
 
 /* Tipc actual send, can be made as Macro even*/
 static uint32_t mdtm_sendto(uint8_t *buffer, uint16_t buff_len,
-   struct tipc_portid tipc_id);
+   struct tipc_portid tipc_id, uint8_t *is_queued);
 static uint32_t mdtm_mcast_sendto(void *buffer, size_t size,
  const MDTM_SEND_REQ *req);
 
@@ -2643,7 +2643,8 @@ uint32_t mds_mdtm_send_tipc(MDTM_SEND_REQ *req)
if (req->snd_type == MDS_SENDTYPE_ACK ||
req->snd_type == MDS_SENDTYPE_RACK) {
uint8_t len = mds_and_mdtm_hdr_len;
-   uint8_t buffer_ack[len];
+   uint8_t *buffer_ack = calloc(1, len);
+   uint8_t is_queued = 0;
 
/* Add mds_hdr */
if (mdtm_add_mds_hdr(buffer_ack, req)
@@ -2657,18 +2658,24 @@ uint32_t mds_mdtm_send_tipc(MDTM_SEND_REQ *req)
&fctrl_seq_num) == NCSCC_RC_FAILURE){
m_MDS_LOG_ERR("FCTRL: Failed to send message"
" len :%d", len);
+   free(buffer_ack);
return NCSCC_RC_FAILURE;
}
/* Add frag_hdr */
if (mdtm_add_frag_hdr(buffer_ack, len, frag_seq_num,
0, fctrl_seq_num) != NCSCC_RC_SUCCESS) {
+   free(buffer_ack);
return NCSCC_RC_FAILURE;
}
 
m_MDS_LOG_DBG("MDTM:Sending message with Service"
" Seqno=%d, TO Dest_Tipc_id=<0x%08x:%u> ",
req->svc_seq_num, tipc_id.node, tipc_id.ref);
-   return mdtm_sendto(buffer_ack, len, tipc_id);
+   status = mdtm_sendto(buffer_ack, len, tipc_id,
+   &is_queued);
+   if (is_queued == 0)
+   free(buffer_ack);
+   return status;
}
 
if (req->msg.encoding == MDS_ENC_TYPE_FLAT) {
@@ -2730,6 +2737,7 @@ uint32_t mds_mdtm_send_tipc(MDTM_SEND_REQ *req)
} else {
uint8_t *p8;
uint8_t *body = NULL;
+   uint8_t is_queued = 0;
 
body = calloc(1, len +
mds_and_mdtm_hdr_len);
@@ -2806,8 +2814,8 @@ uint32_t mds_mdtm_send_tipc(MDTM_SEND_REQ *req)
req->dest_svc_id);
return NCSCC_RC_FAILURE;
}
-   if (mdtm_mcast_sendto(body, len, req)
-   != NCSCC_RC_SUCCESS) {
+   status = mdtm_mcast_sendto(body, len, req);
+   if (status != NCSCC_RC_SUCCESS) {
m_MDS_LOG_ERR("MDTM: Failed to"
" send Multicast"
" message Data lenght=%d"
@@ -2819,24 +2827,20 @@ uint32_t mds_mdtm_send_tipc(MDTM_SEND_REQ *req)

get_svc_names(req->dest_svc_id),
req->dest_svc_id,
strerror(errno));
-   m_MMGR_FREE_BUFR_LIST(usrbuf);
-   free(body);
-   return NCSCC_RC_FAILURE;
}
} else {
-   if (mdtm_sendto(body, len, tipc_id)
-   != NCSCC_RC_SUCCESS) {
+   status = mdtm_sendto(body, len,
+   tipc_id, &is_queued);
+   if (status !=

[devel] [PATCH 0/2] Review Request for mds: Avoid message reallocation V2 [#3089]

2019-11-26 Thread Minh Chau

Summary: mds: Avoid message reallocation V2 [#3089]
Review request for Ticket(s): 3089
Peer Reviewer(s): Thuan, Vu, Gary
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3089
Base revision: b61bee5c8accd79e573ef726d40b945afc7c7b3e
Personal repository: git://git.code.sf.net/u/minh-chau/review


Impacted area         Impact y/n
--------------------------------
 Docs                     n
 Build system             n
 RPM/packaging            n
 Configuration files      n
 Startup scripts          n
 SAF services             n
 OpenSAF services         n
 Core libraries           y
 Samples                  n
 Tests                    n
 Other                    n

NOTE: Patch(es) contain lines longer than 80 characters

Comments (indicate scope for each "y" above):
-
*** EXPLAIN/COMMENT THE PATCH SERIES HERE ***

revision 91603a9a4fa2b6376e3545b4eac8e331be1ab128
Author: Minh Chau 
Date:   Tue, 26 Nov 2019 22:29:32 +1100

mds: Avoid message reallocation [#3089]

The patch avoids message reallocation if the message is in the
retransmission queue.
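
The ownership rule this patch introduces (the caller frees the send buffer only when the send path did not keep it for retransmission) can be sketched as below. This is a minimal, hypothetical illustration and not the MDS code: demo_sendto(), demo_send_ack(), demo_queue and demo_queue_flush() are made-up names, and only the is_queued out-parameter mirrors the new mdtm_sendto() signature shown in the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_QUEUE_CAP 8

/* Hypothetical retransmission queue: it owns every buffer stored here. */
static uint8_t *demo_queue[DEMO_QUEUE_CAP];
static size_t demo_queue_len;

/* Hypothetical stand-in for the send routine. With flow control on and
 * room in the queue, the buffer pointer is kept (not copied) and
 * *is_queued is set to 1, so the caller must not free it. */
static int demo_sendto(uint8_t *buf, size_t len, int fctrl_on,
		       uint8_t *is_queued)
{
	(void)len;
	*is_queued = 0;
	if (fctrl_on && demo_queue_len < DEMO_QUEUE_CAP) {
		demo_queue[demo_queue_len++] = buf;
		*is_queued = 1;
	}
	return 0;
}

/* Caller side: heap-allocate, send, free only if nobody kept the buffer.
 * Returns the is_queued flag, or -1 on failure. */
static int demo_send_ack(int fctrl_on)
{
	size_t len = 24;
	uint8_t is_queued = 0;
	uint8_t *buf = calloc(1, len);

	if (buf == NULL)
		return -1;
	int rc = demo_sendto(buf, len, fctrl_on, &is_queued);
	if (is_queued == 0)
		free(buf);
	return rc == 0 ? is_queued : -1;
}

/* The queue releases what it owns; returns the number of buffers freed. */
static size_t demo_queue_flush(void)
{
	size_t n = demo_queue_len;

	for (size_t i = 0; i < n; i++) {
		free(demo_queue[i]);
		demo_queue[i] = NULL;
	}
	demo_queue_len = 0;
	return n;
}
```

The buffer has to live on the heap for this to work, which is presumably why the patch replaces the stack array buffer_ack[len] with calloc(): a queued pointer to a stack array would dangle once the function returns.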



revision 78106fdad4e80f1ee8c8e50e1eb0aad94dc293d5
Author: Minh Chau 
Date:   Tue, 26 Nov 2019 22:29:16 +1100

mds: Improve readibility [#3089]

Correct indentation and shorten code lines (<80 chars) in
mds_mdtm_send_tipc() and mdtm_frag_and_send()



Complete diffstat:
--
 src/mds/mds_dt_tipc.c| 516 +--
 src/mds/mds_tipc_fctrl_intf.cc   |   6 +-
 src/mds/mds_tipc_fctrl_intf.h|   4 +-
 src/mds/mds_tipc_fctrl_msg.cc|   2 +-
 src/mds/mds_tipc_fctrl_portid.cc |   9 +-
 5 files changed, 288 insertions(+), 249 deletions(-)


Testing Commands:
-
*** LIST THE COMMAND LINE TOOLS/STEPS TO TEST YOUR CHANGES ***


Testing, Expected Results:
--
*** PASTE COMMAND OUTPUTS / TEST RESULTS ***


Conditions of Submission:
-
*** HOW MANY DAYS BEFORE PUSHING, CONSENSUS ETC ***


Arch        Built   Started   Linux distro
-------------------------------------------
mips          n        n
mips64        n        n
x86           n        n
x86_64        y        y
powerpc       n        n
powerpc64     n        n


Reviewer Checklist:
---
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
(i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
too much content in a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
comments and change requests that were proposed in the initial review.

___ You have a misconfigured ~/.gitconfig file (i.e. user.name, user.email etc)

___ Your computer has a badly configured date and time, confusing
the threaded patch review.

___ Your changes affect the IPC mechanism, and you don't present any results
for an in-service upgradability test.

___ Your changes affect the user manual and documentation, but your patch series
does not contain the patch that updates the Doxygen manual.




[devel] [PATCH 2/2] mds: Avoid message reallocation [#3089]

2019-11-26 Thread Minh Chau

The patch avoids message reallocation if the message is in the
retransmission queue.
---
 src/mds/mds_dt_tipc.c| 42 +++-
 src/mds/mds_tipc_fctrl_intf.cc   |  6 --
 src/mds/mds_tipc_fctrl_intf.h|  4 ++--
 src/mds/mds_tipc_fctrl_msg.cc|  2 +-
 src/mds/mds_tipc_fctrl_portid.cc |  9 +++--
 5 files changed, 39 insertions(+), 24 deletions(-)

diff --git a/src/mds/mds_dt_tipc.c b/src/mds/mds_dt_tipc.c
index 16cf11b..866c370 100644
--- a/src/mds/mds_dt_tipc.c
+++ b/src/mds/mds_dt_tipc.c
@@ -120,7 +120,7 @@ uint32_t mds_mdtm_send_tipc(MDTM_SEND_REQ *req);
 
 /* Tipc actual send, can be made as Macro even*/
 static uint32_t mdtm_sendto(uint8_t *buffer, uint16_t buff_len,
-   struct tipc_portid tipc_id);
+   struct tipc_portid tipc_id, uint8_t *is_queued);
 static uint32_t mdtm_mcast_sendto(void *buffer, size_t size,
  const MDTM_SEND_REQ *req);
 
@@ -2643,7 +2643,8 @@ uint32_t mds_mdtm_send_tipc(MDTM_SEND_REQ *req)
if (req->snd_type == MDS_SENDTYPE_ACK ||
req->snd_type == MDS_SENDTYPE_RACK) {
uint8_t len = mds_and_mdtm_hdr_len;
-   uint8_t buffer_ack[len];
+   uint8_t *buffer_ack = calloc(1, len);
+   uint8_t is_queued = 0;
 
/* Add mds_hdr */
if (mdtm_add_mds_hdr(buffer_ack, req)
@@ -2657,18 +2658,24 @@ uint32_t mds_mdtm_send_tipc(MDTM_SEND_REQ *req)
&fctrl_seq_num) == NCSCC_RC_FAILURE){
m_MDS_LOG_ERR("FCTRL: Failed to send message"
" len :%d", len);
+   free(buffer_ack);
return NCSCC_RC_FAILURE;
}
/* Add frag_hdr */
if (mdtm_add_frag_hdr(buffer_ack, len, frag_seq_num,
0, fctrl_seq_num) != NCSCC_RC_SUCCESS) {
+   free(buffer_ack);
return NCSCC_RC_FAILURE;
}
 
m_MDS_LOG_DBG("MDTM:Sending message with Service"
" Seqno=%d, TO Dest_Tipc_id=<0x%08x:%u> ",
req->svc_seq_num, tipc_id.node, tipc_id.ref);
-   return mdtm_sendto(buffer_ack, len, tipc_id);
+   status = mdtm_sendto(buffer_ack, len, tipc_id,
+   &is_queued);
+   if (is_queued == 0)
+   free(buffer_ack);
+   return status;
}
 
if (req->msg.encoding == MDS_ENC_TYPE_FLAT) {
@@ -2730,6 +2737,7 @@ uint32_t mds_mdtm_send_tipc(MDTM_SEND_REQ *req)
} else {
uint8_t *p8;
uint8_t *body = NULL;
+   uint8_t is_queued = 0;
 
body = calloc(1, len +
mds_and_mdtm_hdr_len);
@@ -2824,7 +2832,7 @@ uint32_t mds_mdtm_send_tipc(MDTM_SEND_REQ *req)
return NCSCC_RC_FAILURE;
}
} else {
-   if (mdtm_sendto(body, len, tipc_id)
+   if (mdtm_sendto(body, len, tipc_id, &is_queued)
!= NCSCC_RC_SUCCESS) {
m_MDS_LOG_ERR("MDTM: Unable to"
" send the msg thru"
@@ -2835,7 +2843,8 @@ uint32_t mds_mdtm_send_tipc(MDTM_SEND_REQ *req)
}
}
m_MMGR_FREE_BUFR_LIST(usrbuf);
-   free(body);
+   if (is_queued == 0)
+   free(body);
return NCSCC_RC_SUCCESS;
}
} break;
@@ -2864,6 +2873,7 @@ uint32_t mds_mdtm_send_tipc(MDTM_SEND_REQ *req)
 
body = calloc(1, (req->msg.data.buff_info.len
+ mds_and_mdtm_hdr_len));
+   uint8_t is_queued = 0;
 
if (mdtm_add_mds_hdr(body, req) != NCSCC_RC_SUCCESS) {
m_MDS_LOG_ERR("MDTM: Unable to add the mds Hdr"
@@ -2907,7 +2917,7 @@ uint32_t mds_mdtm_send_tipc(MDTM_SEND_REQ *req)
 
if (mdtm_sendto(body,
(req->msg.data.buff_info.len +
-mds_and_mdtm_hdr_len), tipc_id)
+

[devel] [PATCH 1/2] mds: Improve readibility [#3089]

2019-11-26 Thread Minh Chau

Correct indentation and shorten code lines (<80 chars) in
mds_mdtm_send_tipc() and mdtm_frag_and_send()
---
 src/mds/mds_dt_tipc.c | 484 ++
 1 file changed, 254 insertions(+), 230 deletions(-)

diff --git a/src/mds/mds_dt_tipc.c b/src/mds/mds_dt_tipc.c
index fdf0da7..16cf11b 100644
--- a/src/mds/mds_dt_tipc.c
+++ b/src/mds/mds_dt_tipc.c
@@ -2561,16 +2561,16 @@ uint32_t mds_mdtm_send_tipc(MDTM_SEND_REQ *req)
   send message
 */
uint32_t status = 0;
-   uint32_t sum_mds_hdr_plus_mdtm_hdr_plus_len;
+   uint32_t mds_and_mdtm_hdr_len;
uint16_t fctrl_seq_num = 0;
int version = req->msg_arch_word & 0x7;
if (version > 1) {
-   sum_mds_hdr_plus_mdtm_hdr_plus_len =
+   mds_and_mdtm_hdr_len =
(SUM_MDS_HDR_PLUS_MDTM_HDR_PLUS_LEN +
 gl_mds_mcm_cb->node_name_len);
} else {
/* sending message to Old version Node  */
-   sum_mds_hdr_plus_mdtm_hdr_plus_len =
+   mds_and_mdtm_hdr_len =
(SUM_MDS_HDR_PLUS_MDTM_HDR_PLUS_LEN - 1);
}
 
@@ -2598,13 +2598,13 @@ uint32_t mds_mdtm_send_tipc(MDTM_SEND_REQ *req)
/* This is exclusively for the Bcast ENC and ENC_FLAT case */
if (recv.msg.encoding == MDS_ENC_TYPE_FULL) {
ncs_dec_init_space(_uba,
-  recv.msg.data.fullenc_uba.start);
+   recv.msg.data.fullenc_uba.start);
recv.msg_arch_word = req->msg_arch_word;
} else if (recv.msg.encoding == MDS_ENC_TYPE_FLAT) {
/* This case will not arise, but just to be on safe side
 */
ncs_dec_init_space(_uba,
-  recv.msg.data.flat_uba.start);
+   recv.msg.data.flat_uba.start);
} else {
/* Do nothing for the DIrect buff and Copy case */
}
@@ -2620,19 +2620,18 @@ uint32_t mds_mdtm_send_tipc(MDTM_SEND_REQ *req)
uint32_t frag_seq_num = 0, node_status = 0;
 
node_status = m_MDS_CHECK_NCS_NODE_ID_RANGE(
-   m_MDS_GET_NODE_ID_FROM_ADEST(req->adest));
+   m_MDS_GET_NODE_ID_FROM_ADEST(req->adest));
 
if (NCSCC_RC_SUCCESS == node_status) {
tipc_id.node = m_MDS_GET_TIPC_NODE_ID_FROM_NCS_NODE_ID(
-   m_MDS_GET_NODE_ID_FROM_ADEST(req->adest));
+   m_MDS_GET_NODE_ID_FROM_ADEST(req->adest));
tipc_id.ref = (uint32_t)(req->adest);
} else {
-   if (req->snd_type !=
-   MDS_SENDTYPE_ACK) { /* This check is becoz in ack
-  cases we are only sending the
-  hdr and no data part is being
-  send, so no message free ,
-  fix me */
+   if (req->snd_type != MDS_SENDTYPE_ACK) {
+   /* This check is becoz in ack cases we are only
+* sending the hdr and no data part is being
+*  send, so no message free. fix me
+*/
mdtm_free_reassem_msg_mem(&req->msg);
}
return NCSCC_RC_FAILURE;
@@ -2643,43 +2642,45 @@ uint32_t mds_mdtm_send_tipc(MDTM_SEND_REQ *req)
/* Only for the ack and not for any other message */
if (req->snd_type == MDS_SENDTYPE_ACK ||
req->snd_type == MDS_SENDTYPE_RACK) {
-   uint8_t len = sum_mds_hdr_plus_mdtm_hdr_plus_len;
+   uint8_t len = mds_and_mdtm_hdr_len;
uint8_t buffer_ack[len];
 
/* Add mds_hdr */
-   if (NCSCC_RC_SUCCESS !=
-   mdtm_add_mds_hdr(buffer_ack, req)) {
+   if (mdtm_add_mds_hdr(buffer_ack, req)
+   != NCSCC_RC_SUCCESS) {
+   return NCSCC_RC_FAILURE;
+   }
+   /* if sndqueue is capable, then obtain the current
+* sending seq
+*/
+   if (mds_tipc_fctrl_sndqueue_capable(tipc_id,
+   &fctrl_seq_num) == NCSCC_RC_FAILURE){
+   m_MDS_LOG_ERR("FCTRL: Failed to send message"
+   " len :%d", len);
return NCSCC_RC_FAILURE;
}
-

[devel] [PATCH 0/1] Review Request for mds: Avoid message re-allocation [#3089]

2019-11-24 Thread Minh Chau

Summary: mds: Avoid message re-allocation [#3089]
Review request for Ticket(s): 3089
Peer Reviewer(s): Thuan, Gary, Vu
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3089
Base revision: c6c7e77292d622ee042476bb0815feae51dd0cba
Personal repository: git://git.code.sf.net/u/minh-chau/review


Impacted area         Impact y/n
--------------------------------
 Docs                     n
 Build system             n
 RPM/packaging            n
 Configuration files      n
 Startup scripts          n
 SAF services             n
 OpenSAF services         n
 Core libraries           y
 Samples                  n
 Tests                    n
 Other                    n


Comments (indicate scope for each "y" above):
-
*** EXPLAIN/COMMENT THE PATCH SERIES HERE ***

revision 18a3387b6331ec37ea47be47d1013afaee91d649
Author: Minh Chau 
Date:   Mon, 25 Nov 2019 17:06:20 +1100

mds: Avoid message re-allocation [#3089]

The patch avoids message reallocation when MDS_TIPC_FCTRL_ENABLED
is enabled.



Complete diffstat:
--
 src/mds/mds_dt_tipc.c| 27 ---
 src/mds/mds_tipc_fctrl_msg.cc|  2 +-
 src/mds/mds_tipc_fctrl_portid.cc |  9 +++--
 3 files changed, 24 insertions(+), 14 deletions(-)


Testing Commands:
-
*** LIST THE COMMAND LINE TOOLS/STEPS TO TEST YOUR CHANGES ***


Testing, Expected Results:
--
*** PASTE COMMAND OUTPUTS / TEST RESULTS ***


Conditions of Submission:
-
*** HOW MANY DAYS BEFORE PUSHING, CONSENSUS ETC ***


Arch        Built   Started   Linux distro
-------------------------------------------
mips          n        n
mips64        n        n
x86           n        n
x86_64        y        y
powerpc       n        n
powerpc64     n        n



[devel] [PATCH 1/1] mds: Avoid message re-allocation [#3089]

2019-11-24 Thread Minh Chau

The patch avoids message reallocation when MDS_TIPC_FCTRL_ENABLED
is enabled.
---
 src/mds/mds_dt_tipc.c| 27 ---
 src/mds/mds_tipc_fctrl_msg.cc|  2 +-
 src/mds/mds_tipc_fctrl_portid.cc |  9 +++--
 3 files changed, 24 insertions(+), 14 deletions(-)

diff --git a/src/mds/mds_dt_tipc.c b/src/mds/mds_dt_tipc.c
index fdf0da7..aa8d5c2 100644
--- a/src/mds/mds_dt_tipc.c
+++ b/src/mds/mds_dt_tipc.c
@@ -2644,7 +2644,7 @@ uint32_t mds_mdtm_send_tipc(MDTM_SEND_REQ *req)
if (req->snd_type == MDS_SENDTYPE_ACK ||
req->snd_type == MDS_SENDTYPE_RACK) {
uint8_t len = sum_mds_hdr_plus_mdtm_hdr_plus_len;
-   uint8_t buffer_ack[len];
+   uint8_t* buffer_ack = calloc(1, len);
 
/* Add mds_hdr */
if (NCSCC_RC_SUCCESS !=
@@ -2667,7 +2667,11 @@ uint32_t mds_mdtm_send_tipc(MDTM_SEND_REQ *req)
m_MDS_LOG_DBG(
"MDTM:Sending message with Service Seqno=%d, TO Dest_Tipc_id=<0x%08x:%u> ",
req->svc_seq_num, tipc_id.node, tipc_id.ref);
-   return mdtm_sendto(buffer_ack, len, tipc_id);
+   status = mdtm_sendto(buffer_ack, len, tipc_id);
+   if (gl_mds_pro_ver != MDS_PROT_FCTRL) {
+   free(buffer_ack);
+   }
+   return status;
}
 
if (MDS_ENC_TYPE_FLAT == req->msg.encoding) {
@@ -2815,6 +2819,8 @@ uint32_t mds_mdtm_send_tipc(MDTM_SEND_REQ *req)
free(body);
return NCSCC_RC_FAILURE;
}
+   m_MMGR_FREE_BUFR_LIST(usrbuf);
+   free(body);
} else {
if (NCSCC_RC_SUCCESS !=
mdtm_sendto(body, len, tipc_id)) {
@@ -2824,9 +2830,12 @@ uint32_t mds_mdtm_send_tipc(MDTM_SEND_REQ *req)
free(body);
return NCSCC_RC_FAILURE;
}
+   if (gl_mds_pro_ver != MDS_PROT_FCTRL) {
+   m_MMGR_FREE_BUFR_LIST(usrbuf);
+   free(body);
+   }
}
-   m_MMGR_FREE_BUFR_LIST(usrbuf);
-   free(body);
+
return NCSCC_RC_SUCCESS;
}
} break;
@@ -2909,7 +2918,9 @@ uint32_t mds_mdtm_send_tipc(MDTM_SEND_REQ *req)
mds_free_direct_buff(
req->msg.data.buff_info.buff);
}
-   free(body);
+   if (gl_mds_pro_ver != MDS_PROT_FCTRL) {
+   free(body);
+   }
return NCSCC_RC_SUCCESS;
} break;
 
@@ -3059,21 +3070,23 @@ uint32_t mdtm_frag_and_send(MDTM_SEND_REQ *req, uint32_t seq_num,
get_svc_names(req->src_svc_id), req->src_svc_id,
get_svc_names(req->dest_svc_id), req->dest_svc_id);
ret = mdtm_mcast_sendto(body, len_buf, req);
+   free(body);
} else {
m_MDS_LOG_DBG(
"MDTM:Sending message with Service Seqno=%d, Fragment Seqnum=%d, frag_num=%d, TO Dest_Tipc_id=<0x%08x:%u>",
req->svc_seq_num, seq_num, frag_val,
id.node, id.ref);
ret = mdtm_sendto(body, len_buf, id);
+   if (gl_mds_pro_ver != MDS_PROT_FCTRL) {
+   free(body);
+   }
}
if (ret != NCSCC_RC_SUCCESS) {
// Failed to send a fragmented msg, stop sending
m_MMGR_FREE_BUFR_LIST(usrbuf);
-   free(body);
break;
}
m_MMGR_REMOVE_FROM_START(&usrbuf, len_buf - hdr_plus);
-   free(body);
len = len - (len_buf - hdr_plus);
if (len == 0)
break;
diff --git a/src/mds/mds_tipc_fctrl_msg.cc b/src/mds/mds_tipc_fctrl_msg.cc
index

[devel] [PATCH 0/1] Review Request for Reduce mds logging [#3120]

2019-11-24 Thread Minh Chau

Summary: mds: Reduce mds logging [#3120]
Review request for Ticket(s): 3120
Peer Reviewer(s): Thuan, Vu, Gary
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3120
Base revision: c6c7e77292d622ee042476bb0815feae51dd0cba
Personal repository: git://git.code.sf.net/u/minh-chau/review


Impacted area         Impact y/n
--------------------------------
 Docs                     n
 Build system             n
 RPM/packaging            n
 Configuration files      n
 Startup scripts          n
 SAF services             n
 OpenSAF services         n
 Core libraries           y
 Samples                  n
 Tests                    n
 Other                    n


Comments (indicate scope for each "y" above):
-
*** EXPLAIN/COMMENT THE PATCH SERIES HERE ***

revision f39e30b489d6101d94d6b04bd4a4c0f5f1d5cbe6
Author: Minh Chau 
Date:   Mon, 25 Nov 2019 11:32:33 +1100

mds: Reduce mds logging [#3120]

Broadcast/multicast messages are currently logged at NOTIFY level
because mds does not support broadcast/multicast messages, so the
logging can be helpful in some cases. However, mds.log may be located
on an NFS file system, where this logging can generate a high rate of
traffic towards NFS.

This patch moves the logging to DEBUG for broadcast/multicast
messages and for the addition/removal of mds services.
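
The effect of moving a message from NOTIFY to DEBUG can be sketched with a level-gated logger. This is a hypothetical illustration only: demo_log(), demo_threshold and the numeric level values are made up and are not the m_MDS_LOG_* macros or the real MDS log levels.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical levels: a larger value means more verbose. */
enum demo_level { DEMO_ERR = 1, DEMO_NOTIFY = 3, DEMO_DBG = 5 };

/* Assumed default: NOTIFY and more severe messages reach the log. */
static enum demo_level demo_threshold = DEMO_NOTIFY;
static unsigned demo_lines_written;

/* Writes the message only if its level passes the threshold.
 * Returns 1 if a line was written, 0 if it was filtered out. */
static int demo_log(enum demo_level lvl, const char *fmt, ...)
{
	va_list ap;

	if (lvl > demo_threshold)
		return 0;
	va_start(ap, fmt);
	vfprintf(stderr, fmt, ap);
	va_end(ap);
	fputc('\n', stderr);
	demo_lines_written++;
	return 1;
}
```

With the assumed default threshold, a call such as demo_log(DEMO_DBG, "Ignore bcast/mcast") writes nothing, so a frequently hit code path stops producing file system traffic until someone raises the threshold to DEBUG.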



Complete diffstat:
--
 src/mds/mds_tipc_fctrl_intf.cc   | 4 ++--
 src/mds/mds_tipc_fctrl_portid.cc | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)


Testing Commands:
-
*** LIST THE COMMAND LINE TOOLS/STEPS TO TEST YOUR CHANGES ***


Testing, Expected Results:
--
*** PASTE COMMAND OUTPUTS / TEST RESULTS ***


Conditions of Submission:
-
*** HOW MANY DAYS BEFORE PUSHING, CONSENSUS ETC ***


Arch        Built   Started   Linux distro
-------------------------------------------
mips          n        n
mips64        n        n
x86           n        n
x86_64        y        y
powerpc       n        n
powerpc64     n        n



[devel] [PATCH 1/1] mds: Reduce mds logging [#3120]

2019-11-24 Thread Minh Chau

Broadcast/multicast messages are currently logged at NOTIFY level
because mds does not support broadcast/multicast messages, so the
logging can be helpful in some cases. However, mds.log may be located
on an NFS file system, where this logging can generate a high rate of
traffic towards NFS.

This patch moves the logging to DEBUG for broadcast/multicast
messages and for the addition/removal of mds services.
---
 src/mds/mds_tipc_fctrl_intf.cc   | 4 ++--
 src/mds/mds_tipc_fctrl_portid.cc | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/mds/mds_tipc_fctrl_intf.cc b/src/mds/mds_tipc_fctrl_intf.cc
index dd8d80d..0e3230a 100644
--- a/src/mds/mds_tipc_fctrl_intf.cc
+++ b/src/mds/mds_tipc_fctrl_intf.cc
@@ -390,7 +390,7 @@ uint32_t mds_tipc_fctrl_portid_up(struct tipc_portid id, uint32_t type) {
 id.node, id.ref, svc_id, portid->svc_cnt_);
   } else {
 portid->svc_cnt_++;
-m_MDS_LOG_NOTIFY("FCTRL: Add svc[node:%x, ref:%u svc_id:%u], svc_cnt:%u",
+m_MDS_LOG_DBG("FCTRL: Add svc[node:%x, ref:%u svc_id:%u], svc_cnt:%u",
 id.node, id.ref, svc_id, portid->svc_cnt_);
   }
 
@@ -410,7 +410,7 @@ uint32_t mds_tipc_fctrl_portid_down(struct tipc_portid id, uint32_t type) {
   TipcPortId *portid = portid_lookup(id);
   if (portid != nullptr) {
 portid->svc_cnt_--;
-m_MDS_LOG_NOTIFY("FCTRL: Remove svc[node:%x, ref:%u svc_id:%u], svc_cnt:%u",
+m_MDS_LOG_DBG("FCTRL: Remove svc[node:%x, ref:%u svc_id:%u], svc_cnt:%u",
 id.node, id.ref, svc_id, portid->svc_cnt_);
   }
   portid_map_mutex.unlock();
diff --git a/src/mds/mds_tipc_fctrl_portid.cc b/src/mds/mds_tipc_fctrl_portid.cc
index 724eb7b..316e1ba 100644
--- a/src/mds/mds_tipc_fctrl_portid.cc
+++ b/src/mds/mds_tipc_fctrl_portid.cc
@@ -298,7 +298,7 @@ uint32_t TipcPortId::ReceiveData(uint32_t mseq, uint16_t mfrag,
   }
 }
 if (rcving_mbcast_ == true) {
-  m_MDS_LOG_NOTIFY("FCTRL: [me] <-- [node:%x, ref:%u], "
+  m_MDS_LOG_DBG("FCTRL: [me] <-- [node:%x, ref:%u], "
   "RcvData[mseq:%u, mfrag:%u, fseq:%u], "
   "rcvwnd[acked:%u, rcv:%u, nacked:%" PRIu64 "], "
   "Ignore bcast/mcast ",
-- 
2.7.4




[devel] [PATCH 0/3] Review Request for mds: Fix backward compatibility of mds fragmentation message [#3111]

2019-11-08 Thread Minh Chau

Summary: mds: Distinguish protocol version of fragment [#3111]
Review request for Ticket(s): 3111
Peer Reviewer(s): Gary, Vu, Thuan
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3111
Base revision: ddb9d7065376df7757716013779755864d53ebe5
Personal repository: git://git.code.sf.net/u/minh-chau/review


Impacted area         Impact y/n
--------------------------------
 Docs                     n
 Build system             n
 RPM/packaging            n
 Configuration files      n
 Startup scripts          n
 SAF services             n
 OpenSAF services         n
 Core libraries           y
 Samples                  n
 Tests                    n
 Other                    n


Comments (indicate scope for each "y" above):
-
*** EXPLAIN/COMMENT THE PATCH SERIES HERE ***

revision 2cb2d135827d920155323a70a9587264e5c62ae2
Author: Minh Chau 
Date:   Fri, 8 Nov 2019 21:17:22 +1100

mds: Add backward compatibility mdstest for fragment [#3111]



revision 153b657d2873019160f31a3091fa660e4e469a9e
Author: Minh Chau 
Date:   Fri, 8 Nov 2019 21:08:18 +1100

mds: Refactor logging [#3111]

Since TipcPortId::ChangeState() was added, the patch refactors
the logging to shorten the code.



revision 1ce0c74ca96fa028d02abe72932171e98c034342
Author: Minh Chau 
Date:   Fri, 8 Nov 2019 20:51:54 +1100

mds: Distinguish protocol version of fragment [#3111]

Legacy mds encodes the protocol version only in a non-fragmented
message or in the first fragment. Hence, mds cannot determine the
protocol version from any fragment that follows the first one.

The patch keeps the encoding of lengthcheck the same as in the legacy
mds version. Subsequent fragments also need to consult the stateful
portid to determine the protocol version, so that a fragment is
skipped if it was sent from legacy mds, or has its sequence inspected
if it was sent from new mds.
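
The decision described above can be sketched as a small piece of per-sender state. This is a hypothetical illustration, not the code in mds_tipc_fctrl_portid.cc: struct demo_portid, demo_on_fragment() and the enum names are made up, and the real patch keeps this state in TipcPortId.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical protocol tags and receive-side actions. */
enum demo_proto { DEMO_PROTO_UNKNOWN, DEMO_PROTO_LEGACY, DEMO_PROTO_FCTRL };
enum demo_action { DEMO_SKIP, DEMO_CHECK_SEQ };

/* Hypothetical per-sender state, keyed by the TIPC port id. */
struct demo_portid {
	uint32_t node;
	uint32_t ref;
	enum demo_proto proto; /* remembered from the last versioned message */
};

/* An unfragmented message or a first fragment carries the version, so it
 * updates the remembered state. Later fragments carry no version and rely
 * on what was remembered for this sender. */
static enum demo_action demo_on_fragment(struct demo_portid *p,
					 bool carries_version,
					 enum demo_proto version_in_msg)
{
	if (carries_version)
		p->proto = version_in_msg;
	return p->proto == DEMO_PROTO_FCTRL ? DEMO_CHECK_SEQ : DEMO_SKIP;
}
```

In this sketch a sender whose version is still unknown is treated like a legacy sender, which is an assumption made for the example rather than something the patch description states.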



Complete diffstat:
--
 src/mds/apitest/mdstipc_api.c|  83 +++--
 src/mds/mds_dt.h |   6 ++
 src/mds/mds_dt_tipc.c|  11 ++-
 src/mds/mds_tipc_fctrl_intf.cc   | 154 ++-
 src/mds/mds_tipc_fctrl_msg.cc|  86 +++---
 src/mds/mds_tipc_fctrl_msg.h |   5 ++
 src/mds/mds_tipc_fctrl_portid.cc |  94 +++-
 src/mds/mds_tipc_fctrl_portid.h  |   1 +
 8 files changed, 292 insertions(+), 148 deletions(-)


Testing Commands:
-
mdstest


Testing, Expected Results:
--
all tests pass

Conditions of Submission:
-
*** HOW MANY DAYS BEFORE PUSHING, CONSENSUS ETC ***


Arch        Built   Started   Linux distro
-------------------------------------------
mips          n        n
mips64        n        n
x86           n        n
x86_64        y        y
powerpc       n        n
powerpc64     n        n



[devel] [PATCH 3/3] mds: Add backward compatibility mdstest for fragment [#3111]

2019-11-08 Thread Minh Chau

---
 src/mds/apitest/mdstipc_api.c | 83 ---
 1 file changed, 78 insertions(+), 5 deletions(-)

diff --git a/src/mds/apitest/mdstipc_api.c b/src/mds/apitest/mdstipc_api.c
index 5c0e28a..651365e 100644
--- a/src/mds/apitest/mdstipc_api.c
+++ b/src/mds/apitest/mdstipc_api.c
@@ -13512,8 +13512,8 @@ void tet_mds_fctrl_compatibility_tp1(void)
uint32_t msg_num = 1000;
uint32_t msg_size = 500;
 
-   printf("\nTest Case 5: Sender enable MDS FCTRL but Receiver disable\n");
-   /**/
+   printf("\nTest Case 5: Sender enable MDS FCTRL, Receiver disable\n");
+   /*-*/
pid_t pid = fork();
if (pid == 0) {
/* child as sender */
@@ -13545,8 +13545,8 @@ void tet_mds_fctrl_compatibility_tp2(void)
uint32_t msg_num = 1000;
uint32_t msg_size = 500;
 
-   printf("\nTest Case 5: Sender diable MDS FCTRL but Receiver enable\n");
-   /**/
+   printf("\nTest Case 6: Sender disable MDS FCTRL, Receiver enable\n");
+   /*-*/
pid_t pid = fork();
if (pid == 0) {
/* child as sender */
@@ -13644,6 +13644,73 @@ void tet_mds_fctrl_with_sna_tp2(void)
test_validate(FAIL, 0);
 }
 
+
+void tet_mds_fctrl_compatibility_tp3(void)
+{
+   int FAIL = 1;
+   uint32_t msg_num = 5;
+   uint32_t msg_size = 13;
+
+   printf("\nTest Case 9: Sender enable MDS FCTRL, Receiver disable\n");
+   /*-*/
+   pid_t pid = fork();
+   if (pid == 0) {
+   /* child as sender */
+   setenv("MDS_TIPC_FCTRL_ENABLED", "1", 1);
+   mds_startup();
+   MDS_SVC_ID to_svcids[] = {NCSMDS_SVC_ID_EXTERNAL_MIN};
+   MDS_SVC_ID svc_id = NCSMDS_SVC_ID_INTERNAL_MIN;
+   tet_sender(svc_id, msg_num, msg_size, 1, to_svcids);
+   mds_shutdown();
+   } else if (pid > 0) {
+   /* parent as receiver */
+   mds_startup();
+   MDS_SVC_ID fr_svcids[] = {NCSMDS_SVC_ID_INTERNAL_MIN};
+   MDS_SVC_ID svc_id = NCSMDS_SVC_ID_EXTERNAL_MIN;
+   FAIL = tet_receiver(svc_id, msg_num, msg_size, 1, fr_svcids);
+   printf("\nReceiver finish, kill Sender\n");
+   kill(pid, SIGKILL);
+   mds_shutdown();
+   } else {
+   printf("\nFAIL to fork()\n");
+   }
+
+   test_validate(FAIL, 0);
+}
+
+void tet_mds_fctrl_compatibility_tp4(void)
+{
+   int FAIL = 1;
+   uint32_t msg_num = 10;
+   uint32_t msg_size = 13;
+
+   printf("\nTest Case 10: Sender disable MDS FCTRL, Receiver enable\n");
+   /*--*/
+   pid_t pid = fork();
+   if (pid == 0) {
+   /* child as sender */
+   mds_startup();
+   MDS_SVC_ID to_svcids[] = {NCSMDS_SVC_ID_EXTERNAL_MIN};
+   MDS_SVC_ID svc_id = NCSMDS_SVC_ID_INTERNAL_MIN;
+   tet_sender(svc_id, msg_num, msg_size, 1, to_svcids);
+   mds_shutdown();
+   } else if (pid > 0) {
+   /* parent as receiver */
+   setenv("MDS_TIPC_FCTRL_ENABLED", "1", 1);
+   mds_startup();
+   MDS_SVC_ID fr_svcids[] = {NCSMDS_SVC_ID_INTERNAL_MIN};
+   MDS_SVC_ID svc_id = NCSMDS_SVC_ID_EXTERNAL_MIN;
+   FAIL = tet_receiver(svc_id, msg_num, msg_size, 1, fr_svcids);
+   printf("\nReceiver finish, kill Sender\n");
+   kill(pid, SIGKILL);
+   mds_shutdown();
+   } else {
+   printf("\nFAIL to fork()\n");
+   }
+   test_validate(FAIL, 0);
+}
+
+
 void Print_return_status(uint32_t rs)
 {
switch (rs) {
@@ -14384,7 +14451,7 @@ __attribute__((constructor)) static void mdsTipcAPI_constructor(void)
"Sender enable MDS FCTRL but Receiver disable");
test_case_add(
27, tet_mds_fctrl_compatibility_tp2,
-   "Sender diable MDS FCTRL but Receiver enable");
+   "Sender disable MDS FCTRL but Receiver enable");
test_case_add(
27, tet_mds_fctrl_with_sna_tp1,
"Sender gradually sends more than 65535"
@@ -14395,4 +14462,10 @@ __attribute__((constructor)) static void mdsTipcAPI_constructor(void)
"Sender gradually sends more than 65535"
" big messages (OVERLOAD happens)"
" and receiver should receive them all");
+   test_case_add(
+   27, tet_mds_fctrl_compatibility_tp3,
+   "Sender enable MDS FCTRL but Receiver disable");
+

[devel] [PATCH 1/3] mds: Distinguish protocol version of fragment [#3111]

2019-11-08 Thread Minh Chau

Legacy mds encodes the protocol version only in an unfragmented
message or in the first fragment of a fragmented message. Hence, mds
cannot determine the protocol version from a subsequent fragment
alone.

The patch keeps the encoding of the length check the same as in legacy
mds. In addition, subsequent fragments now consult the stateful portid
to determine the protocol version, so that a fragment is skipped if it
was sent by legacy mds, or has its sequence number inspected if it was
sent by new mds.
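
For illustration only, here is a minimal C sketch (not the code in this
patch) of remembering the protocol per sender so that later fragments can
be classified. The names sk_classify and sk_sender and the table size are
hypothetical. The 0xA0 legacy prefix and the 0x08 version are assumed
values; the 0xFC mask and the 0xB0 flow-control prefix come from the
mds_dt.h hunk of this patch. As a simplification, anything that is not
flow control is treated as legacy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SK_VERSION 0x08                    /* assumed MDS_VERSION value */
#define SK_PROT_VER_MASK 0xFC              /* as in mds_dt.h */
#define SK_PROT_UNDEFINED 0x00
#define SK_PROT_LEGACY (0xA0 | SK_VERSION) /* assumed MDS_PROT value */
#define SK_PROT_FCTRL (0xB0 | SK_VERSION)

#define SK_MAX_SENDERS 16 /* hypothetical fixed-size table */

struct sk_sender {
	bool used;
	uint32_t node;
	uint32_t ref;
	uint8_t prot; /* protocol seen on the sender's last first fragment */
};

static struct sk_sender sk_table[SK_MAX_SENDERS];

/* Find the entry for (node, ref); optionally create it if missing. */
static struct sk_sender *sk_lookup(uint32_t node, uint32_t ref, bool create)
{
	struct sk_sender *free_slot = NULL;
	for (size_t i = 0; i < SK_MAX_SENDERS; i++) {
		if (sk_table[i].used && sk_table[i].node == node &&
		    sk_table[i].ref == ref)
			return &sk_table[i];
		if (!sk_table[i].used && free_slot == NULL)
			free_slot = &sk_table[i];
	}
	if (!create || free_slot == NULL)
		return NULL;
	free_slot->used = true;
	free_slot->node = node;
	free_slot->ref = ref;
	free_slot->prot = SK_PROT_UNDEFINED;
	return free_slot;
}

/* Return the protocol that applies to one received fragment.
 * is_first: the message is unfragmented or is the first fragment, so
 * pro_ver_byte was read from the wire and is meaningful. For later
 * fragments pro_ver_byte is ignored and the stored state is used. */
uint8_t sk_classify(uint32_t node, uint32_t ref, bool is_first,
		    uint8_t pro_ver_byte)
{
	if (is_first) {
		struct sk_sender *s = sk_lookup(node, ref, true);
		uint8_t prot =
		    ((pro_ver_byte & SK_PROT_VER_MASK) == SK_PROT_FCTRL)
			? SK_PROT_FCTRL
			: SK_PROT_LEGACY;
		if (s != NULL)
			s->prot = prot;
		return prot;
	}
	struct sk_sender *s = sk_lookup(node, ref, false);
	return s != NULL ? s->prot : SK_PROT_UNDEFINED;
}
```

A receiver built this way would skip sequence checks for a later fragment
when sk_classify() returns SK_PROT_LEGACY, and inspect the sequence number
when it returns SK_PROT_FCTRL.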
---
 src/mds/mds_dt.h |   6 ++
 src/mds/mds_dt_tipc.c|  11 ++-
 src/mds/mds_tipc_fctrl_intf.cc   | 154 ++-
 src/mds/mds_tipc_fctrl_msg.cc|  86 +++---
 src/mds/mds_tipc_fctrl_msg.h |   5 ++
 src/mds/mds_tipc_fctrl_portid.cc |  23 ++
 src/mds/mds_tipc_fctrl_portid.h  |   1 +
 7 files changed, 193 insertions(+), 93 deletions(-)

diff --git a/src/mds/mds_dt.h b/src/mds/mds_dt.h
index 64da600..007ff98 100644
--- a/src/mds/mds_dt.h
+++ b/src/mds/mds_dt.h
@@ -243,6 +243,12 @@ bool mdtm_mailbox_mbx_cleanup(NCSCONTEXT arg, NCSCONTEXT msg);
 #define MDS_PROT_VER_MASK 0xFC
 #define MDTM_PRI_MASK 0x3
 
+/* Unknown or undefined MDS protocol/version */
+#define MDS_PROT_UNDEFINED 0x00
+
+/* MDS protocol/version for non flow control (legacy) */
+#define MDS_PROT_LEGACY (MDS_PROT | MDS_VERSION)
+
 /* MDS protocol/version for flow control */
 #define MDS_PROT_FCTRL (0xB0 | MDS_VERSION)
 #define MDS_PROT_FCTRL_ID 0xFDAC13F5
diff --git a/src/mds/mds_dt_tipc.c b/src/mds/mds_dt_tipc.c
index e085de7..fdf0da7 100644
--- a/src/mds/mds_dt_tipc.c
+++ b/src/mds/mds_dt_tipc.c
@@ -166,7 +166,7 @@ NCS_PATRICIA_TREE mdtm_reassembly_list;
 uint32_t mdtm_global_frag_num;
 
 const unsigned int MAX_RECV_THRESHOLD = 30;
-static uint8_t gl_mds_pro_ver = MDS_PROT | MDS_VERSION;
+static uint8_t gl_mds_pro_ver = MDS_PROT_LEGACY;
 static int gl_mds_fctrl_acksize = -1;
 static int gl_mds_fctrl_ackto = -1;
 
@@ -381,7 +381,7 @@ uint32_t mdtm_tipc_init(NODE_ID nodeid, uint32_t *mds_tipc_ref)
"MDTM:TIPC Failed to unset MDS_TIPC_FCTRL_ACKSIZE");
}
} else {
-   gl_mds_pro_ver = MDS_PROT | MDS_VERSION;
+   gl_mds_pro_ver = MDS_PROT_LEGACY;
syslog(LOG_ERR, "MDTM:TIPC Invalid value of"
"MDS_TIPC_FCTRL_ENABLED");
}
@@ -3125,7 +3125,12 @@ uint32_t mdtm_add_frag_hdr(uint8_t *buf_ptr, uint16_t len, uint32_t seq_num,
 * hereafter, these 2 bytes will be used as sequence number in flow control
 * (per tipc portid)
 * */
-   ncs_encode_16bit(, fctrl_seq_num);
+   if (gl_mds_pro_ver == MDS_PROT_FCTRL) {
+   ncs_encode_16bit(, fctrl_seq_num);
+   } else {
+   ncs_encode_16bit(, len - MDTM_FRAG_HDR_LEN - 2);
+   }
+
 #endif
return NCSCC_RC_SUCCESS;
 }
diff --git a/src/mds/mds_tipc_fctrl_intf.cc b/src/mds/mds_tipc_fctrl_intf.cc
index c9073b2..3d92290 100644
--- a/src/mds/mds_tipc_fctrl_intf.cc
+++ b/src/mds/mds_tipc_fctrl_intf.cc
@@ -132,8 +132,16 @@ uint32_t process_flow_event(const Event& evt) {
   portid = new TipcPortId(evt.id_, data_sock_fd,
   chunk_ack_size, sock_buf_size);
   portid_map[TipcPortId::GetUniqueId(evt.id_)] = portid;
-  rc = portid->ReceiveData(evt.mseq_, evt.mfrag_,
-evt.fseq_, evt.svc_id_, evt.snd_type_, is_mcast_enabled);
+  if (evt.legacy_data_ == true) {
+// we create portid and set state kDisabled even though we know
+// this portid has no flow control. It is because the 2nd, 3rd fragment
+// could not reflect the protocol version, so need to keep this portid
+// remained stateful
+portid->ChangeState(TipcPortId::State::kDisabled);
+  } else {
+rc = portid->ReceiveData(evt.mseq_, evt.mfrag_,
+  evt.fseq_, evt.svc_id_, evt.snd_type_, is_mcast_enabled);
+  }
 } else if (evt.type_ == Event::Type::kEvtRcvIntro) {
   portid = new TipcPortId(evt.id_, data_sock_fd,
   chunk_ack_size, sock_buf_size);
@@ -146,8 +154,12 @@ uint32_t process_flow_event(const Event& evt) {
 }
   } else {
 if (evt.type_ == Event::Type::kEvtRcvData) {
-  rc = portid->ReceiveData(evt.mseq_, evt.mfrag_,
-  evt.fseq_, evt.svc_id_, evt.snd_type_, is_mcast_enabled);
+  if (evt.legacy_data_ == true) {
+portid->ChangeState(TipcPortId::State::kDisabled);
+  } else {
+rc = portid->ReceiveData(evt.mseq_, evt.mfrag_,
+evt.fseq_, evt.svc_id_, evt.snd_type_, is_mcast_enabled);
+  }
 }
 if (evt.type_ == Event::Type::kEvtRcvChunkAck) {
   portid->ReceiveChunkAck(evt.fseq_, evt.chunk_size_);
@@ -474,76 +486,88 @@ uint32_t mds_tipc_fctrl_rcv_data(uint8_t *buffer, uint16_t len,
 struct tipc_portid id) {
   if (is_fctrl_enabled ==

[devel] [PATCH 2/3] mds: Refactor logging [#3111]

2019-11-08 Thread Minh Chau

Now that TipcPortId::ChangeState() has been added, the patch refactors
logging to shorten the code.
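
The body of ChangeState() is not shown in this mail, so the following is
only a hedged C sketch of the general idea: route every state change
through one helper that also logs the transition, so call sites no longer
repeat their own log statements. The struct, enum and function names are
hypothetical; the state names mirror the ones used in the diff.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical mirror of the states referenced in the diff. */
enum sk_state {
	SK_STARTUP,
	SK_TXPROB,
	SK_ENABLED,
	SK_RCV_BUFF_OVERFLOW,
	SK_DISABLED
};

struct sk_port {
	unsigned node;
	unsigned ref;
	enum sk_state state;
	unsigned transitions; /* number of real state changes so far */
};

static const char *sk_state_name(enum sk_state s)
{
	switch (s) {
	case SK_STARTUP:
		return "kStartup";
	case SK_TXPROB:
		return "kTxProb";
	case SK_ENABLED:
		return "kEnabled";
	case SK_RCV_BUFF_OVERFLOW:
		return "kRcvBuffOverflow";
	case SK_DISABLED:
		return "kDisabled";
	default:
		return "unknown";
	}
}

/* The single place where the state changes and the change is logged. */
void sk_change_state(struct sk_port *p, enum sk_state next)
{
	if (p->state == next)
		return; /* nothing to do, nothing to log */
	fprintf(stderr, "FCTRL: [node:%x, ref:%u], ChangeState[%s -> %s]\n",
		p->node, p->ref, sk_state_name(p->state), sk_state_name(next));
	p->state = next;
	p->transitions++;
}
```

With such a helper, a call site shrinks to one line, for example
sk_change_state(&port, SK_ENABLED), which is the shape of the
replacements in the diff below.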
---
 src/mds/mds_tipc_fctrl_portid.cc | 71 
 1 file changed, 21 insertions(+), 50 deletions(-)

diff --git a/src/mds/mds_tipc_fctrl_portid.cc b/src/mds/mds_tipc_fctrl_portid.cc
index 9b87c74..df53d4d 100644
--- a/src/mds/mds_tipc_fctrl_portid.cc
+++ b/src/mds/mds_tipc_fctrl_portid.cc
@@ -208,17 +208,13 @@ bool TipcPortId::ReceiveCapable(uint16_t sending_len) {
 if (state_ == State::kTxProb) {
   // Too many msgs are not acked by receiver while in txprob state
   // disable flow control
-  state_ = State::kDisabled;
-  m_MDS_LOG_ERR("FCTRL: me --> [node:%x, ref:%u], [nacked:%" PRIu64
-  ", len:%u, rcv_buf_size:%" PRIu64 "], Warning[kTxProb -> kDisabled]",
-  id_.node, id_.ref, sndwnd_.nacked_space_,
-  sending_len, rcv_buf_size_);
+  m_MDS_LOG_ERR("FCTRL: me --> [node:%x, ref:%u], "
+  "Warning[Too many nacked in kTxProb]",
+  id_.node, id_.ref);
+  ChangeState(State::kDisabled);
   return true;
 } else if (state_ == State::kEnabled) {
-  state_ = State::kRcvBuffOverflow;
-  m_MDS_LOG_NOTIFY("FCTRL: [node:%x, ref:%u] --> Overflow, %" PRIu64
-  ", %u, %" PRIu64, id_.node, id_.ref, sndwnd_.nacked_space_,
-  sending_len, rcv_buf_size_);
+  ChangeState(State::kRcvBuffOverflow);
 }
 return false;
   }
@@ -271,20 +267,18 @@ uint32_t TipcPortId::ReceiveData(uint32_t mseq, uint16_t mfrag,
   uint32_t rc = NCSCC_RC_SUCCESS;
   if (state_ == State::kDisabled) {
 m_MDS_LOG_ERR("FCTRL: [me] <-- [node:%x, ref:%u], "
-"RcvData, TxProb[retries:%u, state:%u], "
-"Error[receive fseq:%u in invalid state]",
+"RcvData[mseq:%u, mfrag:%u, fseq:%u], "
+"rcvwnd[acked:%u, rcv:%u, nacked:%" PRIu64 "], "
+"Warning[Invalid state:%u]",
 id_.node, id_.ref,
-txprob_cnt_, (uint8_t)state_,
-fseq);
+mseq, mfrag, fseq,
+rcvwnd_.acked_.v(), rcvwnd_.rcv_.v(), rcvwnd_.nacked_space_,
+(uint8_t)state_);
 return rc;
   }
   // update state
   if (state_ == State::kTxProb || state_ == State::kStartup) {
-state_ = State::kEnabled;
-m_MDS_LOG_NOTIFY("FCTRL: [me] <-- [node:%x, ref:%u], "
-"RcvData, TxProb[retries:%u, state:%u]",
-id_.node, id_.ref,
-txprob_cnt_, (uint8_t)state_);
+ChangeState(State::kEnabled);
   }
   // if tipc multicast is enabled, receiver does not inspect sequence number
   // for both fragment/unfragment multicast/broadcast message
@@ -398,12 +392,7 @@ void TipcPortId::ReceiveChunkAck(uint16_t fseq, uint16_t chksize) {
   }
   // update state
   if (state_ == State::kTxProb) {
-state_ = State::kEnabled;
-m_MDS_LOG_NOTIFY("FCTRL: [me] <-- [node:%x, ref:%u], "
-"RcvChkAck, "
-"TxProb[retries:%u, state:%u]",
-id_.node, id_.ref,
-txprob_cnt_, (uint8_t)state_);
+ChangeState(State::kEnabled);
   }
   // update sender sequence window
   if (sndwnd_.acked_ < Seq16(fseq)) {
@@ -474,9 +463,7 @@ void TipcPortId::ReceiveChunkAck(uint16_t fseq, uint16_t chksize) {
 }
 // no more unsent message, back to kEnabled
 if (msg == nullptr && state_ == State::kRcvBuffOverflow) {
-  state_ = State::kEnabled;
-  m_MDS_LOG_NOTIFY("FCTRL: [node:%x, ref:%u] Overflow --> Enabled ",
-  id_.node, id_.ref);
+  ChangeState(State::kEnabled);
 }
   } else {
 m_MDS_LOG_ERR("FCTRL: [me] <-- [node:%x, ref:%u], "
@@ -517,9 +504,7 @@ void TipcPortId::ReceiveNack(uint32_t mseq, uint16_t mfrag,
 }
   }
   if (state_ != State::kRcvBuffOverflow) {
-state_ = State::kRcvBuffOverflow;
-m_MDS_LOG_NOTIFY("FCTRL: [node:%x, ref:%u] --> Overflow ",
-id_.node, id_.ref);
+ChangeState(State::kRcvBuffOverflow);
 sndqueue_.MarkUnsentFrom(Seq16(fseq));
   }
   DataMessage* msg = sndqueue_.Find(Seq16(fseq));
@@ -545,27 +530,15 @@ void TipcPortId::ReceiveNack(uint32_t mseq, uint16_t mfrag,
 
 bool TipcPortId::ReceiveTmrTxProb(uint8_t max_txprob) {
   bool restart_txprob = false;
-  if (state_ == State::kDisabled ||
-  sndwnd_.acked_ > Seq16(1) ||
-  rcvwnd_.rcv_ > Seq16(1)) return restart_txprob;
+  if (state_ == State::kDisabled) return restart_txprob;
   if (state_ == State::kTxProb || state_ == State::kRcvBuffOverflow) {
 txprob_cnt_++;
 if (txprob_cnt_ >= max_txprob) {
-  state_ = State::kDisabled;
+  ChangeState(State::kDisabled);
   restart_txprob = false;
 } else {
   restart_txprob = true;
 }
-
-// at kDisabled state, clear all message in sndqueue_,
-// receiver is at old mds version
-if (state_ == State::kDisabled) {
-  FlushData();
-  m_MDS_LOG_NOTIFY("FCTRL: [node:%x, ref:%u], "
-  "TxProbExp, TxProb[retries:%u, state:%u]",
-  id_.node, id_.ref,
-  txprob_cnt_, (uint8_t)state_);
-}
   }
   return

[devel] [PATCH 1/1] mds: Unset flow control env var [#3109]

2019-10-30 Thread Minh Chau

Patch unsets MDS_TIPC_FCTRL_ENABLED, MDS_TIPC_FCTRL_ACKTIMEOUT,
and MDS_TIPC_FCTRL_ACKSIZE to prevent child process inheritance.
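
The inheritance issue can be sketched in C as follows. This is a minimal
illustration, not the patch itself: the helper names consume_env_int and
child_sees_env are hypothetical. It reads the setting into a variable
first, then calls unsetenv(), and uses fork() to check what a child would
see.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Read a numeric setting from the environment into *out, then remove the
 * variable so that children created later do not inherit it.
 * Returns true if the variable was present. */
bool consume_env_int(const char *name, int *out)
{
	const char *val = getenv(name);
	if (val == NULL)
		return false;
	*out = atoi(val); /* keep the value before the variable disappears */
	if (unsetenv(name) != 0)
		fprintf(stderr, "Failed to unset %s\n", name);
	return true;
}

/* Fork a child and report whether the child still sees `name`.
 * Returns 1 if visible, 0 if not, -1 if fork() or waitpid() fails. */
int child_sees_env(const char *name)
{
	pid_t pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0)
		_exit(getenv(name) != NULL ? 1 : 0);
	int status = 0;
	if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}
```

Because the value is gone from the environment after the call, anything
that needs it later has to use the saved copy.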
---
 src/mds/mds_dt_tipc.c | 39 +--
 1 file changed, 29 insertions(+), 10 deletions(-)

diff --git a/src/mds/mds_dt_tipc.c b/src/mds/mds_dt_tipc.c
index e7a7b48..096e4ca 100644
--- a/src/mds/mds_dt_tipc.c
+++ b/src/mds/mds_dt_tipc.c
@@ -167,6 +167,8 @@ uint32_t mdtm_global_frag_num;
 
 const unsigned int MAX_RECV_THRESHOLD = 30;
 uint8_t gl_mds_pro_ver = MDS_PROT | MDS_VERSION;
+int gl_mds_fctrl_acksize = -1;
+int gl_mds_fctrl_ackto = -1;
 
 static bool get_tipc_port_id(int sock, struct tipc_portid* port_id) {
struct sockaddr_tipc addr;
@@ -347,32 +349,49 @@ uint32_t mdtm_tipc_init(NODE_ID nodeid, uint32_t *mds_tipc_ref)
if ((ptr = getenv("MDS_TIPC_FCTRL_ENABLED")) != NULL) {
if (atoi(ptr) == 1) {
gl_mds_pro_ver = MDS_PROT_FCTRL;
-   int ackto = -1;
-   int acksize = -1;
if ((ptr = getenv("MDS_TIPC_FCTRL_ACKTIMEOUT")) != NULL) {
-   ackto = atoi(ptr);
-   if (ackto == 0) {
+   gl_mds_fctrl_ackto = atoi(ptr);
+   if (gl_mds_fctrl_ackto == 0) {
syslog(LOG_ERR, "MDTM:TIPC Invalid "
"MDS_TIPC_FCTRL_ACKTIMEOUT, using default value");
-   ackto = -1;
+   gl_mds_fctrl_ackto = -1;
}
}
if ((ptr = getenv("MDS_TIPC_FCTRL_ACKSIZE")) != NULL) {
-   acksize = atoi(ptr);
-   if (acksize == 0) {
+   gl_mds_fctrl_acksize = atoi(ptr);
+   if (gl_mds_fctrl_acksize == 0) {
syslog(LOG_ERR, "MDTM:TIPC Invalid "
"MDS_TIPC_FCTRL_ACKSIZE, using default value");
-   acksize = -1;
+   gl_mds_fctrl_acksize = -1;
}
}
-   mds_tipc_fctrl_initialize(tipc_cb.BSRsock, port_id, optval,
-   ackto, acksize, tipc_mcast_enabled);
+   /* unset the env var to prevent child process inheritance */
+   if (unsetenv("MDS_TIPC_FCTRL_ENABLED") != 0) {
+   syslog(LOG_ERR,
+   "MDTM:TIPC Failed to unset MDS_TIPC_FCTRL_ENABLED");
+   }
+   if (gl_mds_fctrl_ackto != -1 &&
+   unsetenv("MDS_TIPC_FCTRL_ACKTIMEOUT") != 0) {
+   syslog(LOG_ERR,
+   "MDTM:TIPC Failed to unset MDS_TIPC_FCTRL_ACKTIMEOUT");
+   }
+   if (gl_mds_fctrl_acksize != -1 &&
+   unsetenv("MDS_TIPC_FCTRL_ACKSIZE") != 0) {
+   syslog(LOG_ERR,
+   "MDTM:TIPC Failed to unset MDS_TIPC_FCTRL_ACKSIZE");
+   }
} else {
+   gl_mds_pro_ver = MDS_PROT | MDS_VERSION;
syslog(LOG_ERR, "MDTM:TIPC Invalid value of"
"MDS_TIPC_FCTRL_ENABLED");
}
}
 
+   if (gl_mds_pro_ver == MDS_PROT_FCTRL) {
+   mds_tipc_fctrl_initialize(tipc_cb.BSRsock, port_id, optval,
+   gl_mds_fctrl_ackto, gl_mds_fctrl_acksize, tipc_mcast_enabled);
+   }
+
/* Create a task to receive the events and data */
if (mdtm_create_rcv_task(tipc_cb.hdle_mdtm) != NCSCC_RC_SUCCESS) {
syslog(LOG_ERR,
-- 
2.7.4




[devel] [PATCH 0/1] Review Request for mds: Unset flow control env var [#3109] V2

2019-10-30 Thread Minh Chau

Summary: mds: Unset flow control env var [#3109]
Review request for Ticket(s): 3109
Peer Reviewer(s): Hans, Gary, Vu, Thuan
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3109
Base revision: e685bdfb16dad852372751f80aa2ec49948db05c
Personal repository: git://git.code.sf.net/u/minh-chau/review


Impacted area   Impact y/n

 Docsn
 Build systemn
 RPM/packaging   n
 Configuration files n
 Startup scripts n
 SAF servicesn
 OpenSAF servicesn
 Core libraries  y
 Samples n
 Tests   n
 Other   n

NOTE: Patch(es) contain lines longer than 80 characters

Comments (indicate scope for each "y" above):
-
*** EXPLAIN/COMMENT THE PATCH SERIES HERE ***

revision e56bbd6daca48a5c6400682ab9ae46786897e9e7
Author: Minh Chau 
Date:   Thu, 31 Oct 2019 15:48:05 +1100

mds: Unset flow control env var [#3109]

Patch unsets MDS_TIPC_FCTRL_ENABLED, MDS_TIPC_FCTRL_ACKTIMEOUT,
and MDS_TIPC_FCTRL_ACKSIZE to prevent child process inheritance.



Complete diffstat:
--
 src/mds/mds_dt_tipc.c | 39 +--
 1 file changed, 29 insertions(+), 10 deletions(-)


Testing Commands:
-
*** LIST THE COMMAND LINE TOOLS/STEPS TO TEST YOUR CHANGES ***


Testing, Expected Results:
--
*** PASTE COMMAND OUTPUTS / TEST RESULTS ***


Conditions of Submission:
-
*** HOW MANY DAYS BEFORE PUSHING, CONSENSUS ETC ***


Arch  Built StartedLinux distro
---
mipsn  n
mips64  n  n
x86 n  n
x86_64  n  n
powerpc n  n
powerpc64   n  n


Reviewer Checklist:
---
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
(i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
too much content in a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
comments and change requests that were proposed in the initial review.

___ You have a misconfigured ~/.gitconfig file (i.e. user.name, user.email etc)

___ Your computer has a badly configured date and time, confusing the
threaded patch review.

___ Your changes affect IPC mechanism, and you don't present any results
for in-service upgradability test.

___ Your changes affect the user manual and documentation, but your patch
series does not contain the patch that updates the Doxygen manual.




[devel] [PATCH 1/1] mds: Unset flow control env var [#3109]

2019-10-30 Thread Minh Chau

Patch unsets MDS_TIPC_FCTRL_ENABLED, MDS_TIPC_FCTRL_ACKTIMEOUT,
and MDS_TIPC_FCTRL_ACKSIZE to prevent child process inheritance.
---
 src/mds/mds_dt_tipc.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/src/mds/mds_dt_tipc.c b/src/mds/mds_dt_tipc.c
index e7a7b48..12b275d 100644
--- a/src/mds/mds_dt_tipc.c
+++ b/src/mds/mds_dt_tipc.c
@@ -367,6 +367,19 @@ uint32_t mdtm_tipc_init(NODE_ID nodeid, uint32_t *mds_tipc_ref)
}
mds_tipc_fctrl_initialize(tipc_cb.BSRsock, port_id, optval,
ackto, acksize, tipc_mcast_enabled);
+   /* unset the env var to prevent child process inheritance */
+   if (unsetenv("MDS_TIPC_FCTRL_ENABLED") != 0) {
+   syslog(LOG_ERR,
+   "MDTM:TIPC Failed to unset MDS_TIPC_FCTRL_ENABLED");
+   }
+   if (ackto != -1 && unsetenv("MDS_TIPC_FCTRL_ACKTIMEOUT") != 0) {
+   syslog(LOG_ERR,
+   "MDTM:TIPC Failed to unset MDS_TIPC_FCTRL_ACKTIMEOUT");
+   }
+   if (acksize != -1 && unsetenv("MDS_TIPC_FCTRL_ACKSIZE") != 0) {
+   syslog(LOG_ERR,
+   "MDTM:TIPC Failed to unset MDS_TIPC_FCTRL_ACKSIZE");
+   }
} else {
syslog(LOG_ERR, "MDTM:TIPC Invalid value of"
"MDS_TIPC_FCTRL_ENABLED");
-- 
2.7.4




[devel] [PATCH 0/1] Review Request for mds: Unset flow control env var [#3109]

2019-10-30 Thread Minh Chau

Summary: mds: Unset flow control env var [#3109]
Review request for Ticket(s): 3109
Peer Reviewer(s): Hans, Gary, Vu, Thuan
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3109
Base revision: e685bdfb16dad852372751f80aa2ec49948db05c
Personal repository: git://git.code.sf.net/u/minh-chau/review


Impacted area   Impact y/n

 Docsn
 Build systemn
 RPM/packaging   n
 Configuration files n
 Startup scripts n
 SAF servicesn
 OpenSAF servicesn
 Core libraries  y
 Samples n
 Tests   n
 Other   n

NOTE: Patch(es) contain lines longer than 80 characters

Comments (indicate scope for each "y" above):
-
*** EXPLAIN/COMMENT THE PATCH SERIES HERE ***

revision 4c27ad9238c68f4d55be4c1dd81dc59775f9253b
Author: Minh Chau 
Date:   Thu, 31 Oct 2019 09:21:16 +1100

mds: Unset flow control env var [#3109]

Patch unsets MDS_TIPC_FCTRL_ENABLED, MDS_TIPC_FCTRL_ACKTIMEOUT,
and MDS_TIPC_FCTRL_ACKSIZE to prevent child process inheritance.



Complete diffstat:
--
 src/mds/mds_dt_tipc.c | 13 +
 1 file changed, 13 insertions(+)


Testing Commands:
-
*** LIST THE COMMAND LINE TOOLS/STEPS TO TEST YOUR CHANGES ***


Testing, Expected Results:
--
*** PASTE COMMAND OUTPUTS / TEST RESULTS ***


Conditions of Submission:
-
*** HOW MANY DAYS BEFORE PUSHING, CONSENSUS ETC ***


Arch  Built StartedLinux distro
---
mipsn  n
mips64  n  n
x86 n  n
x86_64  y  y
powerpc n  n
powerpc64   n  n



[devel] [PATCH 1/1] mds: Disable mds flow control for mds broadcast/multicast message [#3101]

2019-10-16 Thread Minh Chau

The mds flow control has been disabled for unfragmented broadcast/multicast
messages if tipc multicast is enabled. This patch revisits that change and
extends it to fragmented messages.
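
As a rough C sketch of the rule described above (assumed semantics, with
hypothetical names), the receiver decides whether to inspect the flow
control sequence number from the send type and the multicast setting. The
first function models the behaviour before this patch as the message
describes it, the second the behaviour after it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical send types; the real MDS enumeration is not shown here. */
enum sk_snd_type { SK_SND_UNICAST, SK_SND_MCAST, SK_SND_BCAST };

static bool sk_is_group_send(enum sk_snd_type t)
{
	return t == SK_SND_MCAST || t == SK_SND_BCAST;
}

/* Before: only unfragmented multicast/broadcast messages were exempt. */
bool sk_inspect_fseq_before(enum sk_snd_type t, bool is_mcast_enabled,
			    bool is_fragment)
{
	return !(is_mcast_enabled && sk_is_group_send(t) && !is_fragment);
}

/* After: fragments of multicast/broadcast messages are exempt as well. */
bool sk_inspect_fseq_after(enum sk_snd_type t, bool is_mcast_enabled,
			   bool is_fragment)
{
	(void)is_fragment; /* no longer affects the decision */
	return !(is_mcast_enabled && sk_is_group_send(t));
}
```

Unicast traffic, and all traffic when tipc multicast is disabled, is still
subject to sequence inspection in both variants.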
---
 src/mds/mds_tipc_fctrl_intf.cc   | 47 
 src/mds/mds_tipc_fctrl_msg.h | 11 +++---
 src/mds/mds_tipc_fctrl_portid.cc | 47 ++--
 src/mds/mds_tipc_fctrl_portid.h  |  3 ++-
 4 files changed, 69 insertions(+), 39 deletions(-)

diff --git a/src/mds/mds_tipc_fctrl_intf.cc b/src/mds/mds_tipc_fctrl_intf.cc
index b803bfe..fe3dbd5 100644
--- a/src/mds/mds_tipc_fctrl_intf.cc
+++ b/src/mds/mds_tipc_fctrl_intf.cc
@@ -133,7 +133,7 @@ uint32_t process_flow_event(const Event& evt) {
   kChunkAckSize, sock_buf_size);
   portid_map[TipcPortId::GetUniqueId(evt.id_)] = portid;
   rc = portid->ReceiveData(evt.mseq_, evt.mfrag_,
-evt.fseq_, evt.svc_id_);
+evt.fseq_, evt.svc_id_, evt.snd_type_, is_mcast_enabled);
 } else if (evt.type_ == Event::Type::kEvtRcvIntro) {
   portid = new TipcPortId(evt.id_, data_sock_fd,
   kChunkAckSize, sock_buf_size);
@@ -147,7 +147,7 @@ uint32_t process_flow_event(const Event& evt) {
   } else {
 if (evt.type_ == Event::Type::kEvtRcvData) {
   rc = portid->ReceiveData(evt.mseq_, evt.mfrag_,
-  evt.fseq_, evt.svc_id_);
+  evt.fseq_, evt.svc_id_, evt.snd_type_, is_mcast_enabled);
 }
 if (evt.type_ == Event::Type::kEvtRcvChunkAck) {
   portid->ReceiveChunkAck(evt.fseq_, evt.chunk_size_);
@@ -430,6 +430,7 @@ uint32_t mds_tipc_fctrl_drop_data(uint8_t *buffer, uint16_t len,
 
   HeaderMessage header;
   header.Decode(buffer);
+  Event* pevt = nullptr;
   // if mds support flow control
   if ((header.pro_ver_ & MDS_PROT_VER_MASK) == MDS_PROT_FCTRL) {
 if (header.pro_id_ == MDS_PROT_FCTRL_ID) {
@@ -438,9 +439,10 @@ uint32_t mds_tipc_fctrl_drop_data(uint8_t *buffer, uint16_t len,
 ChunkAck ack;
 ack.Decode(buffer);
 // send to the event thread
-if (m_NCS_IPC_SEND(&mbx_events,
-new Event(Event::Type::kEvtSendChunkAck, id, ack.svc_id_,
-header.mseq_, header.mfrag_, ack.acked_fseq_, ack.chunk_size_),
+pevt = new Event(Event::Type::kEvtSendChunkAck, id, ack.svc_id_,
+header.mseq_, header.mfrag_, ack.acked_fseq_);
+pevt->chunk_size_ = ack.chunk_size_;
+if (m_NCS_IPC_SEND(&mbx_events, pevt,
 NCS_IPC_PRIORITY_HIGH) != NCSCC_RC_SUCCESS) {
   m_MDS_LOG_ERR("FCTRL: Failed to send msg to mbx_events, Error[%s]",
   strerror(errno));
@@ -453,9 +455,9 @@ uint32_t mds_tipc_fctrl_drop_data(uint8_t *buffer, uint16_t len,
   DataMessage data;
   data.Decode(buffer);
   // send to the event thread
-  if (m_NCS_IPC_SEND(&mbx_events,
-  new Event(Event::Type::kEvtDropData, id, data.svc_id_,
-  header.mseq_, header.mfrag_, header.fseq_),
+  pevt = new Event(Event::Type::kEvtDropData, id, data.svc_id_,
+  header.mseq_, header.mfrag_, header.fseq_);
+  if (m_NCS_IPC_SEND(&mbx_events, pevt,
   NCS_IPC_PRIORITY_HIGH) != NCSCC_RC_SUCCESS) {
 m_MDS_LOG_ERR("FCTRL: Failed to send msg to mbx_events, Error[%s]",
 strerror(errno));
@@ -474,6 +476,7 @@ uint32_t mds_tipc_fctrl_rcv_data(uint8_t *buffer, uint16_t len,
 
   HeaderMessage header;
   header.Decode(buffer);
+  Event* pevt = nullptr;
   // if mds support flow control
   if ((header.pro_ver_ & MDS_PROT_VER_MASK) == MDS_PROT_FCTRL) {
 if (header.pro_id_ == MDS_PROT_FCTRL_ID) {
@@ -482,9 +485,10 @@ uint32_t mds_tipc_fctrl_rcv_data(uint8_t *buffer, uint16_t len,
 ChunkAck ack;
 ack.Decode(buffer);
 // send to the event thread
-if (m_NCS_IPC_SEND(&mbx_events,
-new Event(Event::Type::kEvtRcvChunkAck, id, ack.svc_id_,
-header.mseq_, header.mfrag_, ack.acked_fseq_, ack.chunk_size_),
+pevt = new Event(Event::Type::kEvtRcvChunkAck, id, ack.svc_id_,
+header.mseq_, header.mfrag_, ack.acked_fseq_);
+pevt->chunk_size_ = ack.chunk_size_;
+if (m_NCS_IPC_SEND(&mbx_events, pevt,
 NCS_IPC_PRIORITY_HIGH) != NCSCC_RC_SUCCESS) {
   m_MDS_LOG_ERR("FCTRL: Failed to send msg to mbx_events, Error[%s]",
   strerror(errno));
@@ -494,9 +498,9 @@ uint32_t mds_tipc_fctrl_rcv_data(uint8_t *buffer, uint16_t len,
 Nack nack;
 nack.Decode(buffer);
 // send to the event thread
-if (m_NCS_IPC_SEND(&mbx_events,
-new Event(Event::Type::kEvtRcvNack, id, nack.svc_id_,
-header.mseq_, header.mfrag_, nack.nacked_fseq_),
+pevt = new Event(Event::Type::kEvtRcvNack, id, nack.svc_id_,
+header.mseq_, header.mfrag_, nack.nacked_fseq_);
+if (m_NCS_IPC_SEND(&mbx_events, pevt,
 NCS_IPC_PRIORITY_HIGH) != NCSCC_RC_SUCCESS) {
   m_MDS_LOG_ERR("FCTRL: Failed to send

[devel] [PATCH 0/1] Review Request for mds: Disable mds flow control for mds broadcast/multicast message [#3101]

2019-10-16 Thread Minh Chau

Summary: mds: Disable mds flow control for mds broadcast/multicast message [#3101]
Review request for Ticket(s): 3101
Peer Reviewer(s): Hans, Gary, Vu, Thuan
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3101
Base revision: 95228b1a2a53e3b74c9a54f65e8b2345b8603582
Personal repository: git://git.code.sf.net/u/minh-chau/review


Impacted area   Impact y/n

 Docsn
 Build systemn
 RPM/packaging   n
 Configuration files n
 Startup scripts n
 SAF servicesn
 OpenSAF servicesn
 Core libraries  y
 Samples n
 Tests   n
 Other   n


Comments (indicate scope for each "y" above):
-
*** EXPLAIN/COMMENT THE PATCH SERIES HERE ***

revision f07139552fa194903a29991fe9d9b21d0c1a8de0
Author: Minh Chau 
Date:   Thu, 17 Oct 2019 13:48:31 +1100

mds: Disable mds flow control for mds broadcast/multicast message [#3101]

The mds flow control has been disabled for unfragmented broadcast/multicast
messages if tipc multicast is enabled. This patch revisits that change and
extends it to fragmented messages.



Complete diffstat:
--
 src/mds/mds_tipc_fctrl_intf.cc   | 47 
 src/mds/mds_tipc_fctrl_msg.h | 11 +++---
 src/mds/mds_tipc_fctrl_portid.cc | 47 ++--
 src/mds/mds_tipc_fctrl_portid.h  |  3 ++-
 4 files changed, 69 insertions(+), 39 deletions(-)


Testing Commands:
-
*** LIST THE COMMAND LINE TOOLS/STEPS TO TEST YOUR CHANGES ***


Testing, Expected Results:
--
*** PASTE COMMAND OUTPUTS / TEST RESULTS ***


Conditions of Submission:
-
*** HOW MANY DAYS BEFORE PUSHING, CONSENSUS ETC ***


Arch  Built StartedLinux distro
---
mipsn  n
mips64  n  n
x86 n  n
x86_64  y  y
powerpc n  n
powerpc64   n  n


Reviewer Checklist:
---
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
(i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
too much content into a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
comments and change requests that were proposed in the initial review.

___ You have a misconfigured ~/.gitconfig file (i.e. user.name, user.email etc)

___ Your computer has a badly configured date and time, confusing the
threaded patch review.

___ Your changes affect IPC mechanism, and you don't present any results
for in-service upgradability test.

___ Your changes affect user manual and documentation, your patch series
do not contain the patch that updates the Doxygen manual.



___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel

[devel] [PATCH 0/1] Review Request for mds: Do not check upper limit of window size [#3100]

2019-10-16 Thread Minh Chau

Summary: mds: Do not check upper limit of window size [#3100]
Review request for Ticket(s): 3100
Peer Reviewer(s): Hans, Gary, Thuan, Vu
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3100
Base revision: 95228b1a2a53e3b74c9a54f65e8b2345b8603582
Personal repository: git://git.code.sf.net/u/minh-chau/review


Impacted area       Impact y/n
------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            n
 OpenSAF services        n
 Core libraries          y
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
-
*** EXPLAIN/COMMENT THE PATCH SERIES HERE ***

revision e8c4d42438a41bb86f5d56dda5b91a82de7cc1b9
Author: Minh Chau 
Date:   Wed, 16 Oct 2019 22:18:52 +1100

mds: Do not check upper limit of window size [#3100]

According to RFC1982: "Addition of a value outside the range
[0 .. (2^(SERIAL_BITS - 1) - 1)] is undefined.". Mds uses 16
bits for mds flow control, thus the maximum allowed range of
window size is 2^15 - 1 = 32767.
The 'mdstest 27 8' test has randomly hit this limitation, with the
counter error detected in mds as shown in the log below:

FCTRL: [me] <-- [node:1001001, ref:2784751213],
RcvChkAck[fseq:31067, chunk:3], sndwnd[acked:31064,
send:63850, nacked:1901634], queue[size:32785],
Error[msg disordered]

The fseq should always be less than sndwnd_.send_, hence mds
should consider the sender capable of sending more messages
only if D = sndwnd_.send_ - sndwnd_.acked_ < 2^15 - 1 = 32767.
If a burst of messages is sent, D could be > 32767; mds in this
case should notify the sender to try sending again later, which
however could lead to a backward compatibility problem. For now
mds weakens the window size verification: it only logs a warning
and lets the transmission continue.
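
The values in the log above can be checked against the RFC 1982
comparison rule. The following is only a minimal sketch, not the Seq16
class from the mds sources: the helper names SerialLess and
WindowDistance are hypothetical, and SERIAL_BITS = 16 is assumed as
stated above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not taken from the mds sources: RFC 1982
// "less than" for 16-bit serial numbers. a < b holds when b is ahead
// of a by 1..32767 (modulo 2^16).
bool SerialLess(uint16_t a, uint16_t b) {
  uint16_t distance = static_cast<uint16_t>(b - a);  // wraps modulo 2^16
  return distance != 0 && distance < 0x8000;
}

// Number of sequence numbers sent but not yet acknowledged
// (D in the commit message), also computed modulo 2^16.
uint16_t WindowDistance(uint16_t acked, uint16_t send) {
  return static_cast<uint16_t>(send - acked);
}
```

With acked:31064 and send:63850 from the log, WindowDistance gives
32786, which is above 32767. As a result SerialLess(31067, 63850) is
false, so a check of the form "fseq < send" fails even though the ack
is numerically inside the window, assuming Seq16 compares per RFC 1982.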



Complete diffstat:
--
 src/mds/mds_tipc_fctrl_portid.cc | 23 ++-
 1 file changed, 22 insertions(+), 1 deletion(-)


Testing Commands:
-
*** LIST THE COMMAND LINE TOOLS/STEPS TO TEST YOUR CHANGES ***


Testing, Expected Results:
--
*** PASTE COMMAND OUTPUTS / TEST RESULTS ***


Conditions of Submission:
-
*** HOW MANY DAYS BEFORE PUSHING, CONSENSUS ETC ***


Arch        Built  Started  Linux distro
----------------------------------------
mips        n      n
mips64      n      n
x86         n      n
x86_64      y      y
powerpc     n      n
powerpc64   n      n


[devel] [PATCH 1/1] mds: Do not check upper limit of window size [#3100]

2019-10-16 Thread Minh Chau

According to RFC1982: "Addition of a value outside the range
[0 .. (2^(SERIAL_BITS - 1) - 1)] is undefined.". Mds uses 16
bits for mds flow control, thus the maximum allowed range of
window size is 2^15 - 1 = 32767.
The 'mdstest 27 8' test has randomly hit this limitation, with the
counter error detected in mds as shown in the log below:

FCTRL: [me] <-- [node:1001001, ref:2784751213],
RcvChkAck[fseq:31067, chunk:3], sndwnd[acked:31064,
send:63850, nacked:1901634], queue[size:32785],
Error[msg disordered]

The fseq should always be less than sndwnd_.send_, hence mds
should consider the sender capable of sending more messages
only if D = sndwnd_.send_ - sndwnd_.acked_ < 2^15 - 1 = 32767.
If a burst of messages is sent, D could be > 32767; mds in this
case should notify the sender to try sending again later, which
however could lead to a backward compatibility problem. For now
mds weakens the window size verification: it only logs a warning
and lets the transmission continue.
---
 src/mds/mds_tipc_fctrl_portid.cc | 23 ++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/src/mds/mds_tipc_fctrl_portid.cc b/src/mds/mds_tipc_fctrl_portid.cc
index a9fa7d3..6eae7d4 100644
--- a/src/mds/mds_tipc_fctrl_portid.cc
+++ b/src/mds/mds_tipc_fctrl_portid.cc
@@ -378,7 +378,28 @@ void TipcPortId::ReceiveChunkAck(uint16_t fseq, uint16_t chksize) {
 txprob_cnt_, (uint8_t)state_);
   }
   // update sender sequence window
-  if (sndwnd_.acked_ < Seq16(fseq) && Seq16(fseq) < sndwnd_.send_) {
+  if (sndwnd_.acked_ < Seq16(fseq)) {
+// The fseq_ should always be less then sndwnd_.send_, hence
+// mds should check the sender being capable of sending more
+// message only if D = sndwnd_.send_ - sndwnd_.acked_ < 2^15 - 1 = 32767
+// If a burst of message is sent, D could be > 32767
+// mds in this case should notify the sender try to send again
+// later; which however could leads to a backward compatibility
+// For now mds logs a warning and let the transmission continue
+// (mds could be changed to return try again if it is not a backward
+// compatibility problem to a specific client.
+if (Seq16(fseq) >= sndwnd_.send_) {
+  m_MDS_LOG_ERR("FCTRL: [me] <-- [node:%x, ref:%u], "
+  "RcvChkAck[fseq:%u, chunk:%u], "
+  "sndwnd[acked:%u, send:%u, nacked:%" PRIu64 "], "
+  "queue[size:%" PRIu64 "], "
+  "Warning[ack sequence out of window]",
+  id_.node, id_.ref,
+  fseq, chksize,
+  sndwnd_.acked_.v(), sndwnd_.send_.v(), sndwnd_.nacked_space_,
+  sndqueue_.Size());
+}
+
 m_MDS_LOG_DBG("FCTRL: [me] <-- [node:%x, ref:%u], "
 "RcvChkAck[fseq:%u, chunk:%u], "
 "sndwnd[acked:%u, send:%u, nacked:%" PRIu64 "], "
-- 
2.7.4




[devel] [PATCH 1/1] mds: Add Intro message [#3090]

2019-10-14 Thread Minh Chau

mds relies on a data message sent from the peer to determine
whether MDS_TIPC_FCTRL_ENABLED is set at the peer. The data
message may not be sent right after the TIPC_PUBLISHED event,
which can cause the tx probation timer to time out.

This patch adds an Intro message, which is sent right after
TIPC_PUBLISHED to help mds determine earlier whether flow
control is supported at the peer.
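
On the receiving side, the diff below creates the port entry on the
first Data or Intro message from an unknown peer. The following is a
simplified sketch of that get-or-create step, not the patch code: the
types PortState and PortKey, the flag peer_supports_fctrl and the
function ProcessEvent are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>

enum class EventType { kRcvData, kRcvIntro, kRcvChunkAck };

// Hypothetical per-peer state; the real TipcPortId holds much more.
struct PortState {
  bool peer_supports_fctrl = false;
};

using PortKey = std::pair<uint32_t, uint32_t>;  // (node, ref)
std::map<PortKey, PortState> port_map;

// Returns false when the event refers to a peer that is not known yet
// and the event type is not allowed to create the entry.
bool ProcessEvent(EventType type, uint32_t node, uint32_t ref) {
  const bool first_contact_ok =
      type == EventType::kRcvData || type == EventType::kRcvIntro;
  PortKey key{node, ref};
  auto it = port_map.find(key);
  if (it == port_map.end()) {
    if (!first_contact_ok) return false;  // e.g. an ack for an unknown port
    it = port_map.emplace(key, PortState{}).first;
  }
  // Either a flow-controlled Data message or an Intro tells us that the
  // peer has flow control enabled.
  if (first_contact_ok) it->second.peer_supports_fctrl = true;
  return true;
}
```

The point of the Intro is visible here: the flag is set as soon as the
Intro arrives, without waiting for the first data message.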
---
 src/mds/mds_main.c   |  2 +-
 src/mds/mds_tipc_fctrl_intf.cc   | 27 ++
 src/mds/mds_tipc_fctrl_msg.cc| 11 +
 src/mds/mds_tipc_fctrl_msg.h | 18 +++
 src/mds/mds_tipc_fctrl_portid.cc | 49 ++--
 src/mds/mds_tipc_fctrl_portid.h  |  2 ++
 6 files changed, 96 insertions(+), 13 deletions(-)

diff --git a/src/mds/mds_main.c b/src/mds/mds_main.c
index 8c9b1f1..c7d2f7b 100644
--- a/src/mds/mds_main.c
+++ b/src/mds/mds_main.c
@@ -408,7 +408,7 @@ uint32_t mds_lib_req(NCS_LIB_REQ_INFO *req)
if (tipc_mcast_enabled != false)
tipc_mcast_enabled = true;
 
-   m_MDS_LOG_DBG(
+   m_MDS_LOG_NOTIFY(
 "MDS: TIPC_MCAST_ENABLED: %d  Set argument \n",
tipc_mcast_enabled);
}
diff --git a/src/mds/mds_tipc_fctrl_intf.cc b/src/mds/mds_tipc_fctrl_intf.cc
index 6271890..b803bfe 100644
--- a/src/mds/mds_tipc_fctrl_intf.cc
+++ b/src/mds/mds_tipc_fctrl_intf.cc
@@ -39,6 +39,7 @@ using mds::DataMessage;
 using mds::ChunkAck;
 using mds::HeaderMessage;
 using mds::Nack;
+using mds::Intro;
 
 namespace {
 // flow control enabled/disabled
@@ -124,12 +125,20 @@ uint32_t process_flow_event(const Event& evt) {
   uint32_t rc = NCSCC_RC_SUCCESS;
   TipcPortId *portid = portid_lookup(evt.id_);
   if (portid == nullptr) {
+// the null portid normally should not happen; however because the
+// tipc_cb.Dsock and tipc_cb.BSRsock are separated; the data message
+// sent from BSRsock may come before reception of TIPC_PUBLISHED
 if (evt.type_ == Event::Type::kEvtRcvData) {
   portid = new TipcPortId(evt.id_, data_sock_fd,
   kChunkAckSize, sock_buf_size);
   portid_map[TipcPortId::GetUniqueId(evt.id_)] = portid;
   rc = portid->ReceiveData(evt.mseq_, evt.mfrag_,
 evt.fseq_, evt.svc_id_);
+} else if (evt.type_ == Event::Type::kEvtRcvIntro) {
+  portid = new TipcPortId(evt.id_, data_sock_fd,
+  kChunkAckSize, sock_buf_size);
+  portid_map[TipcPortId::GetUniqueId(evt.id_)] = portid;
+  portid->ReceiveIntro();
 } else {
   m_MDS_LOG_ERR("FCTRL: [me] <-- [node:%x, ref:%u], "
   "RcvEvt[evt:%d], Error[PortId not found]",
@@ -151,6 +160,9 @@ uint32_t process_flow_event(const Event& evt) {
   portid->ReceiveNack(evt.mseq_, evt.mfrag_,
   evt.fseq_);
 }
+if (evt.type_ == Event::Type::kEvtRcvIntro) {
+  portid->ReceiveIntro();
+}
   }
   return rc;
 }
@@ -489,6 +501,16 @@ uint32_t mds_tipc_fctrl_rcv_data(uint8_t *buffer, uint16_t len,
   m_MDS_LOG_ERR("FCTRL: Failed to send msg to mbx_events, Error[%s]",
   strerror(errno));
 }
+  } else if (header.msg_type_ == Intro::kIntroMsgType) {
+// no need to decode intro message
+// the decoding intro message type is done in header decoding
+// send to the event thread
+if (m_NCS_IPC_SEND(&mbx_events,
+new Event(Event::Type::kEvtRcvIntro, id, 0, 0, 0, 0),
+NCS_IPC_PRIORITY_HIGH) != NCSCC_RC_SUCCESS) {
+  m_MDS_LOG_ERR("FCTRL: Failed to send msg to mbx_events, Error[%s]",
+  strerror(errno));
+}
   } else {
 m_MDS_LOG_ERR("FCTRL: [me] <-- [node:%x, ref:%u], "
 "[msg_type:%u], Error[not supported message type]",
@@ -516,6 +538,11 @@ uint32_t mds_tipc_fctrl_rcv_data(uint8_t *buffer, uint16_t len,
   portid_map_mutex.unlock();
   return rc;
 }
+  } else {
+m_MDS_LOG_DBG("FCTRL: [me] <-- [node:%x, ref:%u], "
+"Receive non-flow-control data message, "
+"header.pro_ver:%u",
+id.node, id.ref, header.pro_ver_);
   }
   return NCSCC_RC_SUCCESS;
 }
diff --git a/src/mds/mds_tipc_fctrl_msg.cc b/src/mds/mds_tipc_fctrl_msg.cc
index 932120f..180dcb6 100644
--- a/src/mds/mds_tipc_fctrl_msg.cc
+++ b/src/mds/mds_tipc_fctrl_msg.cc
@@ -178,4 +178,15 @@ void Nack::Decode(uint8_t *msg) {
   nacked_fseq_ = ncs_decode_16bit(&ptr);
 }
 
+
+void Intro::Encode(uint8_t *msg) {
+  uint8_t *ptr;
+  // encode protocol identifier
+  ptr = &msg[Intro::FieldIndex::kProtocolIdentifier];
+  ncs_encode_32bit(&ptr, MDS_PROT_FCTRL_ID);
+  // encode message type
+  ptr = &msg[Intro::FieldIndex::kFlowControlMessageType];
+  ncs_encode_8bit(&ptr, kIntroMsgType);
+}
+
 }  // end namespace mds
diff --git a/src/mds/mds_tipc_fctrl_msg.h b/src/mds/mds_tipc_fctrl_msg.h
index e1db200..3e45fa6 100644
---

[devel] [PATCH 0/1] Review Request for mds: Add Intro message [#3090]

2019-10-14 Thread Minh Chau

Summary: mds: Add Intro message [#3090]
Review request for Ticket(s): 3090
Peer Reviewer(s): Hans, Vu, Gary, Thuan
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3090
Base revision: 413b8fa37a190ffe1e34ea09205bc22b8d8e60a4
Personal repository: git://git.code.sf.net/u/minh-chau/review


Impacted area       Impact y/n
------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            n
 OpenSAF services        n
 Core libraries          y
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
-
*** EXPLAIN/COMMENT THE PATCH SERIES HERE ***

revision a97fbc32f2d4402f261fa3e88a05595f89857443
Author: Minh Chau 
Date:   Tue, 15 Oct 2019 12:41:07 +1100

mds: Add Intro message [#3090]

mds relies on a data message sent from the peer to determine
whether MDS_TIPC_FCTRL_ENABLED is set at the peer. The data
message may not be sent right after the TIPC_PUBLISHED event,
which can cause the tx probation timer to time out.

This patch adds an Intro message, which is sent right after
TIPC_PUBLISHED to help mds determine earlier whether flow
control is supported at the peer.
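
An Intro carries no payload beyond the protocol identifier and the
message type. The following is a rough sketch of such an encoding
under stated assumptions, not the Intro::Encode from the patch:
offset 11 follows the "octet11 to octet14" wording of the #3097 patch,
while offset 15, the buffer length, the message type value and the
big-endian byte order are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed values, for illustration only (see the note above).
constexpr size_t kProtocolIdentifierIndex = 11;
constexpr size_t kMessageTypeIndex = 15;
constexpr size_t kIntroLength = 16;
constexpr uint32_t kProtFctrlId = 0xFDAC13F5;
constexpr uint8_t kIntroType = 4;  // hypothetical message type value

// Zeroes the buffer, then writes the identifier (big-endian) and the
// message type. msg must hold at least kIntroLength octets.
void EncodeIntro(uint8_t* msg) {
  for (size_t i = 0; i < kIntroLength; ++i) msg[i] = 0;
  msg[kProtocolIdentifierIndex + 0] = static_cast<uint8_t>(kProtFctrlId >> 24);
  msg[kProtocolIdentifierIndex + 1] = static_cast<uint8_t>(kProtFctrlId >> 16);
  msg[kProtocolIdentifierIndex + 2] = static_cast<uint8_t>(kProtFctrlId >> 8);
  msg[kProtocolIdentifierIndex + 3] = static_cast<uint8_t>(kProtFctrlId);
  msg[kMessageTypeIndex] = kIntroType;
}

// True when the buffer carries the identifier and the Intro type.
bool IsIntro(const uint8_t* msg, size_t len) {
  if (len < kIntroLength) return false;
  const uint8_t* p = msg + kProtocolIdentifierIndex;
  uint32_t id = (uint32_t{p[0]} << 24) | (uint32_t{p[1]} << 16) |
                (uint32_t{p[2]} << 8) | uint32_t{p[3]};
  return id == kProtFctrlId && msg[kMessageTypeIndex] == kIntroType;
}
```

A receiver that recognises this message right after TIPC_PUBLISHED
does not need to wait for the first data message from the peer.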



Complete diffstat:
--
 src/mds/mds_main.c   |  2 +-
 src/mds/mds_tipc_fctrl_intf.cc   | 27 ++
 src/mds/mds_tipc_fctrl_msg.cc| 11 +
 src/mds/mds_tipc_fctrl_msg.h | 18 +++
 src/mds/mds_tipc_fctrl_portid.cc | 49 ++--
 src/mds/mds_tipc_fctrl_portid.h  |  2 ++
 6 files changed, 96 insertions(+), 13 deletions(-)


Testing Commands:
-
*** LIST THE COMMAND LINE TOOLS/STEPS TO TEST YOUR CHANGES ***


Testing, Expected Results:
--
*** PASTE COMMAND OUTPUTS / TEST RESULTS ***


Conditions of Submission:
-
*** HOW MANY DAYS BEFORE PUSHING, CONSENSUS ETC ***


Arch        Built  Started  Linux distro
----------------------------------------
mips        n      n
mips64      n      n
x86         n      n
x86_64      y      y
powerpc     n      n
powerpc64   n      n



[devel] [PATCH 0/1] Review Request for mds: Add Reset message [#3090]

2019-10-10 Thread Minh Chau

Summary: mds: Add Reset message [#3090]
Review request for Ticket(s): 3090
Peer Reviewer(s): Hans, Gary, Vu, Thuan
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3090
Base revision: e4c3c0c95644238fc84f31352e8ef289d9820ab4
Personal repository: git://git.code.sf.net/u/minh-chau/review


Impacted area       Impact y/n
------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
-
*** EXPLAIN/COMMENT THE PATCH SERIES HERE ***

revision 4ef0125e47b71ea3da2fd653ecfdfc50e638b800
Author: Minh Chau 
Date:   Fri, 11 Oct 2019 14:42:04 +1100

mds: Add Reset message [#3090]

mds relies on a data message sent from the peer to determine
whether MDS_TIPC_FCTRL_ENABLED is set at the peer. The data
message may not be sent right after the TIPC_PUBLISHED event,
which can cause the tx probation timer to time out.

This patch adds a Reset message, which is sent right after
TIPC_PUBLISHED to help mds determine earlier whether flow
control is supported at the peer.



Complete diffstat:
--
 src/mds/mds_main.c   |  2 +-
 src/mds/mds_tipc_fctrl_intf.cc   | 27 ++
 src/mds/mds_tipc_fctrl_msg.cc| 11 +
 src/mds/mds_tipc_fctrl_msg.h | 18 +++
 src/mds/mds_tipc_fctrl_portid.cc | 49 ++--
 src/mds/mds_tipc_fctrl_portid.h  |  2 ++
 6 files changed, 96 insertions(+), 13 deletions(-)


Testing Commands:
-
*** LIST THE COMMAND LINE TOOLS/STEPS TO TEST YOUR CHANGES ***


Testing, Expected Results:
--
*** PASTE COMMAND OUTPUTS / TEST RESULTS ***


Conditions of Submission:
-
*** HOW MANY DAYS BEFORE PUSHING, CONSENSUS ETC ***


Arch        Built  Started  Linux distro
----------------------------------------
mips        n      n
mips64      n      n
x86         n      n
x86_64      y      y
powerpc     n      n
powerpc64   n      n



[devel] [PATCH 1/1] mds: Enhance decoding for mds flow control message [#3097]

2019-10-06 Thread Minh Chau

mds currently uses the 4-byte MDS_PROT_FCTRL_ID value (0x00AC13F5)
in octet11 to octet14 to identify a flow control message, e.g. a
chunkack message. When a big message is fragmented, the data of
the second fragment onwards starts at octet11, which may hold an
arbitrary value and cause mds to incorrectly decode the fragment
as a flow control message if it starts with the value 0x00AC13F5.

This patch fixes this rare case by decoding a flow control message
only if oct2-5 (mds global sequence number) and oct6-7 (mds
fragment number) are both zero. It also changes MDS_PROT_FCTRL_ID
to 0xFDAC13F5.
---
 src/mds/mds_dt.h  |  2 +-
 src/mds/mds_tipc_fctrl_msg.cc | 20 +---
 2 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/src/mds/mds_dt.h b/src/mds/mds_dt.h
index d9e8633..64da600 100644
--- a/src/mds/mds_dt.h
+++ b/src/mds/mds_dt.h
@@ -245,7 +245,7 @@ bool mdtm_mailbox_mbx_cleanup(NCSCONTEXT arg, NCSCONTEXT msg);
 
 /* MDS protocol/version for flow control */
 #define MDS_PROT_FCTRL (0xB0 | MDS_VERSION)
-#define MDS_PROT_FCTRL_ID 0x00AC13F5
+#define MDS_PROT_FCTRL_ID 0xFDAC13F5
 
 /* Added for the subscription changes */
 #define MDS_NCS_CHASSIS_ID (m_NCS_GET_NODE_ID & 0x00ff)
diff --git a/src/mds/mds_tipc_fctrl_msg.cc b/src/mds/mds_tipc_fctrl_msg.cc
index 064d977..8375673 100644
--- a/src/mds/mds_tipc_fctrl_msg.cc
+++ b/src/mds/mds_tipc_fctrl_msg.cc
@@ -64,13 +64,19 @@ void HeaderMessage::Decode(uint8_t *msg) {
 // decode flow control sequence number
 ptr = &msg[HeaderMessage::FieldIndex::kFlowControlSequenceNumber];
 fseq_ = ncs_decode_16bit(&ptr);
-// decode protocol identifier
-ptr = &msg[ChunkAck::FieldIndex::kProtocolIdentifier];
-pro_id_ = ncs_decode_32bit(&ptr);
-if (pro_id_ == MDS_PROT_FCTRL_ID) {
-  // decode message type
-  ptr = &msg[ChunkAck::FieldIndex::kFlowControlMessageType];
-  msg_type_ = ncs_decode_8bit(&ptr);
+// decode protocol identifier if the mfrag_ and mseq_ are 0
+// otherwise it is always DataMessage within non-zero mseq_ and mfrag_
+if (mfrag_ == 0 && mseq_ == 0) {
+  ptr = &msg[ChunkAck::FieldIndex::kProtocolIdentifier];
+  pro_id_ = ncs_decode_32bit(&ptr);
+  if (pro_id_ == MDS_PROT_FCTRL_ID) {
+// decode message type
+ptr = &msg[ChunkAck::FieldIndex::kFlowControlMessageType];
+msg_type_ = ncs_decode_8bit(&ptr);
+  }
+} else {
+  pro_id_ = 0;
+  msg_type_ = 0;
 }
   } else {
 if (mfrag_ != 0) {
-- 
2.7.4




[devel] [PATCH 0/1] Review Request for mds: Enhance decoding for mds flow control message [#3097]

2019-10-06 Thread Minh Chau

Summary: mds: Enhance decoding for mds flow control message [#3097]
Review request for Ticket(s): 3097
Peer Reviewer(s): Hans, Vu, Gary, Thuan
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3097
Base revision: e699c22ddc1ca8530318b0dc0bde46794a224bd9
Personal repository: git://git.code.sf.net/u/minh-chau/review


Impacted area       Impact y/n
------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            n
 OpenSAF services        n
 Core libraries          y
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
-
*** EXPLAIN/COMMENT THE PATCH SERIES HERE ***

revision b3a4d9597a6738bcad7a65be2a050382adb5b9ff
Author: Minh Chau 
Date:   Mon, 7 Oct 2019 11:04:35 +1100

mds: Enhance decoding for mds flow control message [#3097]

mds currently uses the 4-byte MDS_PROT_FCTRL_ID value (0x00AC13F5)
in octet11 to octet14 to identify a flow control message, e.g. a
chunkack message. When a big message is fragmented, the data of
the second fragment onwards starts at octet11, which may hold an
arbitrary value and cause mds to incorrectly decode the fragment
as a flow control message if it starts with the value 0x00AC13F5.

This patch fixes this rare case by decoding a flow control message
only if oct2-5 (mds global sequence number) and oct6-7 (mds
fragment number) are both zero. It also changes MDS_PROT_FCTRL_ID
to 0xFDAC13F5.
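
The decoding rule in the patch (the protocol identifier is examined
only when mseq_ and mfrag_ are both 0) can be sketched as a standalone
check. This is an illustration under assumptions, not the
HeaderMessage::Decode code: the octet numbers are read as 0-based
offsets, the fields are assumed to be big-endian, and the constant and
function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed layout, taken from the description above.
constexpr size_t kSeqOffset = 2;      // oct2-5: mds global sequence number
constexpr size_t kFragOffset = 6;     // oct6-7: mds fragment number
constexpr size_t kProtIdOffset = 11;  // oct11-14: protocol identifier
constexpr uint32_t kFctrlId = 0xFDAC13F5;

uint32_t ReadBe32(const uint8_t* p) {
  return (uint32_t{p[0]} << 24) | (uint32_t{p[1]} << 16) |
         (uint32_t{p[2]} << 8) | uint32_t{p[3]};
}

uint16_t ReadBe16(const uint8_t* p) {
  return static_cast<uint16_t>((p[0] << 8) | p[1]);
}

// True only when the sequence and fragment numbers are both zero AND
// the protocol identifier matches. A data fragment has a non-zero
// fragment number, so its payload is never examined.
bool LooksLikeFlowControl(const uint8_t* msg, size_t len) {
  if (len < kProtIdOffset + 4) return false;
  if (ReadBe32(msg + kSeqOffset) != 0) return false;
  if (ReadBe16(msg + kFragOffset) != 0) return false;
  return ReadBe32(msg + kProtIdOffset) == kFctrlId;
}
```

A fragment whose payload happens to start with the identifier bytes is
rejected by the first two checks, which is the rare case the patch
addresses.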



Complete diffstat:
--
 src/mds/mds_dt.h  |  2 +-
 src/mds/mds_tipc_fctrl_msg.cc | 20 +---
 2 files changed, 14 insertions(+), 8 deletions(-)


Testing Commands:
-
*** LIST THE COMMAND LINE TOOLS/STEPS TO TEST YOUR CHANGES ***


Testing, Expected Results:
--
*** PASTE COMMAND OUTPUTS / TEST RESULTS ***


Conditions of Submission:
-
*** HOW MANY DAYS BEFORE PUSHING, CONSENSUS ETC ***


Arch        Built  Started  Linux distro
----------------------------------------
mips        n      n
mips64      n      n
x86         n      n
x86_64      y      y
powerpc     n      n
powerpc64   n      n



[devel] [PATCH 2/2] mds: Improve error log for MDS_TIPC_FCTRL_ENABLED [#3095]

2019-10-03 Thread Minh Chau

This commit, as part of #3095, updates the error strings to follow
the pattern "FCTRL:*Error[*]", which makes the errors easier to
grep in the mds debug log.
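
As an illustration of what the common pattern enables, the following
sketch pulls the Error[...] text out of a log line. It is not part of
the patch; the function name ExtractFctrlError is hypothetical and the
regular expression is one possible reading of "FCTRL:*Error[*]".

```cpp
#include <cassert>
#include <regex>
#include <string>

// Returns the text inside Error[...] for an FCTRL log line, or an
// empty string when the line does not follow the pattern.
std::string ExtractFctrlError(const std::string& line) {
  static const std::regex pattern(R"(FCTRL:.*Error\[(.*)\])");
  std::smatch match;
  if (std::regex_search(line, match, pattern)) return match[1].str();
  return "";
}
```

The same pattern works with grep on the mds debug log, since every
error line now contains both the FCTRL: prefix and an Error[...] tag.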
---
 src/mds/mds_tipc_fctrl_intf.cc   | 59 +---
 src/mds/mds_tipc_fctrl_portid.cc | 10 ---
 2 files changed, 43 insertions(+), 26 deletions(-)

diff --git a/src/mds/mds_tipc_fctrl_intf.cc b/src/mds/mds_tipc_fctrl_intf.cc
index 8018064..e7f53ed 100644
--- a/src/mds/mds_tipc_fctrl_intf.cc
+++ b/src/mds/mds_tipc_fctrl_intf.cc
@@ -93,7 +93,7 @@ void tmr_exp_cbk(void* uarg) {
 // send to fctrl thread
 if (m_NCS_IPC_SEND(&mbx_events, new Event(timer->type_),
 NCS_IPC_PRIORITY_HIGH) != NCSCC_RC_SUCCESS) {
-  m_MDS_LOG_ERR("FCTRL: Failed to send msg to mbx_events\n");
+  m_MDS_LOG_ERR("FCTRL: Error[Failed to send msg to mbx_events]");
 }
   }
 }
@@ -130,7 +130,9 @@ uint32_t process_flow_event(const Event& evt) {
   rc = portid->ReceiveData(evt.mseq_, evt.mfrag_,
 evt.fseq_, evt.svc_id_);
 } else {
-  m_MDS_LOG_ERR("PortId not found for evt:%d", (int)evt.type_);
+  m_MDS_LOG_ERR("FCTRL: [me] <-- [node:%x, ref:%u], "
+  "RcvEvt[evt:%d], Error[PortId not found]",
+  evt.id_.node, evt.id_.ref, (int)evt.type_);
 }
   } else {
 if (evt.type_ == Event::Type::kEvtRcvData) {
@@ -169,7 +171,7 @@ uint32_t process_all_events(void) {
 
 if (pollres == -1) {
   if (errno == EINTR) continue;
-  m_MDS_LOG_ERR("FCTRL: poll() failed:%s", strerror(errno));
+  m_MDS_LOG_ERR("FCTRL: poll() failed, Error[%s]", strerror(errno));
   break;
 }
 
@@ -212,18 +214,21 @@ uint32_t create_ncs_task(void *task_hdl) {
   int prio_val = ((max_prio - min_prio) * 0.87);
 
   if (m_NCS_IPC_CREATE(&mbx_events) != NCSCC_RC_SUCCESS) {
-m_MDS_LOG_ERR("m_NCS_IPC_CREATE failed");
+m_MDS_LOG_ERR("FCTRL: m_NCS_IPC_CREATE failed, Error[%s]",
+strerror(errno));
 return NCSCC_RC_FAILURE;
   }
   if (m_NCS_IPC_ATTACH(&mbx_events) != NCSCC_RC_SUCCESS) {
-m_MDS_LOG_ERR("m_NCS_IPC_ATTACH failed");
+m_MDS_LOG_ERR("FCTRL: m_NCS_IPC_ATTACH failed, Error[%s]",
+strerror(errno));
 m_NCS_IPC_RELEASE(&mbx_events, nullptr);
 return NCSCC_RC_FAILURE;
   }
   if (ncs_task_create((NCS_OS_CB)process_all_events, 0,
   "OSAF_MDS", prio_val, policy, NCS_MDTM_STACKSIZE,
   &task_hdl) != NCSCC_RC_SUCCESS) {
-m_MDS_LOG_ERR("FCTRL: ncs_task_create() failed\n");
+m_MDS_LOG_ERR("FCTRL: ncs_task_create(), Error[%s]",
+strerror(errno));
 m_NCS_IPC_RELEASE(&mbx_events, nullptr);
 return NCSCC_RC_FAILURE;
   }
@@ -247,7 +252,8 @@ uint32_t mds_tipc_fctrl_initialize(int dgramsock, struct tipc_portid id,
 
   if (create_ncs_task(&p_task_hdl) !=
   NCSCC_RC_SUCCESS) {
-m_MDS_LOG_ERR("FCTRL: create_ncs_task() failed\n");
+m_MDS_LOG_ERR("FCTRL: create_ncs_task() failed, Error[%s]",
+strerror(errno));
 return NCSCC_RC_FAILURE;
   }
   is_fctrl_enabled = true;
@@ -263,7 +269,8 @@ uint32_t mds_tipc_fctrl_shutdown(void) {
   portid_map_mutex.lock();
 
   if (ncs_task_release(p_task_hdl) != NCSCC_RC_SUCCESS) {
-m_MDS_LOG_ERR("FCTRL: Stop of the Created Task-failed:\n");
+m_MDS_LOG_ERR("FCTRL: Stop of the Created Task-failed, Error[%s]",
+strerror(errno));
   }
 
   m_NCS_IPC_DETACH(&mbx_events, nullptr, nullptr);
@@ -291,7 +298,8 @@ uint32_t mds_tipc_fctrl_sndqueue_capable(struct tipc_portid id,
 
   TipcPortId *portid = portid_lookup(id);
   if (portid == nullptr) {
-m_MDS_LOG_ERR("FCTRL: PortId not found [node:%x, ref:%u] line:%u",
+m_MDS_LOG_ERR("FCTRL: [me] --> [node:%x, ref:%u], "
+"[line:%u], Error[PortId not found]",
 id.node, id.ref, __LINE__);
 rc = NCSCC_RC_FAILURE;
   } else {
@@ -316,7 +324,8 @@ uint32_t mds_tipc_fctrl_trysend(const uint8_t *buffer, uint16_t len,
 
   TipcPortId *portid = portid_lookup(id);
   if (portid == nullptr) {
-m_MDS_LOG_ERR("FCTRL: PortId not found [node:%x, ref:%u] line:%u",
+m_MDS_LOG_ERR("FCTRL: [me] --> [node:%x, ref:%u], "
+"[line:%u], Error[PortId not found]",
 id.node, id.ref, __LINE__);
 rc = NCSCC_RC_FAILURE;
   } else {
@@ -420,7 +429,8 @@ uint32_t mds_tipc_fctrl_drop_data(uint8_t *buffer, uint16_t len,
 new Event(Event::Type::kEvtSendChunkAck, id, ack.svc_id_,
 header.mseq_, header.mfrag_, ack.acked_fseq_, ack.chunk_size_),
 NCS_IPC_PRIORITY_HIGH) != NCSCC_RC_SUCCESS) {
-  m_MDS_LOG_ERR("FCTRL: Failed to send msg to mbx_events\n");
+  m_MDS_LOG_ERR("FCTRL: Failed to send msg to mbx_events, Error[%s]",
+  strerror(errno));
   return NCSCC_RC_FAILURE;
 }
 return NCSCC_RC_SUCCESS;
@@ -434,7 +444,8 @@ uint32_t mds_tipc_fctrl_drop_data(uint8_t *buffer, uint16_t len,
   new Event(Event::Type::kEvtDropData, id, data.svc_id_,
   header.mseq_, header.mfrag_, header.fseq_),
   NCS_IPC_PRIORITY_HIGH) !=

[devel] [PATCH 1/2] mds: Add Nack message for MDS_TIPC_FCTRL_ENABLED [#3095]

2019-10-03 Thread Minh Chau

During recovery from split-brain, both active director
services may suffer MDS message loss because the TIPC link
lost contact. If MDS_TIPC_FCTRL_ENABLED is set, out-of-order
messages are dropped, and the receiver currently has no
mechanism to trigger retransmission (retransmission is only
triggered from the sender as a result of TIPC_ERR_OVERLOAD).

On reception of an out-of-order message, the receiver can
send a negative acknowledgement (Nack) to ask the sender to
retransmit. The sender can then trigger retransmission in
the same way as when it receives TIPC_ERR_OVERLOAD.

This patch adds the Nack message for retransmission of
out-of-order messages detected at the receiver side, and
adds a missing call to portid_map_mutex.unlock() in
process_all_events().
---
 src/mds/mds_c_api.c  |  2 +-
 src/mds/mds_dt_common.c  |  2 +-
 src/mds/mds_tipc_fctrl_intf.cc   | 25 ++---
 src/mds/mds_tipc_fctrl_msg.cc| 35 ++-
 src/mds/mds_tipc_fctrl_msg.h | 22 ++
 src/mds/mds_tipc_fctrl_portid.cc | 32 
 src/mds/mds_tipc_fctrl_portid.h  |  3 ++-
 7 files changed, 106 insertions(+), 15 deletions(-)

diff --git a/src/mds/mds_c_api.c b/src/mds/mds_c_api.c
index c41c8dd..132555b 100644
--- a/src/mds/mds_c_api.c
+++ b/src/mds/mds_c_api.c
@@ -4196,7 +4196,7 @@ void mds_mcm_msg_loss(MDS_SVC_HDL local_svc_hdl, MDS_DEST rem_adest,
 
/* Check whether the msg loss is enabled or not */
if (true != local_svc_info->i_msg_loss_indication) {
-   m_MDS_LOG_INFO(" MSG loss not enbaled mds_mcm_msg_loss\n");
+   m_MDS_LOG_NOTIFY("MSG loss is not enabled mds_mcm_msg_loss\n");
return;
}
 
diff --git a/src/mds/mds_dt_common.c b/src/mds/mds_dt_common.c
index 66652af..de13883 100644
--- a/src/mds/mds_dt_common.c
+++ b/src/mds/mds_dt_common.c
@@ -972,7 +972,7 @@ uint32_t mds_tmr_mailbox_processing(void)
.vdest_id);
break;
case MDS_REASSEMBLY_TMR:
-   m_MDS_LOG_DBG(
+   m_MDS_LOG_ERR(
"MDTM: Tmr Mailbox Processing:Reassemble Tmr Hdl=0x%08x",
mbx_evt_info->info.tmr_info_hdl);
mdtm_process_reassem_timer_event(
diff --git a/src/mds/mds_tipc_fctrl_intf.cc b/src/mds/mds_tipc_fctrl_intf.cc
index 2366672..8018064 100644
--- a/src/mds/mds_tipc_fctrl_intf.cc
+++ b/src/mds/mds_tipc_fctrl_intf.cc
@@ -38,6 +38,7 @@ using mds::Timer;
 using mds::DataMessage;
 using mds::ChunkAck;
 using mds::HeaderMessage;
+using mds::Nack;
 
 namespace {
 // flow control enabled/disabled
@@ -142,7 +143,8 @@ uint32_t process_flow_event(const Event& evt) {
 if (evt.type_ == Event::Type::kEvtSendChunkAck) {
   portid->SendChunkAck(evt.fseq_, evt.svc_id_, evt.chunk_size_);
 }
-if (evt.type_ == Event::Type::kEvtDropData) {
+if (evt.type_ == Event::Type::kEvtDropData ||
+evt.type_ == Event::Type::kEvtRcvNack) {
   portid->ReceiveNack(evt.mseq_, evt.mfrag_,
   evt.fseq_);
 }
@@ -178,8 +180,10 @@ uint32_t process_all_events(void) {
 Event *evt = reinterpret_cast<Event*>(ncs_ipc_non_blk_recv(
 &mbx_events));
 
-if (evt == nullptr) continue;
-
+if (evt == nullptr) {
+  portid_map_mutex.unlock();
+  continue;
+}
 if (evt->IsTimerEvent()) {
   process_timer_event(*evt);
 }
@@ -464,6 +468,21 @@ uint32_t mds_tipc_fctrl_rcv_data(uint8_t *buffer, uint16_t len,
 // skip this data msg
 return NCSCC_RC_FAILURE;
   }
+  if (header.msg_type_ == Nack::kNackMsgType) {
+// receive nack message
+Nack nack;
+nack.Decode(buffer);
+// send to the event thread
+if (m_NCS_IPC_SEND(&mbx_events,
+new Event(Event::Type::kEvtRcvNack, id, nack.svc_id_,
+header.mseq_, header.mfrag_, nack.nacked_fseq_),
+NCS_IPC_PRIORITY_HIGH) != NCSCC_RC_SUCCESS) {
+  m_MDS_LOG_ERR("FCTRL: Failed to send msg to mbx_events\n");
+}
+// return NCSCC_RC_FAILURE, so the tipc receiving thread (legacy) will
+// skip this data msg
+return NCSCC_RC_FAILURE;
+  }
 } else {
   // receive data message
   DataMessage data;
diff --git a/src/mds/mds_tipc_fctrl_msg.cc b/src/mds/mds_tipc_fctrl_msg.cc
index 064d977..0246b65 100644
--- a/src/mds/mds_tipc_fctrl_msg.cc
+++ b/src/mds/mds_tipc_fctrl_msg.cc
@@ -96,7 +96,7 @@ void DataMessage::Decode(uint8_t *msg) {
 
 DataMessage::~DataMessage() {
   if (msg_data_ != nullptr) {
-delete msg_data_;
+delete[] msg_data_;
 msg_data_ = nullptr;
   }
 }
@@ -139,4 +139,37 @@ void ChunkAck::Decode(uint8_t *msg) {
   chunk_size_ = ncs_decode_16bit();
 }
 
+

[devel] [PATCH 0/1] Review Request for mds: Add Nack message for MDS_TIPC_FCTRL_ENABLED [#3095]

2019-09-30 Thread Minh Chau

Summary: mds: Add Nack message for MDS_TIPC_FCTRL_ENABLED [#3095]
Review request for Ticket(s): 3095
Peer Reviewer(s): Hans, Vu, Gary, Thuan
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3095
Base revision: 46e9e0f310a6c21dbc89a9ffd8bee26829342c0c
Personal repository: git://git.code.sf.net/u/minh-chau/review


Impacted area         Impact y/n
--------------------------------
 Docs                     n
 Build system             n
 RPM/packaging            n
 Configuration files      n
 Startup scripts          n
 SAF services             n
 OpenSAF services         n
 Core libraries           y
 Samples                  n
 Tests                    n
 Other                    n


Comments (indicate scope for each "y" above):
-
*** EXPLAIN/COMMENT THE PATCH SERIES HERE ***

revision 22d7ea1769bac1b65631a09fddec63e6f5a146b5
Author: Minh Chau 
Date:   Tue, 1 Oct 2019 15:23:01 +1000

mds: Add Nack message for MDS_TIPC_FCTRL_ENABLED [#3095]

During recovery from split-brain, both active director
services may suffer MDS message loss because the TIPC link
lost contact. If MDS_TIPC_FCTRL_ENABLED is set, out-of-order
messages are dropped, and the receiver currently has no
mechanism to trigger retransmission (retransmission is only
triggered from the sender as a result of TIPC_ERR_OVERLOAD).

On reception of an out-of-order message, the receiver can
send a negative acknowledgement (Nack) to ask the sender to
retransmit. The sender can then trigger retransmission in
the same way as when it receives TIPC_ERR_OVERLOAD.

This patch adds the Nack message for retransmission of
out-of-order messages detected at the receiver side.
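
For readers of this thread, the receiver-side decision described in the
commit message can be sketched roughly as follows. This is only an
illustration under assumptions: RcvWindowSketch, Action and OnData are
hypothetical names, not the TipcPortId API touched by the patch, and the
wrap-around handling is one possible choice.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names for illustration; this is not the TipcPortId class.
enum class Action { kDeliver, kSendNack, kDropDuplicate };

struct RcvWindowSketch {
  uint16_t expected_fseq = 1;  // next flow control sequence number expected

  // Decide what to do with an incoming data message carrying fseq.
  Action OnData(uint16_t fseq) {
    if (fseq == expected_fseq) {
      ++expected_fseq;
      return Action::kDeliver;
    }
    // Modulo-2^16 distance, so the comparison survives counter wrap-around.
    const uint16_t ahead = static_cast<uint16_t>(fseq - expected_fseq);
    if (ahead < 0x8000) {
      // A gap was detected: the message is dropped and the sender is asked
      // to retransmit starting from expected_fseq.
      return Action::kSendNack;
    }
    // The message is older than what was already delivered.
    return Action::kDropDuplicate;
  }
};
```

With a fresh window, OnData(1) delivers, OnData(3) asks for a Nack and
leaves expected_fseq at 2, and a later OnData(2) delivers again.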



Complete diffstat:
--
 src/mds/mds_c_api.c  |  2 +-
 src/mds/mds_dt_common.c  |  2 +-
 src/mds/mds_tipc_fctrl_intf.cc   | 19 ++-
 src/mds/mds_tipc_fctrl_msg.cc| 33 +
 src/mds/mds_tipc_fctrl_msg.h | 22 ++
 src/mds/mds_tipc_fctrl_portid.cc | 18 +-
 src/mds/mds_tipc_fctrl_portid.h  |  1 +
 7 files changed, 93 insertions(+), 4 deletions(-)


Testing Commands:
-
*** LIST THE COMMAND LINE TOOLS/STEPS TO TEST YOUR CHANGES ***


Testing, Expected Results:
--
*** PASTE COMMAND OUTPUTS / TEST RESULTS ***


Conditions of Submission:
-
*** HOW MANY DAYS BEFORE PUSHING, CONSENSUS ETC ***


Arch        Built   Started   Linux distro
-------------------------------------------
mips          n        n
mips64        n        n
x86           n        n
x86_64        y        y
powerpc       n        n
powerpc64     n        n


Reviewer Checklist:
---
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
(i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
too much content into a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
comments and change requests that were proposed in the initial review.

___ You have a misconfigured ~/.gitconfig file (i.e. user.name, user.email etc)

___ Your computer has a badly configured date and time, confusing
the threaded patch review.

___ Your changes affect IPC

[devel] [PATCH 1/1] mds: Add Nack message for MDS_TIPC_FCTRL_ENABLED [#3095]

2019-09-30 Thread Minh Chau

During recovery from split-brain, both active director
services may suffer MDS message loss because the TIPC link
lost contact. If MDS_TIPC_FCTRL_ENABLED is set, out-of-order
messages are dropped, and the receiver currently has no
mechanism to trigger retransmission (retransmission is only
triggered from the sender as a result of TIPC_ERR_OVERLOAD).

On reception of an out-of-order message, the receiver can
send a negative acknowledgement (Nack) to ask the sender to
retransmit. The sender can then trigger retransmission in
the same way as when it receives TIPC_ERR_OVERLOAD.

This patch adds the Nack message for retransmission of
out-of-order messages detected at the receiver side.
---
 src/mds/mds_c_api.c  |  2 +-
 src/mds/mds_dt_common.c  |  2 +-
 src/mds/mds_tipc_fctrl_intf.cc   | 19 ++-
 src/mds/mds_tipc_fctrl_msg.cc| 33 +
 src/mds/mds_tipc_fctrl_msg.h | 22 ++
 src/mds/mds_tipc_fctrl_portid.cc | 18 +-
 src/mds/mds_tipc_fctrl_portid.h  |  1 +
 7 files changed, 93 insertions(+), 4 deletions(-)

diff --git a/src/mds/mds_c_api.c b/src/mds/mds_c_api.c
index c41c8dd..132555b 100644
--- a/src/mds/mds_c_api.c
+++ b/src/mds/mds_c_api.c
@@ -4196,7 +4196,7 @@ void mds_mcm_msg_loss(MDS_SVC_HDL local_svc_hdl, MDS_DEST rem_adest,
 
/* Check whether the msg loss is enabled or not */
if (true != local_svc_info->i_msg_loss_indication) {
-   m_MDS_LOG_INFO(" MSG loss not enbaled mds_mcm_msg_loss\n");
+   m_MDS_LOG_NOTIFY("MSG loss is not enabled mds_mcm_msg_loss\n");
return;
}
 
diff --git a/src/mds/mds_dt_common.c b/src/mds/mds_dt_common.c
index 66652af..de13883 100644
--- a/src/mds/mds_dt_common.c
+++ b/src/mds/mds_dt_common.c
@@ -972,7 +972,7 @@ uint32_t mds_tmr_mailbox_processing(void)
.vdest_id);
break;
case MDS_REASSEMBLY_TMR:
-   m_MDS_LOG_DBG(
+   m_MDS_LOG_ERR(
"MDTM: Tmr Mailbox Processing:Reassemble Tmr Hdl=0x%08x",
mbx_evt_info->info.tmr_info_hdl);
mdtm_process_reassem_timer_event(
diff --git a/src/mds/mds_tipc_fctrl_intf.cc b/src/mds/mds_tipc_fctrl_intf.cc
index 2366672..65f1849 100644
--- a/src/mds/mds_tipc_fctrl_intf.cc
+++ b/src/mds/mds_tipc_fctrl_intf.cc
@@ -38,6 +38,7 @@ using mds::Timer;
 using mds::DataMessage;
 using mds::ChunkAck;
 using mds::HeaderMessage;
+using mds::Nack;
 
 namespace {
 // flow control enabled/disabled
@@ -142,7 +143,8 @@ uint32_t process_flow_event(const Event& evt) {
 if (evt.type_ == Event::Type::kEvtSendChunkAck) {
   portid->SendChunkAck(evt.fseq_, evt.svc_id_, evt.chunk_size_);
 }
-if (evt.type_ == Event::Type::kEvtDropData) {
+if (evt.type_ == Event::Type::kEvtDropData ||
+evt.type_ == Event::Type::kEvtRcvNack) {
   portid->ReceiveNack(evt.mseq_, evt.mfrag_,
   evt.fseq_);
 }
@@ -464,6 +466,21 @@ uint32_t mds_tipc_fctrl_rcv_data(uint8_t *buffer, uint16_t len,
 // skip this data msg
 return NCSCC_RC_FAILURE;
   }
+  if (header.msg_type_ == Nack::kNackMsgType) {
+// receive nack message
+Nack nack;
+nack.Decode(buffer);
+// send to the event thread
+if (m_NCS_IPC_SEND(&mbx_events,
+new Event(Event::Type::kEvtRcvNack, id, nack.svc_id_,
+header.mseq_, header.mfrag_, nack.nacked_fseq_),
+NCS_IPC_PRIORITY_HIGH) != NCSCC_RC_SUCCESS) {
+  m_MDS_LOG_ERR("FCTRL: Failed to send msg to mbx_events\n");
+}
+// return NCSCC_RC_FAILURE, so the tipc receiving thread (legacy) will
+// skip this data msg
+return NCSCC_RC_FAILURE;
+  }
 } else {
   // receive data message
   DataMessage data;
diff --git a/src/mds/mds_tipc_fctrl_msg.cc b/src/mds/mds_tipc_fctrl_msg.cc
index 064d977..f85568c 100644
--- a/src/mds/mds_tipc_fctrl_msg.cc
+++ b/src/mds/mds_tipc_fctrl_msg.cc
@@ -139,4 +139,37 @@ void ChunkAck::Decode(uint8_t *msg) {
   chunk_size_ = ncs_decode_16bit();
 }
 
+
+Nack::Nack(uint16_t svc_id, uint16_t fseq):
+svc_id_(svc_id), nacked_fseq_(fseq) {
+  msg_type_ = kNackMsgType;
+}
+
+void Nack::Encode(uint8_t *msg) {
+  uint8_t *ptr;
+  // encode protocol identifier
+  ptr = &msg[Nack::FieldIndex::kProtocolIdentifier];
+  ncs_encode_32bit(&ptr, MDS_PROT_FCTRL_ID);
+  // encode message type
+  ptr = &msg[Nack::FieldIndex::kFlowControlMessageType];
+  ncs_encode_8bit(&ptr, kNackMsgType);
+  // encode service id
+  ptr = &msg[Nack::FieldIndex::kServiceId];
+  ncs_encode_16bit(&ptr, svc_id_);
+  // encode flow control sequence number
+  ptr = &msg[Nack::FieldIndex::kFlowControlSequenceNumber];
+  ncs_encode_16bit(&ptr, nacked_fseq_);
+}
+
+void Nack::Decode(uint8_t *msg) {
+
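
The Nack encoding in the patch above, and the Decode() that the archive
cuts off, can be illustrated with a small standalone round trip. This is a
sketch under assumptions, not the patch's code: offsets 11, 15 and 16 are
taken from the ACK layout in the src/mds/README patch, while the offset of
the sequence number (18), the buffer length, the two constant values and the
nack_sketch namespace are hypothetical. Fields are written most significant
byte first, as the ncs_encode_* helpers do.

```cpp
#include <cassert>
#include <cstdint>

namespace nack_sketch {
constexpr int kProtocolIdentifier = 11;
constexpr int kMessageType = 15;
constexpr int kServiceId = 16;
constexpr int kSequenceNumber = 18;                  // assumed offset
constexpr int kLength = 20;                          // assumed total length
constexpr uint32_t kExampleProtocolId = 0x12345678;  // placeholder value
constexpr uint8_t kExampleNackType = 2;              // placeholder value

inline void Put16(uint8_t *p, uint16_t v) {
  p[0] = static_cast<uint8_t>(v >> 8);
  p[1] = static_cast<uint8_t>(v & 0xFF);
}
inline uint16_t Get16(const uint8_t *p) {
  return static_cast<uint16_t>((p[0] << 8) | p[1]);
}
inline void Put32(uint8_t *p, uint32_t v) {
  Put16(p, static_cast<uint16_t>(v >> 16));
  Put16(p + 2, static_cast<uint16_t>(v & 0xFFFF));
}
inline uint32_t Get32(const uint8_t *p) {
  return (uint32_t{Get16(p)} << 16) | uint32_t{Get16(p + 2)};
}

struct Nack {
  uint16_t svc_id = 0;
  uint16_t nacked_fseq = 0;

  // msg must point to at least kLength writable bytes.
  void Encode(uint8_t *msg) const {
    Put32(msg + kProtocolIdentifier, kExampleProtocolId);
    msg[kMessageType] = kExampleNackType;
    Put16(msg + kServiceId, svc_id);
    Put16(msg + kSequenceNumber, nacked_fseq);
  }

  // Returns false if the buffer does not carry the expected identifiers.
  bool Decode(const uint8_t *msg) {
    if (Get32(msg + kProtocolIdentifier) != kExampleProtocolId) return false;
    if (msg[kMessageType] != kExampleNackType) return false;
    svc_id = Get16(msg + kServiceId);
    nacked_fseq = Get16(msg + kSequenceNumber);
    return true;
  }
};
}  // namespace nack_sketch
```

Checking the identifiers in Decode() is a choice of this sketch; the patch
itself tests header.msg_type_ before decoding.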

[devel] [PATCH 6/9] mds: Implement kRcvBuffOverflow state [#1960]

2019-08-14 Thread Minh Chau

This patch implements the kRcvBuffOverflow state machine as
described in the README file.
---
 src/mds/mds_tipc_fctrl_intf.cc   |   6 +-
 src/mds/mds_tipc_fctrl_msg.h |   1 +
 src/mds/mds_tipc_fctrl_portid.cc | 137 ++-
 src/mds/mds_tipc_fctrl_portid.h  |   5 +-
 4 files changed, 131 insertions(+), 18 deletions(-)

diff --git a/src/mds/mds_tipc_fctrl_intf.cc b/src/mds/mds_tipc_fctrl_intf.cc
index c2d0922..397114e 100644
--- a/src/mds/mds_tipc_fctrl_intf.cc
+++ b/src/mds/mds_tipc_fctrl_intf.cc
@@ -285,14 +285,16 @@ uint32_t mds_tipc_fctrl_trysend(const uint8_t *buffer, uint16_t len,
 rc = NCSCC_RC_FAILURE;
   } else {
 if (portid->state_ != TipcPortId::State::kDisabled) {
-  portid->Queue(buffer, len);
+  bool sendable = portid->ReceiveCapable(len);
+  portid->Queue(buffer, len, sendable);
   // start txprob timer for the first msg sent out
   // do not start for other states
-  if (portid->state_ == TipcPortId::State::kStartup) {
+  if (sendable && portid->state_ == TipcPortId::State::kStartup) {
 txprob_timer.Start(kBaseTimerInt, tmr_exp_cbk);
 m_MDS_LOG_DBG("FCTRL: Start txprob");
 portid->state_ = TipcPortId::State::kTxProb;
   }
+  if (sendable == false) rc = NCSCC_RC_FAILURE;
 }
   }
 
diff --git a/src/mds/mds_tipc_fctrl_msg.h b/src/mds/mds_tipc_fctrl_msg.h
index 69f8048..e6b9662 100644
--- a/src/mds/mds_tipc_fctrl_msg.h
+++ b/src/mds/mds_tipc_fctrl_msg.h
@@ -110,6 +110,7 @@ class DataMessage: public BaseMessage {
   uint8_t* msg_data_{nullptr};
   uint8_t snd_type_{0};
 
+  bool is_sent_{true};
   DataMessage() {}
   virtual ~DataMessage();
   void Decode(uint8_t *msg) override;
diff --git a/src/mds/mds_tipc_fctrl_portid.cc b/src/mds/mds_tipc_fctrl_portid.cc
index 84ecee9..e762290 100644
--- a/src/mds/mds_tipc_fctrl_portid.cc
+++ b/src/mds/mds_tipc_fctrl_portid.cc
@@ -82,6 +82,23 @@ uint64_t MessageQueue::Erase(uint16_t fseq_from, uint16_t fseq_to) {
   return msg_len;
 }
 
+DataMessage* MessageQueue::FirstUnsent() {
+  for (auto it = queue_.begin(); it != queue_.end(); ++it) {
+DataMessage *m = *it;
+if (m->is_sent_ == false) {
+  return m;
+}
+  }
+  return nullptr;
+}
+
+void MessageQueue::MarkUnsentFrom(uint16_t fseq) {
+  for (auto it = queue_.begin(); it != queue_.end(); ++it) {
+DataMessage *m = *it;
+if (m->header_.fseq_ >= fseq) m->is_sent_ = false;
+  }
+}
+
 void MessageQueue::Clear() {
   while (queue_.empty() == false) {
 DataMessage* msg = queue_.front();
@@ -99,7 +116,8 @@ TipcPortId::TipcPortId(struct tipc_portid id, int sock, uint16_t chksize,
 TipcPortId::~TipcPortId() {
   // Fake a TmrChunkAck event to ack all received messages
   ReceiveTmrChunkAck();
-  // clear all msg in sndqueue_
+  // flush all unsent msg in sndqueue_
+  FlushData();
   sndqueue_.Clear();
 }
 
@@ -109,6 +127,24 @@ uint64_t TipcPortId::GetUniqueId(struct tipc_portid id) {
   return uid;
 }
 
+void TipcPortId::FlushData() {
+  DataMessage* msg = nullptr;
+  do {
+// find the lowest sequence unsent yet
+msg = sndqueue_.FirstUnsent();
+if (msg != nullptr) {
+  Send(msg->msg_data_, msg->header_.msg_len_);
+  msg->is_sent_ = true;
+  m_MDS_LOG_DBG("FCTRL: [me] --> [node:%x, ref:%u], "
+  "FlushData[mseq:%u, mfrag:%u, fseq:%u], "
+  "sndwnd[acked:%u, send:%u, nacked:%" PRIu64 "]",
+  id_.node, id_.ref,
+  msg->header_.mseq_, msg->header_.mfrag_, msg->header_.fseq_,
+  sndwnd_.acked_, sndwnd_.send_, sndwnd_.nacked_space_);
+}
+  } while (msg != nullptr);
+}
+
 uint32_t TipcPortId::Send(uint8_t* data, uint16_t length) {
   struct sockaddr_tipc server_addr;
   ssize_t send_len = 0;
@@ -130,29 +166,49 @@ uint32_t TipcPortId::Send(uint8_t* data, uint16_t length) {
   return rc;
 }
 
-uint32_t TipcPortId::Queue(const uint8_t* data, uint16_t length) {
+uint32_t TipcPortId::Queue(const uint8_t* data, uint16_t length,
+bool is_sent) {
   uint32_t rc = NCSCC_RC_SUCCESS;
 
   DataMessage *msg = new DataMessage;
   msg->header_.Decode(const_cast<uint8_t*>(data));
   msg->Decode(const_cast<uint8_t*>(data));
   msg->msg_data_ = new uint8_t[length];
+  msg->is_sent_ = is_sent;
   memcpy(msg->msg_data_, data, length);
   sndqueue_.Queue(msg);
-  ++sndwnd_.send_;
-  sndwnd_.nacked_space_ += length;
-  m_MDS_LOG_DBG("FCTRL: [me] --> [node:%x, ref:%u], "
-  "SndData[mseq:%u, mfrag:%u, fseq:%u, len:%u], "
-  "sndwnd[acked:%u, send:%u, nacked:%" PRIu64 "]",
-  id_.node, id_.ref,
-  msg->header_.mseq_, msg->header_.mfrag_, msg->header_.fseq_, length,
-  sndwnd_.acked_, sndwnd_.send_, sndwnd_.nacked_space_);
-
+  if (is_sent) {
+++sndwnd_.send_;
+sndwnd_.nacked_space_ += length;
+m_MDS_LOG_DBG("FCTRL: [me] --> [node:%x, ref:%u], "
+"SndData[mseq:%u, mfrag:%u, fseq:%u, len:%u], "
+"sndwnd[acked:%u, send:%u, nacked:%" PRIu64 "]",
+id_.node, id_.ref,
+msg->header_.mseq_,
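
The idea behind is_sent_, FirstUnsent() and FlushData() in the patch above
(park a message when the receiver is assumed to be full, send it later in
sequence order) can be sketched standalone. The class below is hypothetical:
the window check in ReceiveCapable(), the rule that nothing overtakes a
parked message, and the omission of sequence wrap-around are assumptions of
this sketch, not something the patch shows.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <vector>

struct Msg {
  uint16_t fseq;
  uint16_t len;
  bool is_sent;
};

class SendQueueSketch {
 public:
  explicit SendQueueSketch(uint64_t window) : window_(window) {}

  // Assumed rule: the receiver can take len more unacknowledged bytes.
  bool ReceiveCapable(uint16_t len) const {
    return nacked_space_ + len <= window_;
  }

  // Returns true if the message is sent right away, false if it is parked.
  bool Queue(uint16_t fseq, uint16_t len) {
    const bool sendable = !HasUnsent() && ReceiveCapable(len);
    queue_.push_back({fseq, len, sendable});
    if (sendable) nacked_space_ += len;
    return sendable;
  }

  // Acknowledge sent messages up to and including fseq, then send parked
  // messages that fit, lowest sequence first. Returns the fseq values sent.
  std::vector<uint16_t> Ack(uint16_t fseq) {
    while (!queue_.empty() && queue_.front().is_sent &&
           queue_.front().fseq <= fseq) {
      nacked_space_ -= queue_.front().len;
      queue_.pop_front();
    }
    std::vector<uint16_t> sent;
    for (Msg &m : queue_) {
      if (m.is_sent) continue;
      if (!ReceiveCapable(m.len)) break;  // keep sequence order
      m.is_sent = true;
      nacked_space_ += m.len;
      sent.push_back(m.fseq);
    }
    return sent;
  }

 private:
  bool HasUnsent() const {
    for (const Msg &m : queue_) {
      if (!m.is_sent) return true;
    }
    return false;
  }

  std::deque<Msg> queue_;
  uint64_t window_;
  uint64_t nacked_space_ = 0;
};
```

With a 100-byte window, queueing 60, 60 and 30 bytes sends only the first
message; acknowledging it releases the other two in order.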

[devel] [PATCH 2/9] mds: Resolve c/c++ linking issue [#1960]

2019-08-14 Thread Minh Chau

(Sending on behalf of Thuan)
This patch resolves the linking issue that occurs when mds_dt.h
or mds_core.h is included in C++ sources.
---
 src/mds/mds_core.h| 74 +++
 src/mds/mds_dt.h  |  4 +--
 src/mds/mds_dt2c.h| 67 --
 src/mds/mds_dt_tcp.c  |  2 ++
 src/mds/mds_dt_tcp.h  |  1 -
 src/mds/mds_dt_tipc.c |  2 ++
 6 files changed, 80 insertions(+), 70 deletions(-)

diff --git a/src/mds/mds_core.h b/src/mds/mds_core.h
index 37696d4..c09b428 100644
--- a/src/mds/mds_core.h
+++ b/src/mds/mds_core.h
@@ -573,6 +573,80 @@ extern uint32_t mds_mcm_free_msg_uba_start(MDS_ENCODED_MSG msg);
 extern void get_adest_details(MDS_DEST adest, char *adest_details);
 extern void get_subtn_adest_details(MDS_PWE_HDL pwe_hdl, MDS_SVC_ID svc_id,
 MDS_DEST adest, char *adest_details);
+#ifdef __cplusplus
+extern "C" {
+#endif
+/*  */
+/*  */
+/*MCM to MDTM   */
+/*  */
+/*  */
+
+/* Initialization of MDTM Module */
+uint32_t (*mds_mdtm_init)(NODE_ID node_id, uint32_t *mds_tipc_ref);
+
+/* Destroying the MDTM Module*/
+uint32_t (*mds_mdtm_destroy)(void);
+
+uint32_t (*mds_mdtm_send)(MDTM_SEND_REQ *req);
+
+/* SVC Install */
+uint32_t (*mds_mdtm_svc_install)(PW_ENV_ID pwe_id, MDS_SVC_ID svc_id,
+ NCSMDS_SCOPE_TYPE install_scope,
+ V_DEST_RL role, MDS_VDEST_ID vdest_id,
+ NCS_VDEST_TYPE vdest_policy,
+ MDS_SVC_PVT_SUB_PART_VER mds_svc_pvt_ver);
+
+/* SVC Uninstall */
+uint32_t (*mds_mdtm_svc_uninstall)(PW_ENV_ID pwe_id, MDS_SVC_ID svc_id,
+   NCSMDS_SCOPE_TYPE install_scope,
+   V_DEST_RL role, MDS_VDEST_ID vdest_id,
+   NCS_VDEST_TYPE vdest_policy,
+   MDS_SVC_PVT_SUB_PART_VER mds_svc_pvt_ver);
+
+/* SVC Subscribe */
+uint32_t (*mds_mdtm_svc_subscribe)(PW_ENV_ID pwe_id, MDS_SVC_ID svc_id,
+   NCSMDS_SCOPE_TYPE subscribe_scope,
+   MDS_SVC_HDL local_svc_hdl,
+   MDS_SUBTN_REF_VAL *subtn_ref_val);
+
+/*  added svc_hdl */
+/* SVC Unsubscribe */
+uint32_t (*mds_mdtm_svc_unsubscribe)(PW_ENV_ID pwe_id, MDS_SVC_ID svc_id,
+ NCSMDS_SCOPE_TYPE subscribe_scope,
+ MDS_SUBTN_REF_VAL subtn_ref_val);
+
+/* VDEST Install */
+uint32_t (*mds_mdtm_vdest_install)(MDS_VDEST_ID vdest_id);
+
+/* VDEST Uninstall */
+uint32_t (*mds_mdtm_vdest_uninstall)(MDS_VDEST_ID vdest_id);
+
+/* VDEST Subscribe */
+uint32_t (*mds_mdtm_vdest_subscribe)(MDS_VDEST_ID vdest_id,
+ MDS_SUBTN_REF_VAL *subtn_ref_val);
+
+/* VDEST Unsubscribe */
+uint32_t (*mds_mdtm_vdest_unsubscribe)(MDS_VDEST_ID vdest_id,
+   MDS_SUBTN_REF_VAL subtn_ref_val);
+
+/* Tx Register (For incrementing the use count) */
+uint32_t (*mds_mdtm_tx_hdl_register)(MDS_DEST adest);
+
+/* Tx Unregister (For decrementing the use count) */
+uint32_t (*mds_mdtm_tx_hdl_unregister)(MDS_DEST adest);
+
+/* Node subscription */
+uint32_t (*mds_mdtm_node_subscribe)(MDS_SVC_HDL svc_hdl,
+MDS_SUBTN_REF_VAL *subtn_ref_val);
+
+/* Node unsubscription */
+uint32_t (*mds_mdtm_node_unsubscribe)(MDS_SUBTN_REF_VAL subtn_ref_val);
+
+#ifdef __cplusplus
+}
+#endif
+
 /*  */
 /*  */
 /* MMGR Macros  */
diff --git a/src/mds/mds_dt.h b/src/mds/mds_dt.h
index a6e2801..b645bb4 100644
--- a/src/mds/mds_dt.h
+++ b/src/mds/mds_dt.h
@@ -214,10 +214,10 @@ typedef struct mdtm_ref_hdl_list {
   MDS_SVC_HDL svc_hdl;
 } MDTM_REF_HDL_LIST;
 
-MDTM_REF_HDL_LIST *mdtm_ref_hdl_list_hdr;
+extern MDTM_REF_HDL_LIST *mdtm_ref_hdl_list_hdr;
+extern NCS_PATRICIA_TREE mdtm_reassembly_list;
 uint32_t mdtm_attach_mbx(SYSF_MBX mbx);
 void mds_buff_dump(uint8_t *buff, uint32_t len, uint32_t max);
-NCS_PATRICIA_TREE mdtm_reassembly_list;
 
 uint32_t mdtm_set_transport(MDTM_TX_TYPE transport);
 bool mdtm_mailbox_mbx_cleanup(NCSCONTEXT arg, NCSCONTEXT msg);
diff --git a/src/mds/mds_dt2c.h b/src/mds/mds_dt2c.h
index 012999c..c92fecb 100644
--- a/src/mds/mds_dt2c.h
+++ b/src/mds/mds_dt2c.h
@@ -322,73 +322,6 @@ extern uint32_t gl_mds_checksum;
 
 /*  */
 /*  */
-/*MCM to MDTM   */
-/*  */
-/*
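
As background to the patch above: a header shared between C and C++ sources
usually declares globals with extern and wraps the declarations in an
extern "C" guard, while exactly one source file defines them. The sketch
below shows that general pattern in a single file. All names are
hypothetical and it is not a copy of how mds_core.h or mds_dt.h are laid
out.

```cpp
#include <cassert>
#include <cstdint>

// --- What would live in the shared header (hypothetical names) ---
#ifdef __cplusplus
extern "C" {
#endif
// Declarations only: no storage is allocated here, so the header can be
// included from many C or C++ source files without duplicate symbols.
extern uint32_t (*example_send_hook)(uint32_t len);
extern int example_list_length;
uint32_t example_echo_length(uint32_t len);
#ifdef __cplusplus
}
#endif

// --- What would live in exactly one source file ---
extern "C" {
uint32_t (*example_send_hook)(uint32_t len) = nullptr;
int example_list_length = 0;
uint32_t example_echo_length(uint32_t len) { return len; }
}
```

Without extern on a global in a header, every C++ translation unit that
includes it gets its own definition and the link fails; this is the reason
for the extern additions in mds_dt.h above.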

[devel] [PATCH 5/9] mds: Add state machine for tipc portid instance [#1960]

2019-08-14 Thread Minh Chau

This patch adds a state machine to support the tx probation timer.
---
 src/mds/mds_tipc_fctrl_intf.cc   |  47 +++--
 src/mds/mds_tipc_fctrl_msg.h |   1 +
 src/mds/mds_tipc_fctrl_portid.cc | 109 +++
 src/mds/mds_tipc_fctrl_portid.h  |  22 
 4 files changed, 176 insertions(+), 3 deletions(-)

diff --git a/src/mds/mds_tipc_fctrl_intf.cc b/src/mds/mds_tipc_fctrl_intf.cc
index bd0a8f6..c2d0922 100644
--- a/src/mds/mds_tipc_fctrl_intf.cc
+++ b/src/mds/mds_tipc_fctrl_intf.cc
@@ -34,6 +34,7 @@
 
 using mds::Event;
 using mds::TipcPortId;
+using mds::Timer;
 using mds::DataMessage;
 using mds::ChunkAck;
 using mds::HeaderMessage;
@@ -65,6 +66,11 @@ uint64_t sock_buf_size = 0;
 std::map<uint64_t, TipcPortId*> portid_map;
 std::mutex portid_map_mutex;
 
+// probation timer event to enable flow control at receivers
+const int64_t kBaseTimerInt = 200;  // in centisecond
+const uint8_t kTxProbMaxRetries = 10;
+Timer txprob_timer(Event::Type::kEvtTmrTxProb);
+
 // chunk ack parameters
 // todo: The chunk ack timeout and chunk ack size should be configurable
 int kChunkAckTimeout = 1000;  // in milliseconds
@@ -76,13 +82,37 @@ TipcPortId* portid_lookup(struct tipc_portid id) {
   return portid_map[uid];
 }
 
+void tmr_exp_cbk(void* uarg) {
+  Timer* timer = reinterpret_cast<Timer*>(uarg);
+  if (timer != nullptr) {
+timer->is_active_ = false;
+// send to fctrl thread
+if (m_NCS_IPC_SEND(&mbx_events, new Event(timer->type_),
+NCS_IPC_PRIORITY_HIGH) != NCSCC_RC_SUCCESS) {
+  m_MDS_LOG_ERR("FCTRL: Failed to send msg to mbx_events\n");
+}
+  }
+}
+
 void process_timer_event(const Event evt) {
+  bool txprob_restart = false;
   for (auto i : portid_map) {
 TipcPortId* portid = i.second;
+
+if (evt.type_ == Event::Type::kEvtTmrTxProb) {
+  if (portid->ReceiveTmrTxProb(kTxProbMaxRetries) == true) {
+txprob_restart = true;
+  }
+}
+
 if (evt.type_ == Event::Type::kEvtTmrChunkAck) {
   portid->ReceiveTmrChunkAck();
 }
   }
+  if (txprob_restart) {
+txprob_timer.Start(kBaseTimerInt, tmr_exp_cbk);
+m_MDS_LOG_DBG("FCTRL: Restart txprob");
+  }
 }
 
 uint32_t process_flow_event(const Event evt) {
@@ -231,8 +261,10 @@ uint32_t mds_tipc_fctrl_sndqueue_capable(struct tipc_portid id, uint16_t len,
 id.node, id.ref, __LINE__);
 rc = NCSCC_RC_FAILURE;
   } else {
-// assign the sequence number of the outgoing message
-*next_seq = portid->GetCurrentSeq();
+if (portid->state_ != TipcPortId::State::kDisabled) {
+  // assign the sequence number of the outgoing message
+  *next_seq = portid->GetCurrentSeq();
+}
   }
 
   portid_map_mutex.unlock();
@@ -252,7 +284,16 @@ uint32_t mds_tipc_fctrl_trysend(const uint8_t *buffer, uint16_t len,
 id.node, id.ref, __LINE__);
 rc = NCSCC_RC_FAILURE;
   } else {
-portid->Queue(buffer, len);
+if (portid->state_ != TipcPortId::State::kDisabled) {
+  portid->Queue(buffer, len);
+  // start txprob timer for the first msg sent out
+  // do not start for other states
+  if (portid->state_ == TipcPortId::State::kStartup) {
+txprob_timer.Start(kBaseTimerInt, tmr_exp_cbk);
+m_MDS_LOG_DBG("FCTRL: Start txprob");
+portid->state_ = TipcPortId::State::kTxProb;
+  }
+}
   }
 
   portid_map_mutex.unlock();
diff --git a/src/mds/mds_tipc_fctrl_msg.h b/src/mds/mds_tipc_fctrl_msg.h
index 8e6a874..69f8048 100644
--- a/src/mds/mds_tipc_fctrl_msg.h
+++ b/src/mds/mds_tipc_fctrl_msg.h
@@ -45,6 +45,7 @@ class Event {
 kEvtDropData,  // event reported from tipc that a message is not
// delivered
 kEvtTmrAll,
+kEvtTmrTxProb,// event that tx probation timer expired for once
 kEvtTmrChunkAck,  // event to send the chunk ack
   };
   NCS_IPC_MSG next_{0};
diff --git a/src/mds/mds_tipc_fctrl_portid.cc b/src/mds/mds_tipc_fctrl_portid.cc
index 64115d5..84ecee9 100644
--- a/src/mds/mds_tipc_fctrl_portid.cc
+++ b/src/mds/mds_tipc_fctrl_portid.cc
@@ -23,6 +23,35 @@
 
 namespace mds {
 
+Timer::Timer(Event::Type type) {
+  tmr_id_ = nullptr;
+  type_ = type;
+  is_active_ = false;
+}
+
+Timer::~Timer() {
+}
+
+void Timer::Start(int64_t period, void (*tmr_exp_func)(void*)) {
+  // timer will not start if it's already started
+  // period is in centiseconds
+  if (is_active_ == false) {
+if (tmr_id_ == nullptr) {
+  tmr_id_ = ncs_tmr_alloc(nullptr, 0);
+}
+tmr_id_ = ncs_tmr_start(tmr_id_, period, tmr_exp_func, this,
+nullptr, 0);
+is_active_ = true;
+  }
+}
+
+void Timer::Stop() {
+  if (is_active_ == true) {
+ncs_tmr_stop(tmr_id_);
+is_active_ = false;
+  }
+}
+
 void MessageQueue::Queue(DataMessage* msg) {
   queue_.push_back(msg);
 }
@@ -64,6 +93,7 @@ void MessageQueue::Clear() {
 TipcPortId::TipcPortId(struct tipc_portid id, int sock, uint16_t chksize,
 uint64_t sock_buf_size):
   id_(id), bsrsock_(sock), chunk_size_(chksize),
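
The probation logic that the patch above wires into
mds_tipc_fctrl_trysend() and process_timer_event() can be sketched as a
small state machine. The states kDisabled, kStartup and kTxProb and the
retry limit come from the patch. The kEnabled state, the transition rules
inside OnTimerExpired() and OnAck(), and all method names are assumptions of
this sketch, because the body of ReceiveTmrTxProb() is not shown here.

```cpp
#include <cassert>
#include <cstdint>

enum class State { kDisabled, kStartup, kTxProb, kEnabled };

struct PortSketch {
  State state = State::kStartup;
  uint8_t retries = 0;

  // Called when a message is queued. Returns true when the probation timer
  // should be started, which happens only for the first message.
  bool OnSend() {
    if (state != State::kStartup) return false;
    state = State::kTxProb;
    return true;
  }

  // Called when the probation timer expires. Returns true when the timer
  // should be restarted; after max_retries expirations without an ack the
  // peer is assumed not to support flow control.
  bool OnTimerExpired(uint8_t max_retries) {
    if (state != State::kTxProb) return false;
    if (++retries >= max_retries) {
      state = State::kDisabled;
      return false;
    }
    return true;
  }

  // Called when the peer acknowledges a message during probation.
  void OnAck() {
    if (state == State::kTxProb) state = State::kEnabled;
  }
};
```

With kBaseTimerInt = 200 centiseconds and kTxProbMaxRetries = 10 as in the
patch, this sketch would give up on a silent peer after about 20 seconds.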

[devel] [PATCH 1/9] mds: Add README for solution of TIPC buffer overflow at MDS [#1960]

2019-08-14 Thread Minh Chau

---
 src/mds/README | 221 +
 1 file changed, 221 insertions(+)
 create mode 100644 src/mds/README

diff --git a/src/mds/README b/src/mds/README
new file mode 100644
index 000..1b94632
--- /dev/null
+++ b/src/mds/README
@@ -0,0 +1,221 @@
+/*  -*- OpenSAF  -*-
+ *
+ * (C) Copyright 2019 The OpenSAF Foundation
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. This file and program are licensed
+ * under the GNU Lesser General Public License Version 2.1, February 1999.
+ * The complete license can be accessed from the following location:
+ * http://opensource.org/licenses/lgpl-license.php
+ * See the Copying file included with the OpenSAF distribution for full
+ * licensing terms.
+ *
+ * Author(s): Ericsson AB
+ *
+ */
+Background
+==========
+If OpenSAF is configured to use TIPC as transport, the MDS library uses a
+TIPC SOCK_RDM socket for message distribution in the cluster. A SOCK_RDM
+datagram socket can encounter buffer overflow at the receiver end, as
+documented at tipc.io[1]. A temporary workaround for this buffer overflow
+is to increase the socket buffer size. However, if the cluster continues
+to scale out or to add more components, the system will become
+under-dimensioned and the TIPC buffer overflow can occur again.
+
+MDS's solution for TIPC buffer overflow
+=======================================
+If MDS disables TIPC_DEST_DROPPABLE, TIPC returns an ancillary message
+when the original message fails to be delivered. On this event, if the
+message has been saved in a queue, MDS at the sender side can look it up
+and retransmit it to the receiver.
+Once the messages in the sender's queue have been delivered successfully,
+MDS needs to remove them. MDS introduces an internal ACK message as an
+acknowledgment from receivers so that senders can remove the messages
+from the queue.
+Also, when the receiver's buffer is overflowing, retransmission may not
+succeed and may even make the overflow worse at the receiver end (the more
+retransmissions, the more overflow). MDS therefore imitates the sliding
+window in TCP[2] to control the flow of data messages towards the
+receivers.
+
+Legacy MDS data message, new (data + ACK) MDS message, and upgradability
+========================================================================
+Below is the legacy MDS message format that was used up to OpenSAF 5.19.07:
+
+oct 0  message length
+oct 1
+--
+oct 2  sequence number: incremented for every message sent out to all destined
+...   tipc portid.
+oct 5
+--
+oct 6  fragment number: a message with same sequence number can be fragmented,
+oct 7  identified by this fragment number.
+--
+oct 8  length check: cross check with message length(oct0,1), NOT USED.
+oct 9
+--
+oct 10 protocol version: (MDS_PROT:0xA0 | MDS_VERSION:0x08) = 0xA8, NOT USED
+--
+oct 11 mds length: length of mds header and mds data, starting from oct13
+oct 12
+--
+oct 13 mds header and data
+...
+--
+
+The current sequence number/fragment number are used in MDS for all messages
+sent to all discovered tipc portid(s), meaning that every time a message is
+sent to any tipc portid, the sequence/fragment number is increased. Flow
+control needs its own sequence number sliding between two tipc portid(s) so
+that receivers can detect message drops due to buffer overflow. Therefore,
+oct 8 and oct 9 are now reused as the flow control sequence number. Oct 10,
+the protocol version, has the new value 0xB8. The format of the new data
+message is as below:
+
+oct 0  same
+...
+oct 7
+--
+oct 8  flow control sequence number
+oct 9
+--
+oct 10 protocol version: (MDS_PROT_TIPC_FCTRL:0xB0 | MDS_VERSION:0x08) = 0xB8
+--
+oct 11 same
+...
+--
+
+The ACK message is introduced to acknowledge one data message or a chunk of
+accumulated data messages. The ACK message format:
+
+oct 0  message length
+oct 1
+--
+oct 2  8 bytes, NOT USED
+
+oct 9
+--
+oct 10 protocol version: (MDS_PROT_TIPC_FCTRL:0xB0 | MDS_VERSION:0x08) = 0xB8
+--
+oct 11 protocol identifier: MDS_PROT_FCTRL_ID
+..
+oct 14
+--
+oct 15 flow control message type: CHUNKACK
+--
+oct 16 service id: service id
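
As an illustration of the layouts above, here is a minimal sketch of how octets 0..12 of the new data message could be packed and unpacked. It assumes network byte order (big-endian) for the multi-octet fields, which the README excerpt does not state. The names FragHeader, EncodeHeader, DecodeHeader and UsesFlowControl are hypothetical and are not taken from the MDS sources; only the constants 0xB0, 0x08 and 0xFC come from the patch.

```cpp
#include <cstdint>

// Hypothetical in-memory view of octets 0..12 of the new data message.
struct FragHeader {
  uint16_t msg_len;    // oct 0-1: message length
  uint32_t seq_num;    // oct 2-5: sequence number
  uint16_t frag_num;   // oct 6-7: fragment number
  uint16_t fctrl_seq;  // oct 8-9: flow control sequence number
  uint8_t prot_ver;    // oct 10: protocol version
  uint16_t mds_len;    // oct 11-12: mds length
};

constexpr uint8_t kProtVerFctrl = 0xB0 | 0x08;  // 0xB8 (legacy value is 0xA8)
constexpr uint8_t kProtVerMask = 0xFC;          // MDS_PROT_VER_MASK in the patch
constexpr int kFragHeaderLen = 13;

// Big-endian helpers (the byte order is an assumption of this sketch).
static void Put16(uint8_t* p, uint16_t v) {
  p[0] = static_cast<uint8_t>(v >> 8);
  p[1] = static_cast<uint8_t>(v & 0xFF);
}

static uint16_t Get16(const uint8_t* p) {
  return static_cast<uint16_t>((p[0] << 8) | p[1]);
}

// Writes the 13 header octets into buf (buf must hold kFragHeaderLen bytes).
void EncodeHeader(const FragHeader& h, uint8_t* buf) {
  Put16(buf + 0, h.msg_len);
  Put16(buf + 2, static_cast<uint16_t>(h.seq_num >> 16));
  Put16(buf + 4, static_cast<uint16_t>(h.seq_num & 0xFFFF));
  Put16(buf + 6, h.frag_num);
  Put16(buf + 8, h.fctrl_seq);
  buf[10] = h.prot_ver;
  Put16(buf + 11, h.mds_len);
}

// Reads the 13 header octets back from buf.
FragHeader DecodeHeader(const uint8_t* buf) {
  FragHeader h;
  h.msg_len = Get16(buf + 0);
  h.seq_num = (static_cast<uint32_t>(Get16(buf + 2)) << 16) | Get16(buf + 4);
  h.frag_num = Get16(buf + 6);
  h.fctrl_seq = Get16(buf + 8);
  h.prot_ver = buf[10];
  h.mds_len = Get16(buf + 11);
  return h;
}

// True if oct10 carries the flow control protocol/version value.
bool UsesFlowControl(const uint8_t* buf) {
  return (buf[10] & kProtVerMask) == kProtVerFctrl;
}
```

A receiver built along these lines could check UsesFlowControl() first and fall back to the legacy handling (oct8/oct9 ignored) when it returns false, which is one way the two formats might coexist during an upgrade.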

[devel] [PATCH 9/9] mds: Add TIPC buffer overflow for mdstest [#1960]

2019-08-14 Thread Minh Chau

(Sending on behalf of Thuan)
---
 src/mds/apitest/mdstest.c  |   5 +-
 src/mds/apitest/mdstipc.h  |   6 +-
 src/mds/apitest/mdstipc_api.c  | 237 +
 src/mds/apitest/mdstipc_conf.c |  19 ++--
 4 files changed, 253 insertions(+), 14 deletions(-)

diff --git a/src/mds/apitest/mdstest.c b/src/mds/apitest/mdstest.c
index bf6e173..a381aad 100644
--- a/src/mds/apitest/mdstest.c
+++ b/src/mds/apitest/mdstest.c
@@ -84,12 +84,13 @@ int main(int argc, char **argv)
return 0;
}
 
-   if (mds_startup() != 0) {
+   if ((suite != 27) && (mds_startup() != 0)) {
printf("Fail to start mds agents\n");
return 1;
}
 
int rc = test_run(suite, tcase);
-   mds_shutdown();
+   if (suite != 27)
+   mds_shutdown();
return rc;
 }
diff --git a/src/mds/apitest/mdstipc.h b/src/mds/apitest/mdstipc.h
index 07cf556..2bd44b4 100644
--- a/src/mds/apitest/mdstipc.h
+++ b/src/mds/apitest/mdstipc.h
@@ -20,6 +20,7 @@
 
 #include "base/ncssysf_tsk.h"
 #include "base/ncssysf_def.h"
+#include "base/ncsencdec_pub.h"
 
 typedef struct tet_task {
   NCS_OS_CB entry;
@@ -163,9 +164,6 @@ int gl_DEC_FLAT_CB_FAIL;
 int gl_RECEIVE_CB_FAIL;
 int gl_COPY_CB_FAIL;
 
-uint32_t ncs_encode_16bit(uint8_t **, uint32_t);
-uint16_t ncs_decode_16bit(uint8_t **);
-
 uint32_t tet_mds_svc_callback(NCSMDS_CALLBACK_INFO *);
 /**MDS call back routines */
 uint32_t tet_mds_cb_cpy(NCSMDS_CALLBACK_INFO *);
@@ -528,5 +526,7 @@ uint32_t mds_send_get_redack(MDS_HDL mds_hdl, MDS_SVC_ID svc_id,
 uint32_t mds_send_redrsp_getack(MDS_HDL mds_hdl, MDS_SVC_ID svc_id,
 int64_t time_to_wait, TET_MDS_MSG *response);
 uint32_t tet_sync_point(void);
+int mds_startup(void);
+int mds_shutdown(void);
 
 #endif  // MDS_APITEST_MDSTIPC_H_
diff --git a/src/mds/apitest/mdstipc_api.c b/src/mds/apitest/mdstipc_api.c
index b5547e3..8057284 100644
--- a/src/mds/apitest/mdstipc_api.c
+++ b/src/mds/apitest/mdstipc_api.c
@@ -27,6 +27,7 @@
 #include "base/ncs_main_papi.h"
 #include "mdstipc.h"
 #include "base/ncssysf_tmr.h"
+#include "base/osaf_poll.h"
 
 #define MSG_SIZE MDS_DIRECT_BUF_MAXSIZE
 static MDS_CLIENT_MSG_FORMAT_VER gl_set_msg_fmt_ver;
@@ -13104,6 +13105,234 @@ void tet_create_default_PWE_VDEST_tp()
test_validate(FAIL, 0);
 }
 
+void tet_sender(char *send_buff, uint32_t buff_len, int msg_count)
+{
+   int live = 100; // sender live max 100s
+   TET_MDS_MSG *mesg;
+   mesg = (TET_MDS_MSG *)malloc(sizeof(TET_MDS_MSG));
+   memset(mesg, 0, sizeof(TET_MDS_MSG));
+
+   printf("\nStarted Sender (pid:%d) svc_id=%d\n",
+   (int)getpid(), NCSMDS_SVC_ID_INTERNAL_MIN);
+   if (adest_get_handle() != NCSCC_RC_SUCCESS) {
+   printf("\n: Sender FAIL to get adest handle\n");
+   exit(1);
+   }
+
+   if (mds_service_install(gl_tet_adest.mds_pwe1_hdl,
+   NCSMDS_SVC_ID_INTERNAL_MIN, 1,
+   NCSMDS_SCOPE_NONE, false, false) != NCSCC_RC_SUCCESS) {
+   printf("\nSender FAIL to install the service\n");
+   exit(1);
+   }
+
+   MDS_SVC_ID svcids[] = {NCSMDS_SVC_ID_EXTERNAL_MIN};
+   if (mds_service_subscribe(
+   gl_tet_adest.mds_pwe1_hdl, NCSMDS_SVC_ID_INTERNAL_MIN,
+   NCSMDS_SCOPE_INTRANODE, 1, svcids) != NCSCC_RC_SUCCESS) {
+   printf("\nSender FAIL to subscribe receiver\n");
+   exit(1);
+   }
+
+   while(!gl_tet_adest.svc[0].svcevt[0].dest && live-- > 0) {
+   printf("\nSender is waiting for receiver UP\n");
+   sleep(1);
+   }
+   printf("\nSender got last event=%d of svc_id=%d dest=<%llx>\n",
+   gl_tet_adest.svc[0].svcevt[0].event,
+   gl_tet_adest.svc[0].svcevt[0].svc_id,
+   gl_tet_adest.svc[0].svcevt[0].dest);
+
+   // wait for receiver subscribe sender
+   // otherwise, receiver won't detect loss message
+   sleep(1);
+
+   uint32_t offset = 0;
+   uint32_t msg_len = buff_len / msg_count;
+   for (int i = 1; i <= msg_count; i++) {
+   memcpy(mesg->send_data, &send_buff[offset], msg_len);
+   mesg->send_len = msg_len;
+   if (mds_just_send(gl_tet_adest.mds_pwe1_hdl,
+ NCSMDS_SVC_ID_INTERNAL_MIN,
+ NCSMDS_SVC_ID_EXTERNAL_MIN,
+ gl_tet_adest.svc[0].svcevt[0].dest,
+ MDS_SEND_PRIORITY_HIGH,
+ mesg) != NCSCC_RC_SUCCESS) {
+   printf("\nSender FAIL send message\n");
+   exit(1);
+   } else {
+   printf("\nSender SENT message %d successfully\n", i);
+   }
+

[devel] [PATCH 7/9] mds: Add configurable parameters [#1960]

2019-08-14 Thread Minh Chau

This patch makes the TIPC buffer overflow solution configurable,
along with the ack timeout and ack size.
For example:
The service config file can export the following environment variables:

export MDS_TIPC_FCTRL_ENABLED=1
export MDS_TIPC_FCTRL_ACKTIMEOUT=1000
export MDS_TIPC_FCTRL_ACKSIZE=1

If MDS_TIPC_FCTRL_ACKTIMEOUT or MDS_TIPC_FCTRL_ACKSIZE is not specified,
its default value is used.
---
 src/mds/mds_dt_tipc.c  | 19 ---
 src/mds/mds_tipc_fctrl_intf.cc | 25 +++--
 src/mds/mds_tipc_fctrl_intf.h  |  3 ++-
 3 files changed, 41 insertions(+), 6 deletions(-)

diff --git a/src/mds/mds_dt_tipc.c b/src/mds/mds_dt_tipc.c
index fef1c50..1b6c3f8 100644
--- a/src/mds/mds_dt_tipc.c
+++ b/src/mds/mds_dt_tipc.c
@@ -342,9 +342,22 @@ uint32_t mdtm_tipc_init(NODE_ID nodeid, uint32_t *mds_tipc_ref)
}
 
/* Create flow control tasks if enabled*/
-   gl_mds_pro_ver = MDS_PROT_FCTRL;
-   mds_tipc_fctrl_initialize(tipc_cb.BSRsock, port_id,
-   (uint64_t)optval, tipc_mcast_enabled);
+   char* ptr;
+   if ((ptr = getenv("MDS_TIPC_FCTRL_ENABLED")) != NULL) {
+   if (atoi(ptr) == 1) {
+   gl_mds_pro_ver = MDS_PROT_FCTRL;
+   int ackto = -1;
+   int acksize = -1;
+   if ((ptr = getenv("MDS_TIPC_FCTRL_ACKTIMEOUT")) != NULL) {
+   ackto = atoi(ptr);
+   }
+   if ((ptr = getenv("MDS_TIPC_FCTRL_ACKSIZE")) != NULL) {
+   acksize = atoi(ptr);
+   }
+   mds_tipc_fctrl_initialize(tipc_cb.BSRsock, port_id, (uint64_t)optval,
+   ackto, acksize, tipc_mcast_enabled);
+   }
+   }
 
/* Create a task to receive the events and data */
if (mdtm_create_rcv_task(tipc_cb.hdle_mdtm) != NCSCC_RC_SUCCESS) {
diff --git a/src/mds/mds_tipc_fctrl_intf.cc b/src/mds/mds_tipc_fctrl_intf.cc
index 397114e..8949937 100644
--- a/src/mds/mds_tipc_fctrl_intf.cc
+++ b/src/mds/mds_tipc_fctrl_intf.cc
@@ -40,6 +40,9 @@ using mds::ChunkAck;
 using mds::HeaderMessage;
 
 namespace {
+// flow control enabled/disabled
+bool is_fctrl_enabled = false;
+
 // multicast/broadcast enabled
 // todo: to be removed if flow control support it
 bool is_mcast_enabled = true;
@@ -225,7 +228,8 @@ uint32_t create_ncs_task(void *task_hdl) {
 }  // end local namespace
 
 uint32_t mds_tipc_fctrl_initialize(int dgramsock, struct tipc_portid id,
-uint64_t rcv_buf_size, bool mcast_enabled) {
+uint64_t rcv_buf_size, int32_t ackto, int32_t acksize,
+bool mcast_enabled) {
   if (create_ncs_task(&p_task_hdl) !=
   NCSCC_RC_SUCCESS) {
 m_MDS_LOG_ERR("FCTRL: Start of the Created Task-failed:\n");
@@ -234,8 +238,10 @@ uint32_t mds_tipc_fctrl_initialize(int dgramsock, struct tipc_portid id,
   data_sock_fd = dgramsock;
   snd_rcv_portid = id;
   sock_buf_size = rcv_buf_size;
+  is_fctrl_enabled = true;
   is_mcast_enabled = mcast_enabled;
-
+  if (ackto != -1) kChunkAckTimeout = ackto;
+  if (acksize != -1) kChunkAckSize = acksize;
   m_MDS_LOG_NOTIFY("FCTRL: Initialize [node:%x, ref:%u]",
   id.node, id.ref);
 
@@ -243,6 +249,7 @@ uint32_t mds_tipc_fctrl_initialize(int dgramsock, struct tipc_portid id,
 }
 
 uint32_t mds_tipc_fctrl_shutdown(void) {
+  if (is_fctrl_enabled == false) return NCSCC_RC_SUCCESS;
   if (ncs_task_release(p_task_hdl) != NCSCC_RC_SUCCESS) {
 m_MDS_LOG_ERR("FCTRL: Stop of the Created Task-failed:\n");
   }
@@ -251,6 +258,8 @@ uint32_t mds_tipc_fctrl_shutdown(void) {
 
 uint32_t mds_tipc_fctrl_sndqueue_capable(struct tipc_portid id, uint16_t len,
   uint16_t* next_seq) {
+  if (is_fctrl_enabled == false) return NCSCC_RC_SUCCESS;
+
   uint32_t rc = NCSCC_RC_SUCCESS;
 
   portid_map_mutex.lock();
@@ -274,6 +283,8 @@ uint32_t mds_tipc_fctrl_sndqueue_capable(struct tipc_portid id, uint16_t len,
 
 uint32_t mds_tipc_fctrl_trysend(const uint8_t *buffer, uint16_t len,
 struct tipc_portid id) {
+  if (is_fctrl_enabled == false) return NCSCC_RC_SUCCESS;
+
   uint32_t rc = NCSCC_RC_SUCCESS;
 
   portid_map_mutex.lock();
@@ -304,6 +315,8 @@ uint32_t mds_tipc_fctrl_trysend(const uint8_t *buffer, uint16_t len,
 }
 
 uint32_t mds_tipc_fctrl_portid_up(struct tipc_portid id, uint32_t type) {
+  if (is_fctrl_enabled == false) return NCSCC_RC_SUCCESS;
+
   MDS_SVC_ID svc_id = (uint16_t)(type & MDS_EVENT_MASK_FOR_SVCID);
 
   portid_map_mutex.lock();
@@ -328,6 +341,8 @@ uint32_t mds_tipc_fctrl_portid_up(struct tipc_portid id, uint32_t type) {
 }
 
 uint32_t mds_tipc_fctrl_portid_down(struct tipc_portid id, uint32_t type) {
+  if (is_fctrl_enabled == false) return NCSCC_RC_SUCCESS;
+
   MDS_SVC_ID svc_id = (uint16_t)(type & MDS_EVENT_MASK_FOR_SVCID);
 
   portid_map_mutex.lock();
@@ -345,6 +360,8 @@ uint32_t mds_tipc_fctrl_portid_down(struct tipc_portid id, uint32_t type) {
 }
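
The configuration handling described in this patch can be sketched as follows. This is only an illustration under stated assumptions: the names FctrlConfig, ReadIntEnv and LoadFctrlConfig are hypothetical, the defaults (1000 ms, chunk size 3) are the initial values of kChunkAckTimeout and kChunkAckSize shown in patch 4/9, and strtol is used here instead of the patch's atoi so that malformed values fall back to the defaults.

```cpp
#include <cerrno>
#include <cstdlib>

// Hypothetical holder for the three flow control settings.
struct FctrlConfig {
  bool enabled = false;
  int ack_timeout_ms = 1000;  // assumed default, from kChunkAckTimeout
  int ack_size = 3;           // assumed default, from kChunkAckSize
};

// Returns true and stores the value if `name` is set to a valid integer.
static bool ReadIntEnv(const char* name, long* out) {
  const char* s = std::getenv(name);
  if (s == nullptr || *s == '\0') return false;
  char* end = nullptr;
  errno = 0;
  long v = std::strtol(s, &end, 10);
  if (errno != 0 || *end != '\0') return false;
  *out = v;
  return true;
}

// Reads the MDS_TIPC_FCTRL_* variables; unset or invalid ack parameters
// keep their defaults, and they are only consulted when flow control is on.
FctrlConfig LoadFctrlConfig() {
  FctrlConfig cfg;
  long v = 0;
  cfg.enabled = ReadIntEnv("MDS_TIPC_FCTRL_ENABLED", &v) && v == 1;
  if (!cfg.enabled) return cfg;
  if (ReadIntEnv("MDS_TIPC_FCTRL_ACKTIMEOUT", &v) && v > 0) {
    cfg.ack_timeout_ms = static_cast<int>(v);
  }
  if (ReadIntEnv("MDS_TIPC_FCTRL_ACKSIZE", &v) && v > 0) {
    cfg.ack_size = static_cast<int>(v);
  }
  return cfg;
}
```

As in the patch, flow control stays off unless MDS_TIPC_FCTRL_ENABLED is exactly 1, so a node without the variable keeps the legacy behaviour.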

[devel] [PATCH 3/9] mds: Add implementation for TIPC buffer overflow solution [#1960]

2019-08-14 Thread Minh Chau

This is a collaborative patch of two participants: Thuan, Minh.

Main changes:
- Add mds_tipc_fctrl_intf.h, mds_tipc_fctrl_intf.cc: These two files
introduce new functions which are called in mds_dt_tipc.c if the flow
control is enabled
- Add mds_tipc_fctrl_portid.h, mds_tipc_fctrl_portid.cc: These files
implement the tipc portid instance, which supports the sliding window
and the mds msg queue
- Add mds_tipc_fctrl_msg.h, mds_tipc_fctrl_msg.cc: These files define
the event and messages which are used for this solution.
---
 src/mds/Makefile.am  |  10 +-
 src/mds/mds_dt.h |   8 +-
 src/mds/mds_dt_tipc.c| 188 +---
 src/mds/mds_tipc_fctrl_intf.cc   | 376 +++
 src/mds/mds_tipc_fctrl_intf.h|  47 +
 src/mds/mds_tipc_fctrl_msg.cc| 142 +++
 src/mds/mds_tipc_fctrl_msg.h | 129 ++
 src/mds/mds_tipc_fctrl_portid.cc | 261 +++
 src/mds/mds_tipc_fctrl_portid.h  |  87 +
 9 files changed, 1184 insertions(+), 64 deletions(-)
 create mode 100644 src/mds/mds_tipc_fctrl_intf.cc
 create mode 100644 src/mds/mds_tipc_fctrl_intf.h
 create mode 100644 src/mds/mds_tipc_fctrl_msg.cc
 create mode 100644 src/mds/mds_tipc_fctrl_msg.h
 create mode 100644 src/mds/mds_tipc_fctrl_portid.cc
 create mode 100644 src/mds/mds_tipc_fctrl_portid.h

diff --git a/src/mds/Makefile.am b/src/mds/Makefile.am
index 2d7b652..d849e8f 100644
--- a/src/mds/Makefile.am
+++ b/src/mds/Makefile.am
@@ -48,10 +48,16 @@ lib_libopensaf_core_la_SOURCES += \
 if ENABLE_TIPC_TRANSPORT
 noinst_HEADERS += src/mds/mds_dt_tipc.h \
src/mds/mds_tipc_recvq_stats.h \
-   src/mds/mds_tipc_recvq_stats_impl.h
+   src/mds/mds_tipc_recvq_stats_impl.h \
+   src/mds/mds_tipc_fctrl_intf.h \
+   src/mds/mds_tipc_fctrl_portid.h \
+   src/mds/mds_tipc_fctrl_msg.h
 lib_libopensaf_core_la_SOURCES += src/mds/mds_dt_tipc.c \
src/mds/mds_tipc_recvq_stats.cc \
-   src/mds/mds_tipc_recvq_stats_impl.cc
+   src/mds/mds_tipc_recvq_stats_impl.cc \
+   src/mds/mds_tipc_fctrl_intf.cc \
+   src/mds/mds_tipc_fctrl_portid.cc \
+   src/mds/mds_tipc_fctrl_msg.cc
 endif
 
 if ENABLE_TESTS
diff --git a/src/mds/mds_dt.h b/src/mds/mds_dt.h
index b645bb4..d9e8633 100644
--- a/src/mds/mds_dt.h
+++ b/src/mds/mds_dt.h
@@ -162,7 +162,7 @@ uint32_t mdtm_del_from_ref_tbl(MDS_SUBTN_REF_VAL ref);
 uint32_t mds_tmr_mailbox_processing(void);
 uint32_t mdtm_get_from_ref_tbl(MDS_SUBTN_REF_VAL ref, MDS_SVC_HDL *svc_hdl);
 uint32_t mdtm_add_frag_hdr(uint8_t *buf_ptr, uint16_t len, uint32_t seq_num,
-   uint16_t frag_byte);
+   uint16_t frag_byte, uint16_t fctrl_seq_num);
 uint32_t mdtm_free_reassem_msg_mem(MDS_ENCODED_MSG *msg);
 uint32_t mdtm_process_recv_data(uint8_t *buf, uint16_t len, uint64_t tipc_id,
 uint32_t *buff_dump);
@@ -240,9 +240,13 @@ bool mdtm_mailbox_mbx_cleanup(NCSCONTEXT arg, NCSCONTEXT msg);
 
 #define MDS_PROT 0xA0
 #define MDS_VERSION 0x08
-#define MDS_PROT_VER_MASK (MDS_PROT | MDS_VERSION)
+#define MDS_PROT_VER_MASK 0xFC
 #define MDTM_PRI_MASK 0x3
 
+/* MDS protocol/version for flow control */
+#define MDS_PROT_FCTRL (0xB0 | MDS_VERSION)
+#define MDS_PROT_FCTRL_ID 0x00AC13F5
+
 /* Added for the subscription changes */
 #define MDS_NCS_CHASSIS_ID (m_NCS_GET_NODE_ID & 0x00ff)
 #define MDS_TIPC_COMMON_ID 0x01001000
diff --git a/src/mds/mds_dt_tipc.c b/src/mds/mds_dt_tipc.c
index 86b52bb..fef1c50 100644
--- a/src/mds/mds_dt_tipc.c
+++ b/src/mds/mds_dt_tipc.c
@@ -47,6 +47,7 @@
 #include "mds_dt_tipc.h"
 #include "mds_dt_tcp_disc.h"
 #include "mds_core.h"
+#include "mds_tipc_fctrl_intf.h"
 #include "mds_tipc_recvq_stats.h"
 #include "base/osaf_utility.h"
 #include "base/osaf_poll.h"
@@ -165,20 +166,22 @@ NCS_PATRICIA_TREE mdtm_reassembly_list;
 uint32_t mdtm_global_frag_num;
 
 const unsigned int MAX_RECV_THRESHOLD = 30;
+uint8_t gl_mds_pro_ver = MDS_PROT | MDS_VERSION;
 
-static bool get_tipc_port_id(int sock, uint32_t* port_id) {
+static bool get_tipc_port_id(int sock, struct tipc_portid* port_id) {
struct sockaddr_tipc addr;
socklen_t sz = sizeof(addr);
 
memset(&addr, 0, sizeof(addr));
-   *port_id = 0;
+   port_id->node = 0;
+   port_id->ref = 0;
if (0 > getsockname(sock, (struct sockaddr *)&addr, &sz)) {
syslog(LOG_ERR, "MDTM:TIPC Failed to get socket name, err: %s",
   strerror(errno));
return false;
}
 
-   *port_id = addr.addr.id.ref;
+   *port_id = addr.addr.id;
return true;
 }
 
@@ -240,12 +243,13 @@ uint32_t mdtm_tipc_init(NODE_ID nodeid, uint32_t *mds_tipc_ref)
}
 
/* Code for getting the self tipc random number */
-   if (!get_tipc_port_id(tipc_cb.BSRsock, mds_tipc_ref)) {
+   struct tipc_portid port_id;
+   if (!get_tipc_port_id(tipc_cb.BSRsock, &port_id)) {

[devel] [PATCH 0/9] Review Request for mds: Add solution for TIPC buffer overflow [#1960]

2019-08-14 Thread Minh Chau

Summary: mds: Add solution for TIPC buffer overflow at MDS [#1960]
Review request for Ticket(s): 1960
Peer Reviewer(s): Anders, HansN, Lennart, Gary, Vu, Thuan
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-1960
Base revision: 2d85d5d9264c6a7d1c6601b900fb810facbee3ac
Personal repository: git://git.code.sf.net/u/minh-chau/review


Impacted area   Impact y/n

 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          y
 Samples                 n
 Tests                   n
 Other                   n

NOTE: Patch(es) contain lines longer than 80 characters

Comments (indicate scope for each "y" above):
-
Sending on behalf of Thuan & Minh.
Some pending tasks to accomplish:

. Handle broadcast/multicast mds messages with flow control.
. Reduce the memory re-allocation overhead when flow control is enabled.
(At this moment, memory is allocated in mds_dt_tipc.c and then cloned again
into the buffer for the retransmission queue.)
. The sequence number arithmetic (sna) should be implemented in /base code.
. Add an mdstest case to cover sna wrap-around.
. MDS_CHECKSUM_ENABLE_FLAG

revision d208cff344c35afd25ac01ab4f9057d153fe9495
Author: Minh Chau 
Date:   Wed, 14 Aug 2019 16:22:12 +1000

mds: Add TIPC buffer overflow for mdstest [#1960]

(Sending on behalf of Thuan)



revision d04c3e99744b86278c6a54d8ec1e4caabfbcabd2
Author: Minh Chau 
Date:   Wed, 14 Aug 2019 16:22:12 +1000

mds: Apply serial number arithmetic for sequence counter [#1960]

This patch applies serial number arithmetic, as described in RFC 1982, to
the flow control sequence number.

This is only a temporary patch; a proper one could be made in /base
as a template that also covers other types, e.g. uint32. MDS would then
reuse it from /base.



revision 7bbaf4eed324ba0575f4893a37eb10c9b7df4426
Author: Minh Chau 
Date:   Wed, 14 Aug 2019 16:22:12 +1000

mds: Add configurable parameters [#1960]

This patch makes the TIPC buffer overflow solution configurable,
along with the ack timeout and ack size.
For example:
The service config file can export the following environment variables:

export MDS_TIPC_FCTRL_ENABLED=1
export MDS_TIPC_FCTRL_ACKTIMEOUT=1000
export MDS_TIPC_FCTRL_ACKSIZE=1

If MDS_TIPC_FCTRL_ACKTIMEOUT or MDS_TIPC_FCTRL_ACKSIZE is not specified,
its default value is used.



revision 842dcecbdbafb3744a99ead42899f955eb79f94f
Author: Minh Chau 
Date:   Wed, 14 Aug 2019 16:22:12 +1000

mds: Implement kRcvBuffOverflow state [#1960]

This patch implements the kRcvBuffOverflow state machine as
described in the README file.



revision ed03d11a194f9775fc43003acc35924ef28227b7
Author: Minh Chau 
Date:   Wed, 14 Aug 2019 16:22:12 +1000

mds: Add state machine for tipc portid instance [#1960]

This patch adds a state machine to support the tx probation timer.



revision 759f7ef7b55a8f27118e82c757c21b2f452c8b7c
Author: Minh Chau 
Date:   Wed, 14 Aug 2019 16:22:12 +1000

mds: Add timeout for ack message [#1960]

If the ack size is configured to be greater than 1, the receiver needs a
timeout after which it sends the ack message back to the sender.
The ack message timeout uses the poll timeout in the flow control thread
to keep mds lightweight (in contrast to additional timer threads).



revision 7c115e411f9de31b2ffe3c32ed401d7b8a3de696
Author: Minh Chau 
Date:   Wed, 14 Aug 2019 16:21:33 +1000

mds: Add implementation for TIPC buffer overflow solution [#1960]

This is a collaborative patch of two participants: Thuan, Minh.

Main changes:
- Add mds_tipc_fctrl_intf.h, mds_tipc_fctrl_intf.cc: These two files
introduce new functions which are called in mds_dt_tipc.c if the flow
control is enabled
- Add mds_tipc_fctrl_portid.h, mds_tipc_fctrl_portid.cc: These files
implement the tipc portid instance, which supports the sliding window
and the mds msg queue
- Add mds_tipc_fctrl_msg.h, mds_tipc_fctrl_msg.cc: These files define
the event and messages which are used for this solution.



revision 4096f66de92f89ef30a54a137465384b74143800
Author: Minh Chau 
Date:   Wed, 14 Aug 2019 16:12:19 +1000

mds: Resolve c/c++ linking issue [#1960]

(Sending on behalf of Thuan)
This patch solves the linking issue if mds_dt.h or mds_core.h
is included in c++ sources.



revision 04152d708f30abf813dad1c18d5ed3d73df4ef3d
Author: Minh Chau 
Date:   Wed, 14 Aug 2019 16:11:17 +1000

mds: Add README for solution of TIPC buffer overflow at MDS [#1960]



Added Files:

 src/mds/mds_tipc_fctrl_intf.cc
 src/mds/mds_tipc_fctrl_intf.h
 src/mds/mds_tipc_fctrl_msg.cc
 src/mds/mds_tipc_fctrl_msg.h
 src/mds/mds_tipc_fctrl_portid.cc
 src/mds/mds_tipc_fctrl_portid.h
 src/mds/README


Complete diffstat:
--
 src/mds/Makefile.am  |  10 +-
 src/mds/README

[devel] [PATCH 4/9] mds: Add timeout for ack message [#1960]

2019-08-14 Thread Minh Chau

If the ack size is configured to be greater than 1, the receiver needs a
timeout after which it sends the ack message back to the sender.
The ack message timeout uses the poll timeout in the flow control thread
to keep mds lightweight (in contrast to additional timer threads).
---
 src/mds/mds_tipc_fctrl_intf.cc   | 33 ++---
 src/mds/mds_tipc_fctrl_msg.h |  6 ++
 src/mds/mds_tipc_fctrl_portid.cc | 15 +++
 src/mds/mds_tipc_fctrl_portid.h  |  1 +
 4 files changed, 52 insertions(+), 3 deletions(-)

diff --git a/src/mds/mds_tipc_fctrl_intf.cc b/src/mds/mds_tipc_fctrl_intf.cc
index 91b9107..bd0a8f6 100644
--- a/src/mds/mds_tipc_fctrl_intf.cc
+++ b/src/mds/mds_tipc_fctrl_intf.cc
@@ -66,7 +66,8 @@ std::map portid_map;
 std::mutex portid_map_mutex;
 
 // chunk ack parameters
-// todo: The chunk ack size should be configurable
+// todo: The chunk ack timeout and chunk ack size should be configurable
+int kChunkAckTimeout = 1000;  // in milliseconds
 uint16_t kChunkAckSize = 3;
 
 TipcPortId* portid_lookup(struct tipc_portid id) {
@@ -75,6 +76,15 @@ TipcPortId* portid_lookup(struct tipc_portid id) {
   return portid_map[uid];
 }
 
+void process_timer_event(const Event evt) {
+  for (auto i : portid_map) {
+TipcPortId* portid = i.second;
+if (evt.type_ == Event::Type::kEvtTmrChunkAck) {
+  portid->ReceiveTmrChunkAck();
+}
+  }
+}
+
 uint32_t process_flow_event(const Event evt) {
   uint32_t rc = NCSCC_RC_SUCCESS;
   TipcPortId *portid = portid_lookup(evt.id_);
@@ -110,7 +120,7 @@ uint32_t process_flow_event(const Event evt) {
 uint32_t process_all_events(void) {
   enum { FD_FCTRL = 0, NUM_FDS };
 
-  int poll_tmo = MDTM_TIPC_POLL_TIMEOUT;
+  int poll_tmo = kChunkAckTimeout;
   while (true) {
 int pollres;
 struct pollfd pfd[NUM_FDS] = {{0}};
@@ -135,11 +145,24 @@ uint32_t process_all_events(void) {
 if (evt == nullptr) continue;
 
 portid_map_mutex.lock();
-process_flow_event(*evt);
+
+if (evt->IsTimerEvent()) {
+  process_timer_event(*evt);
+}
+if (evt->IsFlowEvent()) {
+  process_flow_event(*evt);
+}
+
 delete evt;
 portid_map_mutex.unlock();
   }
 }
+// timeout, scan all portid and send ack msgs
+if (pollres == 0) {
+  portid_map_mutex.lock();
+  process_timer_event(Event(Event::Type::kEvtTmrChunkAck));
+  portid_map_mutex.unlock();
+}
   }  /* while */
   return NCSCC_RC_SUCCESS;
 }
@@ -368,6 +391,10 @@ uint32_t mds_tipc_fctrl_rcv_data(uint8_t *buffer, uint16_t len,
   portid_map_mutex.lock();
   uint32_t rc = process_flow_event(Event(Event::Type::kEvtRcvData,
   id, data.svc_id_, header.mseq_, header.mfrag_, header.fseq_));
+  if (rc == NCSCC_RC_CONTINUE) {
+process_timer_event(Event(Event::Type::kEvtTmrChunkAck));
+rc = NCSCC_RC_SUCCESS;
+  }
   portid_map_mutex.unlock();
   return rc;
 }
diff --git a/src/mds/mds_tipc_fctrl_msg.h b/src/mds/mds_tipc_fctrl_msg.h
index 677f256..8e6a874 100644
--- a/src/mds/mds_tipc_fctrl_msg.h
+++ b/src/mds/mds_tipc_fctrl_msg.h
@@ -44,6 +44,8 @@ class Event {
// selective data msgs (not supported)
 kEvtDropData,  // event reported from tipc that a message is not
// delivered
+kEvtTmrAll,
+kEvtTmrChunkAck,  // event to send the chunk ack
   };
   NCS_IPC_MSG next_{0};
   Type type_;
@@ -68,6 +70,10 @@ class Event {
 fseq_(f_seg_num), chunk_size_(chunk_size) {
 type_ = type;
   }
+  bool IsTimerEvent() { return (type_ > Type::kEvtTmrAll); }
+  bool IsFlowEvent() {
+return (Type::kEvtDataFlowAll < type_ && type_ < Type::kEvtTmrAll);
+  }
 };
 
 class BaseMessage {
diff --git a/src/mds/mds_tipc_fctrl_portid.cc b/src/mds/mds_tipc_fctrl_portid.cc
index 24d13ee..64115d5 100644
--- a/src/mds/mds_tipc_fctrl_portid.cc
+++ b/src/mds/mds_tipc_fctrl_portid.cc
@@ -67,6 +67,8 @@ TipcPortId::TipcPortId(struct tipc_portid id, int sock, uint16_t chksize,
 }
 
 TipcPortId::~TipcPortId() {
+  // Fake a TmrChunkAck event to ack all received messages
+  ReceiveTmrChunkAck();
   // clear all msg in sndqueue_
   sndqueue_.Clear();
 }
@@ -156,6 +158,7 @@ uint32_t TipcPortId::ReceiveData(uint32_t mseq, uint16_t mfrag,
   // send ack for @chunk_size_ msgs starting from fseq
   SendChunkAck(fseq, svc_id, chunk_size_);
   rcvwnd_.acked_ = rcvwnd_.rcv_;
+  rc = NCSCC_RC_CONTINUE;
 }
   } else {
 // todo: update rcvwnd_.nacked_space_.
@@ -258,4 +261,16 @@ void TipcPortId::ReceiveNack(uint32_t mseq, uint16_t mfrag,
   }
 }
 
+void TipcPortId::ReceiveTmrChunkAck() {
+  uint16_t chksize = rcvwnd_.rcv_ - rcvwnd_.acked_;
+  if (chksize > 0) {
+m_MDS_LOG_DBG("FCTRL: [node:%x, ref:%u], "
+"ChkAckExp",
+id_.node, id_.ref);
+// send ack for @chksize msgs starting from rcvwnd_.rcv_
+SendChunkAck(rcvwnd_.rcv_, 0, chksize);
+rcvwnd_.acked_ =

[devel] [PATCH 8/9] mds: Apply serial number arithmetic for sequence counter [#1960]

2019-08-14 Thread Minh Chau

This patch applies serial number arithmetic, as described in RFC 1982, to
the flow control sequence number.

This is only a temporary patch; a proper one could be made in /base
as a template that also covers other types, e.g. uint32. MDS would then
reuse it from /base.
---
 src/mds/mds_tipc_fctrl_portid.cc | 53 +--
 src/mds/mds_tipc_fctrl_portid.h  | 77 
 2 files changed, 97 insertions(+), 33 deletions(-)

diff --git a/src/mds/mds_tipc_fctrl_portid.cc b/src/mds/mds_tipc_fctrl_portid.cc
index e762290..365d72f 100644
--- a/src/mds/mds_tipc_fctrl_portid.cc
+++ b/src/mds/mds_tipc_fctrl_portid.cc
@@ -66,12 +66,12 @@ DataMessage* MessageQueue::Find(uint32_t mseq, uint16_t mfrag) {
   return nullptr;
 }
 
-uint64_t MessageQueue::Erase(uint16_t fseq_from, uint16_t fseq_to) {
+uint64_t MessageQueue::Erase(Seq16 fseq_from, Seq16 fseq_to) {
   uint64_t msg_len = 0;
   for (auto it = queue_.begin(); it != queue_.end();) {
 DataMessage *m = *it;
-if (fseq_from <= m->header_.fseq_ &&
-m->header_.fseq_ <= fseq_to) {
+if (fseq_from <= Seq16(m->header_.fseq_) &&
+Seq16(m->header_.fseq_) <= fseq_to) {
   msg_len += m->header_.msg_len_;
   it = queue_.erase(it);
   delete m;
@@ -92,10 +92,10 @@ DataMessage* MessageQueue::FirstUnsent() {
   return nullptr;
 }
 
-void MessageQueue::MarkUnsentFrom(uint16_t fseq) {
+void MessageQueue::MarkUnsentFrom(Seq16 fseq) {
   for (auto it = queue_.begin(); it != queue_.end(); ++it) {
 DataMessage *m = *it;
-if (m->header_.fseq_ >= fseq) m->is_sent_ = false;
+if (Seq16(m->header_.fseq_) >= fseq) m->is_sent_ = false;
   }
 }
 
@@ -140,7 +140,7 @@ void TipcPortId::FlushData() {
   "sndwnd[acked:%u, send:%u, nacked:%" PRIu64 "]",
   id_.node, id_.ref,
   msg->header_.mseq_, msg->header_.mfrag_, msg->header_.fseq_,
-  sndwnd_.acked_, sndwnd_.send_, sndwnd_.nacked_space_);
+  sndwnd_.acked_.v(), sndwnd_.send_.v(), sndwnd_.nacked_space_);
 }
   } while (msg != nullptr);
 }
@@ -185,7 +185,7 @@ uint32_t TipcPortId::Queue(const uint8_t* data, uint16_t length,
 "sndwnd[acked:%u, send:%u, nacked:%" PRIu64 "]",
 id_.node, id_.ref,
 msg->header_.mseq_, msg->header_.mfrag_, msg->header_.fseq_, length,
-sndwnd_.acked_, sndwnd_.send_, sndwnd_.nacked_space_);
+sndwnd_.acked_.v(), sndwnd_.send_.v(), sndwnd_.nacked_space_);
   } else {
 ++sndwnd_.send_;
 m_MDS_LOG_DBG("FCTRL: [me] --> [node:%x, ref:%u], "
@@ -193,7 +193,7 @@ uint32_t TipcPortId::Queue(const uint8_t* data, uint16_t length,
 "sndwnd[acked:%u, send:%u, nacked:%" PRIu64 "]",
 id_.node, id_.ref,
 msg->header_.mseq_, msg->header_.mfrag_, msg->header_.fseq_, length,
-sndwnd_.acked_, sndwnd_.send_, sndwnd_.nacked_space_);
+sndwnd_.acked_.v(), sndwnd_.send_.v(), sndwnd_.nacked_space_);
   }
   return rc;
 }
@@ -248,13 +248,13 @@ uint32_t TipcPortId::ReceiveData(uint32_t mseq, uint16_t mfrag,
 txprob_cnt_, (uint8_t)state_);
   }
   // update receiver sequence window
-  if (rcvwnd_.acked_ < fseq && rcvwnd_.rcv_ + 1 == fseq) {
+  if (rcvwnd_.acked_ < Seq16(fseq) && rcvwnd_.rcv_ + Seq16(1) == Seq16(fseq)) {
 m_MDS_LOG_DBG("FCTRL: [me] <-- [node:%x, ref:%u], "
 "RcvData[mseq:%u, mfrag:%u, fseq:%u], "
 "rcvwnd[acked:%u, rcv:%u, nacked:%" PRIu64 "]",
 id_.node, id_.ref,
 mseq, mfrag, fseq,
-rcvwnd_.acked_, rcvwnd_.rcv_, rcvwnd_.nacked_space_);
+rcvwnd_.acked_.v(), rcvwnd_.rcv_.v(), rcvwnd_.nacked_space_);
 
 ++rcvwnd_.rcv_;
 if (rcvwnd_.rcv_ - rcvwnd_.acked_ >= chunk_size_) {
@@ -279,7 +279,7 @@ uint32_t TipcPortId::ReceiveData(uint32_t mseq, uint16_t mfrag,
 // It is not used for now, so ignore it.
 
 // check for transmission error
-if (rcvwnd_.rcv_ + 1 < fseq) {
+if (rcvwnd_.rcv_ + Seq16(1) < Seq16(fseq)) {
   if (rcvwnd_.rcv_ == 0 && rcvwnd_.acked_ == 0) {
 // peer does not realize that this portid reset
 m_MDS_LOG_ERR("FCTRL: [me] <-- [node:%x, ref:%u], "
@@ -288,7 +288,7 @@ uint32_t TipcPortId::ReceiveData(uint32_t mseq, uint16_t mfrag,
 "Warning[portid reset]",
 id_.node, id_.ref,
 mseq, mfrag, fseq,
-rcvwnd_.acked_, rcvwnd_.rcv_, rcvwnd_.nacked_space_);
+rcvwnd_.acked_.v(), rcvwnd_.rcv_.v(), rcvwnd_.nacked_space_);
 
 rcvwnd_.rcv_ = fseq;
   } else {
@@ -300,10 +300,10 @@ uint32_t TipcPortId::ReceiveData(uint32_t mseq, uint16_t mfrag,
 "Error[msg loss]",
 id_.node, id_.ref,
 mseq, mfrag, fseq,
-rcvwnd_.acked_, rcvwnd_.rcv_, rcvwnd_.nacked_space_);
+rcvwnd_.acked_.v(), rcvwnd_.rcv_.v(), rcvwnd_.nacked_space_);
   }
 }
-if (fseq <= rcvwnd_.acked_) {
+if (Seq16(fseq) <= rcvwnd_.acked_) {
   rc = NCSCC_RC_FAILURE;
   // unexpected retransmission
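
To show why plain uint16_t comparisons are replaced in this patch, here is a minimal sketch of a 16-bit serial number type following the comparison rules of RFC 1982 (SERIAL_BITS = 16). It is not the Seq16 class from mds_tipc_fctrl_portid.h, whose definition is not shown in this thread; the name Serial16 and its exact set of operators are assumptions made for illustration.

```cpp
#include <cstdint>

// Half of the 16-bit serial number space, 2^(SERIAL_BITS - 1).
constexpr int kSerialHalf = 1 << 15;

// Hypothetical 16-bit serial number with RFC 1982 ordering.
class Serial16 {
 public:
  explicit Serial16(uint16_t v = 0) : v_(v) {}
  uint16_t v() const { return v_; }

  // Addition wraps modulo 2^16.
  Serial16 operator+(uint16_t n) const {
    return Serial16(static_cast<uint16_t>(v_ + n));
  }
  Serial16& operator++() {
    ++v_;
    return *this;
  }

  bool operator==(const Serial16& o) const { return v_ == o.v_; }

  // RFC 1982: s1 < s2 if (i1 < i2 and i2 - i1 < 2^15) or
  //                      (i1 > i2 and i1 - i2 > 2^15).
  // Values exactly 2^15 apart are unordered, so both directions are false.
  bool operator<(const Serial16& o) const {
    return (v_ < o.v_ && o.v_ - v_ < kSerialHalf) ||
           (v_ > o.v_ && v_ - o.v_ > kSerialHalf);
  }
  bool operator<=(const Serial16& o) const { return *this == o || *this < o; }
  bool operator>(const Serial16& o) const { return o < *this; }
  bool operator>=(const Serial16& o) const { return *this == o || o < *this; }

 private:
  uint16_t v_;
};
```

With raw uint16_t values, a receiver with acked = 65535 that then sees fseq = 0 would evaluate `acked < fseq` as false and treat the message as an old retransmission. With the serial ordering above, Serial16(65535) < Serial16(0) holds, so the window keeps advancing across the wrap.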

[devel] [PATCH 0/9] Review Request for mds: Add solution for TIPC buffer overflow [#1960]

2019-08-14 Thread Minh Chau

Summary: mds: Add solution of TIPC buffer overflow at MDS [#1960]
Review request for Ticket(s): 1960
Peer Reviewer(s): Anders, HansN, Lennart, Gary, Vu, Thuan
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-1960
Base revision: 2d85d5d9264c6a7d1c6601b900fb810facbee3ac
Personal repository: git://git.code.sf.net/u/minh-chau/review


Impacted area   Impact y/n

 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          y
 Samples                 n
 Tests                   n
 Other                   n

NOTE: Patch(es) contain lines longer than 80 characters

Comments (indicate scope for each "y" above):
-
Sending on behalf of Thuan & Minh.
Some pending tasks to accomplish:

. Handle broadcast/multicast mds messages with flow control.
. Reduce the memory re-allocation overhead when flow control is enabled.
(At this moment, memory is allocated in mds_dt_tipc.c and then cloned again
into the buffer for the retransmission queue.)
. The sequence number arithmetic (sna) should be implemented in /base code.
. Add an mdstest case to cover sna wrap-around.
. MDS_CHECKSUM_ENABLE_FLAG

revision c49fdeb17fae20b4e0e9af134cc9b60de846c271
Author: Minh Chau 
Date:   Wed, 14 Aug 2019 15:40:05 +1000

mds: Add TIPC buffer overflow for mdstest [#1960]



revision 6948a2456642600d541b55c9787bb17cfde48a7e
Author: Minh Chau 
Date:   Wed, 14 Aug 2019 15:40:05 +1000

mds: Apply serial number arithmetic for sequence counter [#1960]

This patch applies serial number arithmetic, as described in RFC 1982, to
the flow control sequence number.

This is only a temporary patch; a proper one could be made in /base
as a template that also covers other types, e.g. uint32. MDS would then
reuse it from /base.



revision 87662f659682f813f6746eef0e60d1e52ab03ff1
Author: Minh Chau 
Date:   Wed, 14 Aug 2019 15:40:05 +1000

mds: Add configurable parameters [#1960]

This patch makes the TIPC buffer overflow solution configurable,
along with the ack timeout and ack size.
For example:
The service config file can export the following environment variables:

export MDS_TIPC_FCTRL_ENABLED=1
export MDS_TIPC_FCTRL_ACKTIMEOUT=1000
export MDS_TIPC_FCTRL_ACKSIZE=1

If MDS_TIPC_FCTRL_ACKTIMEOUT or MDS_TIPC_FCTRL_ACKSIZE is not specified,
its default value is used.



revision cd4f8af3f53b16aa05d11f30e25da209e7e51e98
Author: Minh Chau 
Date:   Wed, 14 Aug 2019 15:40:05 +1000

mds: Implement kRcvBuffOverflow state [#1960]

This patch implements the kRcvBuffOverflow state machine as
described in the README file.



revision d5c9e8fc8605f453155f4a260ebda0f78ee017b4
Author: Minh Chau 
Date:   Wed, 14 Aug 2019 15:40:05 +1000

mds: Add state machine for tipc portid instance [#1960]

This patch adds a state machine to support the tx probation timer.



revision f3f159d0aa3f43c4b28cbd6f0c7c9f041f4b6fd8
Author: Minh Chau 
Date:   Wed, 14 Aug 2019 15:40:05 +1000

mds: Add timeout for ack message [#1960]

If the ack size is configured to be greater than 1, the receiver needs a
timeout after which it sends the ack message back to the sender.
The ack message timeout uses the poll timeout in the flow control thread
to keep mds lightweight (in contrast to additional timer threads).



revision 6b69713c85dfc46b4d570a61eb2e2c4b71c354f9
Author: Minh Chau 
Date:   Wed, 14 Aug 2019 15:39:39 +1000

mds: Add implementation for TIPC buffer overflow solution [#1960]

This is a collaborative patch of two participants:
- Tran Thuan 
- Minh Chau 

Main changes:
- Add mds_tipc_fctrl_intf.h, mds_tipc_fctrl_intf.cc: These two files
introduce new functions which are called in mds_dt_tipc.c if the flow
control is enabled
- Add mds_tipc_fctrl_portid.h, mds_tipc_fctrl_portid.cc: These files
implement the tipc portid instance, which supports the sliding window
and the mds msg queue
- Add mds_tipc_fctrl_msg.h, mds_tipc_fctrl_msg.cc: These files define
the event and messages which are used for this solution.



revision f71e0ba303ea75b8f845d9f72ab903af93817c87
Author: Minh Chau 
Date:   Wed, 14 Aug 2019 15:08:30 +1000

mds: Resolve c/c++ linking issue [#1960]

This patch solves the linking issue if mds_dt.h or mds_core.h
is included in c++ sources.



revision 983ad4f94c9b9d458ba5a3f12351cd5b143c78d5
Author: Minh Chau 
Date:   Wed, 14 Aug 2019 15:08:30 +1000

mds: Add README for solution of TIPC buffer overflow at MDS [#1960]



Added Files:

 src/mds/mds_tipc_fctrl_intf.cc
 src/mds/mds_tipc_fctrl_intf.h
 src/mds/mds_tipc_fctrl_msg.cc
 src/mds/mds_tipc_fctrl_msg.h
 src/mds/mds_tipc_fctrl_portid.cc
 src/mds/mds_tipc_fctrl_portid.h
 src/mds/README


Complete diffstat:
--
 src/mds/Makefile.am  |  10 +-
 src/mds/README   | 221 ++
 src/mds/apitest/mdstest.

[devel] [PATCH 3/9] mds: Add implementation for TIPC buffer overflow solution [#1960]

2019-08-14 Thread Minh Chau

This is a collaborative patch of two participants:
- Tran Thuan 
- Minh Chau 

Main changes:
- Add mds_tipc_fctrl_intf.h, mds_tipc_fctrl_intf.cc: These two files
introduce new functions which are called in mds_dt_tipc.c if the flow
control is enabled
- Add mds_tipc_fctrl_portid.h, mds_tipc_fctrl_portid.cc: These files
implement the tipc portid instance, which supports the sliding window and
the mds msg queue
- Add mds_tipc_fctrl_msg.h, mds_tipc_fctrl_msg.cc: These files define
the event and messages which are used for this solution.
---
 src/mds/Makefile.am  |  10 +-
 src/mds/mds_dt.h |   8 +-
 src/mds/mds_dt_tipc.c| 188 +---
 src/mds/mds_tipc_fctrl_intf.cc   | 376 +++
 src/mds/mds_tipc_fctrl_intf.h|  47 +
 src/mds/mds_tipc_fctrl_msg.cc| 142 +++
 src/mds/mds_tipc_fctrl_msg.h | 129 ++
 src/mds/mds_tipc_fctrl_portid.cc | 261 +++
 src/mds/mds_tipc_fctrl_portid.h  |  87 +
 9 files changed, 1184 insertions(+), 64 deletions(-)
 create mode 100644 src/mds/mds_tipc_fctrl_intf.cc
 create mode 100644 src/mds/mds_tipc_fctrl_intf.h
 create mode 100644 src/mds/mds_tipc_fctrl_msg.cc
 create mode 100644 src/mds/mds_tipc_fctrl_msg.h
 create mode 100644 src/mds/mds_tipc_fctrl_portid.cc
 create mode 100644 src/mds/mds_tipc_fctrl_portid.h

diff --git a/src/mds/Makefile.am b/src/mds/Makefile.am
index 2d7b652..d849e8f 100644
--- a/src/mds/Makefile.am
+++ b/src/mds/Makefile.am
@@ -48,10 +48,16 @@ lib_libopensaf_core_la_SOURCES += \
 if ENABLE_TIPC_TRANSPORT
 noinst_HEADERS += src/mds/mds_dt_tipc.h \
src/mds/mds_tipc_recvq_stats.h \
-   src/mds/mds_tipc_recvq_stats_impl.h
+   src/mds/mds_tipc_recvq_stats_impl.h \
+   src/mds/mds_tipc_fctrl_intf.h \
+   src/mds/mds_tipc_fctrl_portid.h \
+   src/mds/mds_tipc_fctrl_msg.h
 lib_libopensaf_core_la_SOURCES += src/mds/mds_dt_tipc.c \
src/mds/mds_tipc_recvq_stats.cc \
-   src/mds/mds_tipc_recvq_stats_impl.cc
+   src/mds/mds_tipc_recvq_stats_impl.cc \
+   src/mds/mds_tipc_fctrl_intf.cc \
+   src/mds/mds_tipc_fctrl_portid.cc \
+   src/mds/mds_tipc_fctrl_msg.cc
 endif
 
 if ENABLE_TESTS
diff --git a/src/mds/mds_dt.h b/src/mds/mds_dt.h
index b645bb4..d9e8633 100644
--- a/src/mds/mds_dt.h
+++ b/src/mds/mds_dt.h
@@ -162,7 +162,7 @@ uint32_t mdtm_del_from_ref_tbl(MDS_SUBTN_REF_VAL ref);
 uint32_t mds_tmr_mailbox_processing(void);
 uint32_t mdtm_get_from_ref_tbl(MDS_SUBTN_REF_VAL ref, MDS_SVC_HDL *svc_hdl);
 uint32_t mdtm_add_frag_hdr(uint8_t *buf_ptr, uint16_t len, uint32_t seq_num,
-   uint16_t frag_byte);
+   uint16_t frag_byte, uint16_t fctrl_seq_num);
 uint32_t mdtm_free_reassem_msg_mem(MDS_ENCODED_MSG *msg);
 uint32_t mdtm_process_recv_data(uint8_t *buf, uint16_t len, uint64_t tipc_id,
 uint32_t *buff_dump);
@@ -240,9 +240,13 @@ bool mdtm_mailbox_mbx_cleanup(NCSCONTEXT arg, NCSCONTEXT msg);
 
 #define MDS_PROT 0xA0
 #define MDS_VERSION 0x08
-#define MDS_PROT_VER_MASK (MDS_PROT | MDS_VERSION)
+#define MDS_PROT_VER_MASK 0xFC
 #define MDTM_PRI_MASK 0x3
 
+/* MDS protocol/version for flow control */
+#define MDS_PROT_FCTRL (0xB0 | MDS_VERSION)
+#define MDS_PROT_FCTRL_ID 0x00AC13F5
+
 /* Added for the subscription changes */
 #define MDS_NCS_CHASSIS_ID (m_NCS_GET_NODE_ID & 0x00ff)
 #define MDS_TIPC_COMMON_ID 0x01001000
diff --git a/src/mds/mds_dt_tipc.c b/src/mds/mds_dt_tipc.c
index 86b52bb..fef1c50 100644
--- a/src/mds/mds_dt_tipc.c
+++ b/src/mds/mds_dt_tipc.c
@@ -47,6 +47,7 @@
 #include "mds_dt_tipc.h"
 #include "mds_dt_tcp_disc.h"
 #include "mds_core.h"
+#include "mds_tipc_fctrl_intf.h"
 #include "mds_tipc_recvq_stats.h"
 #include "base/osaf_utility.h"
 #include "base/osaf_poll.h"
@@ -165,20 +166,22 @@ NCS_PATRICIA_TREE mdtm_reassembly_list;
 uint32_t mdtm_global_frag_num;
 
 const unsigned int MAX_RECV_THRESHOLD = 30;
+uint8_t gl_mds_pro_ver = MDS_PROT | MDS_VERSION;
 
-static bool get_tipc_port_id(int sock, uint32_t* port_id) {
+static bool get_tipc_port_id(int sock, struct tipc_portid* port_id) {
struct sockaddr_tipc addr;
socklen_t sz = sizeof(addr);
 
memset(&addr, 0, sizeof(addr));
-   *port_id = 0;
+   port_id->node = 0;
+   port_id->ref = 0;
if (0 > getsockname(sock, (struct sockaddr *)&addr, &sz)) {
syslog(LOG_ERR, "MDTM:TIPC Failed to get socket name, err: %s",
   strerror(errno));
return false;
}
 
-   *port_id = addr.addr.id.ref;
+   *port_id = addr.addr.id;
return true;
 }
 
@@ -240,12 +243,13 @@ uint32_t mdtm_tipc_init(NODE_ID nodeid, uint32_t *mds_tipc_ref)
}
 
/* Code for getting the self tipc random number */
-   if (!get_tipc_port_id(tipc_cb.BSRsock, mds_tipc_r

[devel] [PATCH 5/9] mds: Add state machine for tipc portid instance [#1960]

2019-08-14 Thread Minh Chau

This patch adds a state machine to support the tx probation timer.
---
 src/mds/mds_tipc_fctrl_intf.cc   |  47 +++--
 src/mds/mds_tipc_fctrl_msg.h |   1 +
 src/mds/mds_tipc_fctrl_portid.cc | 109 +++
 src/mds/mds_tipc_fctrl_portid.h  |  22 
 4 files changed, 176 insertions(+), 3 deletions(-)

diff --git a/src/mds/mds_tipc_fctrl_intf.cc b/src/mds/mds_tipc_fctrl_intf.cc
index bd0a8f6..c2d0922 100644
--- a/src/mds/mds_tipc_fctrl_intf.cc
+++ b/src/mds/mds_tipc_fctrl_intf.cc
@@ -34,6 +34,7 @@
 
 using mds::Event;
 using mds::TipcPortId;
+using mds::Timer;
 using mds::DataMessage;
 using mds::ChunkAck;
 using mds::HeaderMessage;
@@ -65,6 +66,11 @@ uint64_t sock_buf_size = 0;
 std::map<uint64_t, TipcPortId*> portid_map;
 std::mutex portid_map_mutex;
 
+// probation timer event to enable flow control at receivers
+const int64_t kBaseTimerInt = 200;  // in centisecond
+const uint8_t kTxProbMaxRetries = 10;
+Timer txprob_timer(Event::Type::kEvtTmrTxProb);
+
 // chunk ack parameters
 // todo: The chunk ack timeout and chunk ack size should be configurable
 int kChunkAckTimeout = 1000;  // in miliseconds
@@ -76,13 +82,37 @@ TipcPortId* portid_lookup(struct tipc_portid id) {
   return portid_map[uid];
 }
 
+void tmr_exp_cbk(void* uarg) {
+  Timer* timer = reinterpret_cast<Timer*>(uarg);
+  if (timer != nullptr) {
+timer->is_active_ = false;
+// send to fctrl thread
+if (m_NCS_IPC_SEND(&mbx_events, new Event(timer->type_),
+NCS_IPC_PRIORITY_HIGH) != NCSCC_RC_SUCCESS) {
+  m_MDS_LOG_ERR("FCTRL: Failed to send msg to mbx_events\n");
+}
+  }
+}
+
 void process_timer_event(const Event evt) {
+  bool txprob_restart = false;
   for (auto i : portid_map) {
 TipcPortId* portid = i.second;
+
+if (evt.type_ == Event::Type::kEvtTmrTxProb) {
+  if (portid->ReceiveTmrTxProb(kTxProbMaxRetries) == true) {
+txprob_restart = true;
+  }
+}
+
 if (evt.type_ == Event::Type::kEvtTmrChunkAck) {
   portid->ReceiveTmrChunkAck();
 }
   }
+  if (txprob_restart) {
+txprob_timer.Start(kBaseTimerInt, tmr_exp_cbk);
+m_MDS_LOG_DBG("FCTRL: Restart txprob");
+  }
 }
 
 uint32_t process_flow_event(const Event evt) {
@@ -231,8 +261,10 @@ uint32_t mds_tipc_fctrl_sndqueue_capable(struct tipc_portid id, uint16_t len,
 id.node, id.ref, __LINE__);
 rc = NCSCC_RC_FAILURE;
   } else {
-// assign the sequence number of the outgoing message
-*next_seq = portid->GetCurrentSeq();
+if (portid->state_ != TipcPortId::State::kDisabled) {
+  // assign the sequence number of the outgoing message
+  *next_seq = portid->GetCurrentSeq();
+}
   }
 
   portid_map_mutex.unlock();
@@ -252,7 +284,16 @@ uint32_t mds_tipc_fctrl_trysend(const uint8_t *buffer, uint16_t len,
 id.node, id.ref, __LINE__);
 rc = NCSCC_RC_FAILURE;
   } else {
-portid->Queue(buffer, len);
+if (portid->state_ != TipcPortId::State::kDisabled) {
+  portid->Queue(buffer, len);
+  // start txprob timer for the first msg sent out
+  // do not start for other states
+  if (portid->state_ == TipcPortId::State::kStartup) {
+txprob_timer.Start(kBaseTimerInt, tmr_exp_cbk);
+m_MDS_LOG_DBG("FCTRL: Start txprob");
+portid->state_ = TipcPortId::State::kTxProb;
+  }
+}
   }
 
   portid_map_mutex.unlock();
diff --git a/src/mds/mds_tipc_fctrl_msg.h b/src/mds/mds_tipc_fctrl_msg.h
index 8e6a874..69f8048 100644
--- a/src/mds/mds_tipc_fctrl_msg.h
+++ b/src/mds/mds_tipc_fctrl_msg.h
@@ -45,6 +45,7 @@ class Event {
 kEvtDropData,  // event reported from tipc that a message is not
// delivered
 kEvtTmrAll,
+kEvtTmrTxProb,// event that tx probation timer expired for once
 kEvtTmrChunkAck,  // event to send the chunk ack
   };
   NCS_IPC_MSG next_{0};
diff --git a/src/mds/mds_tipc_fctrl_portid.cc b/src/mds/mds_tipc_fctrl_portid.cc
index 64115d5..84ecee9 100644
--- a/src/mds/mds_tipc_fctrl_portid.cc
+++ b/src/mds/mds_tipc_fctrl_portid.cc
@@ -23,6 +23,35 @@
 
 namespace mds {
 
+Timer::Timer(Event::Type type) {
+  tmr_id_ = nullptr;
+  type_ = type;
+  is_active_ = false;
+}
+
+Timer::~Timer() {
+}
+
+void Timer::Start(int64_t period, void (*tmr_exp_func)(void*)) {
+  // timer will not start if it's already started
+  // period is in centiseconds
+  if (is_active_ == false) {
+if (tmr_id_ == nullptr) {
+  tmr_id_ = ncs_tmr_alloc(nullptr, 0);
+}
+tmr_id_ = ncs_tmr_start(tmr_id_, period, tmr_exp_func, this,
+nullptr, 0);
+is_active_ = true;
+  }
+}
+
+void Timer::Stop() {
+  if (is_active_ == true) {
+ncs_tmr_stop(tmr_id_);
+is_active_ = false;
+  }
+}
+
 void MessageQueue::Queue(DataMessage* msg) {
   queue_.push_back(msg);
 }
@@ -64,6 +93,7 @@ void MessageQueue::Clear() {
 TipcPortId::TipcPortId(struct tipc_portid id, int sock, uint16_t chksize,
 uint64_t sock_buf_size):
   id_(id), bsrsock_(sock), chunk_size_(chksize),

[devel] [PATCH 1/9] mds: Add README for solution of TIPC buffer overflow at MDS [#1960]

2019-08-14 Thread Minh Chau

---
 src/mds/README | 221 +
 1 file changed, 221 insertions(+)
 create mode 100644 src/mds/README

diff --git a/src/mds/README b/src/mds/README
new file mode 100644
index 000..1b94632
--- /dev/null
+++ b/src/mds/README
@@ -0,0 +1,221 @@
+/*  -*- OpenSAF  -*-
+ *
+ * (C) Copyright 2019 The OpenSAF Foundation
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. This file and program are licensed
+ * under the GNU Lesser General Public License Version 2.1, February 1999.
+ * The complete license can be accessed from the following location:
+ * http://opensource.org/licenses/lgpl-license.php
+ * See the Copying file included with the OpenSAF distribution for full
+ * licensing terms.
+ *
+ * Author(s): Ericsson AB
+ *
+ */
+Background
+==========
+If OpenSAF configures TIPC as transport, the MDS library today will use
+TIPC SOCK_RDM socket for message distribution in the cluster. The SOCK_RDM
+datagram socket possibly encounters buffer overflow at receiver ends which
+has been documented in tipc.io[1]. A temporary solution for this buffer
+overflow issue is that the socket buffer size can be increased to a larger
+number. However, if the cluster continues either scaling out or adding more
+components, the system will be under dimensioned, thus the TIPC buffer
+overflow can occur again.
+
+MDS's solution for TIPC buffer overflow
+=======================================
+If MDS disables TIPC_DEST_DROPPABLE, TIPC will return an ancillary message
+when the original message fails to be delivered. On this event, if the message
+has been saved in a queue, MDS at the sender side can look it up and retransmit
+it to the receiver.
+Once the messages in the sender's queue have been delivered successfully, MDS
+needs to remove them. MDS introduces its internal ACK message as an
+acknowledgment from receivers so that the senders can remove the messages
+from the queue.
+Also, while the receiver's buffer is overflowing, retransmission may not
+succeed and may even make matters worse at the receiver end (the more
+retransmission, the more overflow). MDS imitates the sliding window in TCP[2]
+to control the flow of data messages towards the receivers.
+
+Legacy MDS data message, new (data + ACK) MDS message, and upgradability
+
+Below is the MDS legacy message format that has been used till OpenSAF 5.19.07
+
+oct 0  message length
+oct 1
+--
+oct 2  sequence number: incremented for every message sent out to all destined
+...   tipc portid.
+oct 5
+--
+oct 6  fragment number: a message with same sequence number can be fragmented,
+oct 7  identified by this fragment number.
+--
+oct 8  length check: cross check with message length(oct0,1), NOT USED.
+oct 9
+--
+oct 10 protocol version: (MDS_PROT:0xA0 | MDS_VERSION:0x08) = 0xA8, NOT USED
+--
+oct 11 mds length: length of mds header and mds data, starting from oct13
+oct 12
+--
+oct 13 mds header and data
+...
+--
+
+The current sequence number/fragment number are being used in MDS for all
+messages sent to all discovered tipc portid(s), meaning that every message is sent
+to any tipc portid, the sequence/fragment number is increased. The flow control
+needs its own sequence number sliding between two tipc portid(s) so that receivers
+can detect message drop due to buffer overload. Therefore, the oct8 and oct9 are
+now reused as flow control sequence number. The oct10, protocol version, has new
+value of 0xB8. The format of new data message as below:
+
+oct 0  same
+...
+oct 7
+--
+oct 8  flow control sequence number
+oct 9
+--
+oct 10 protocol version: (MDS_PROT_TIPC_FCTRL:0xB0 | MDS_VERSION:0x08) = 0xB8
+--
+oct 11 same
+...
+--
+
+The ACK message is introduced to acknowledge one data message or a chunk of
+accumulative data message. The ACK message format:
+
+oct 0  message length
+oct 1
+--
+oct 2  8 bytes, NOT USED
+
+oct 9
+--
+oct 10 protocol version: (MDS_PROT_TIPC_FCTRL:0xB0 | MDS_VERSION:0x08) = 0xB8
+--
+oct 11 protocol identifier: MDS_PROT_FCTRL_ID
+..
+oct 14
+--
+oct 15 flow control message type: CHUNKACK
+--
+oct 16 service id: service id
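
The data message layout described in the README above (octets 0 to 10) could be decoded roughly as in the sketch below. This is an illustration only: FctrlHeader, DecodeHeader and HasFctrlSeq are hypothetical names, and big-endian (network) byte order for the multi-octet fields is an assumption, since the README excerpt does not state it.

```cpp
#include <cstddef>
#include <cstdint>

// Protocol version octet values quoted in the README excerpt.
constexpr uint8_t kProtVerLegacy = 0xA0 | 0x08;  // 0xA8, legacy data message
constexpr uint8_t kProtVerFctrl = 0xB0 | 0x08;   // 0xB8, flow control message
constexpr size_t kHeaderMinLen = 11;             // octets 0..10

// Hypothetical view of the first 11 octets of an MDS data message.
struct FctrlHeader {
  uint16_t msg_len;  // oct 0-1
  uint32_t mseq;     // oct 2-5
  uint16_t mfrag;    // oct 6-7
  uint16_t fseq;     // oct 8-9, flow control sequence number (new format)
  uint8_t prot_ver;  // oct 10
};

// Big-endian readers (byte order is an assumption of this sketch).
static uint16_t Read16(const uint8_t* p) {
  return static_cast<uint16_t>((p[0] << 8) | p[1]);
}

static uint32_t Read32(const uint8_t* p) {
  return (static_cast<uint32_t>(p[0]) << 24) |
         (static_cast<uint32_t>(p[1]) << 16) |
         (static_cast<uint32_t>(p[2]) << 8) | static_cast<uint32_t>(p[3]);
}

// Fills out from buf; returns false if the arguments are null or the buffer
// is shorter than the 11 header octets.
bool DecodeHeader(const uint8_t* buf, size_t len, FctrlHeader* out) {
  if (buf == nullptr || out == nullptr || len < kHeaderMinLen) return false;
  out->msg_len = Read16(buf);
  out->mseq = Read32(buf + 2);
  out->mfrag = Read16(buf + 6);
  out->fseq = Read16(buf + 8);
  out->prot_ver = buf[10];
  return true;
}

// Octets 8-9 carry a flow control sequence number only in the new format.
bool HasFctrlSeq(const FctrlHeader& h) { return h.prot_ver == kProtVerFctrl; }
```

A receiver that sees kProtVerLegacy in octet 10 would treat octets 8 and 9 as the unused length check field rather than as a sequence number.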

[devel] [PATCH 2/9] mds: Resolve c/c++ linking issue [#1960]

2019-08-14 Thread Minh Chau

From: Tran Thuan 

This patch solves the linking issue if mds_dt.h or mds_core.h
is included in c++ sources.
---
 src/mds/mds_core.h| 74 +++
 src/mds/mds_dt.h  |  4 +--
 src/mds/mds_dt2c.h| 67 --
 src/mds/mds_dt_tcp.c  |  2 ++
 src/mds/mds_dt_tcp.h  |  1 -
 src/mds/mds_dt_tipc.c |  2 ++
 6 files changed, 80 insertions(+), 70 deletions(-)

diff --git a/src/mds/mds_core.h b/src/mds/mds_core.h
index 37696d4..c09b428 100644
--- a/src/mds/mds_core.h
+++ b/src/mds/mds_core.h
@@ -573,6 +573,80 @@ extern uint32_t mds_mcm_free_msg_uba_start(MDS_ENCODED_MSG msg);
 extern void get_adest_details(MDS_DEST adest, char *adest_details);
 extern void get_subtn_adest_details(MDS_PWE_HDL pwe_hdl, MDS_SVC_ID svc_id,
 MDS_DEST adest, char *adest_details);
+#ifdef __cplusplus
+extern "C" {
+#endif
+/*  */
+/*  */
+/*MCM to MDTM   */
+/*  */
+/*  */
+
+/* Initialization of MDTM Module */
+uint32_t (*mds_mdtm_init)(NODE_ID node_id, uint32_t *mds_tipc_ref);
+
+/* Destroying the MDTM Module*/
+uint32_t (*mds_mdtm_destroy)(void);
+
+uint32_t (*mds_mdtm_send)(MDTM_SEND_REQ *req);
+
+/* SVC Install */
+uint32_t (*mds_mdtm_svc_install)(PW_ENV_ID pwe_id, MDS_SVC_ID svc_id,
+ NCSMDS_SCOPE_TYPE install_scope,
+ V_DEST_RL role, MDS_VDEST_ID vdest_id,
+ NCS_VDEST_TYPE vdest_policy,
+ MDS_SVC_PVT_SUB_PART_VER mds_svc_pvt_ver);
+
+/* SVC Uninstall */
+uint32_t (*mds_mdtm_svc_uninstall)(PW_ENV_ID pwe_id, MDS_SVC_ID svc_id,
+   NCSMDS_SCOPE_TYPE install_scope,
+   V_DEST_RL role, MDS_VDEST_ID vdest_id,
+   NCS_VDEST_TYPE vdest_policy,
+   MDS_SVC_PVT_SUB_PART_VER mds_svc_pvt_ver);
+
+/* SVC Subscribe */
+uint32_t (*mds_mdtm_svc_subscribe)(PW_ENV_ID pwe_id, MDS_SVC_ID svc_id,
+   NCSMDS_SCOPE_TYPE subscribe_scope,
+   MDS_SVC_HDL local_svc_hdl,
+   MDS_SUBTN_REF_VAL *subtn_ref_val);
+
+/*  added svc_hdl */
+/* SVC Unsubscribe */
+uint32_t (*mds_mdtm_svc_unsubscribe)(PW_ENV_ID pwe_id, MDS_SVC_ID svc_id,
+ NCSMDS_SCOPE_TYPE subscribe_scope,
+ MDS_SUBTN_REF_VAL subtn_ref_val);
+
+/* VDEST Install */
+uint32_t (*mds_mdtm_vdest_install)(MDS_VDEST_ID vdest_id);
+
+/* VDEST Uninstall */
+uint32_t (*mds_mdtm_vdest_uninstall)(MDS_VDEST_ID vdest_id);
+
+/* VDEST Subscribe */
+uint32_t (*mds_mdtm_vdest_subscribe)(MDS_VDEST_ID vdest_id,
+ MDS_SUBTN_REF_VAL *subtn_ref_val);
+
+/* VDEST Unsubscribe */
+uint32_t (*mds_mdtm_vdest_unsubscribe)(MDS_VDEST_ID vdest_id,
+   MDS_SUBTN_REF_VAL subtn_ref_val);
+
+/* Tx Register (For incrementing the use count) */
+uint32_t (*mds_mdtm_tx_hdl_register)(MDS_DEST adest);
+
+/* Tx Unregister (For decrementing the use count) */
+uint32_t (*mds_mdtm_tx_hdl_unregister)(MDS_DEST adest);
+
+/* Node subscription */
+uint32_t (*mds_mdtm_node_subscribe)(MDS_SVC_HDL svc_hdl,
+MDS_SUBTN_REF_VAL *subtn_ref_val);
+
+/* Node unsubscription */
+uint32_t (*mds_mdtm_node_unsubscribe)(MDS_SUBTN_REF_VAL subtn_ref_val);
+
+#ifdef __cplusplus
+}
+#endif
+
 /*  */
 /*  */
 /* MMGR Macros  */
diff --git a/src/mds/mds_dt.h b/src/mds/mds_dt.h
index a6e2801..b645bb4 100644
--- a/src/mds/mds_dt.h
+++ b/src/mds/mds_dt.h
@@ -214,10 +214,10 @@ typedef struct mdtm_ref_hdl_list {
   MDS_SVC_HDL svc_hdl;
 } MDTM_REF_HDL_LIST;
 
-MDTM_REF_HDL_LIST *mdtm_ref_hdl_list_hdr;
+extern MDTM_REF_HDL_LIST *mdtm_ref_hdl_list_hdr;
+extern NCS_PATRICIA_TREE mdtm_reassembly_list;
 uint32_t mdtm_attach_mbx(SYSF_MBX mbx);
 void mds_buff_dump(uint8_t *buff, uint32_t len, uint32_t max);
-NCS_PATRICIA_TREE mdtm_reassembly_list;
 
 uint32_t mdtm_set_transport(MDTM_TX_TYPE transport);
 bool mdtm_mailbox_mbx_cleanup(NCSCONTEXT arg, NCSCONTEXT msg);
diff --git a/src/mds/mds_dt2c.h b/src/mds/mds_dt2c.h
index 012999c..c92fecb 100644
--- a/src/mds/mds_dt2c.h
+++ b/src/mds/mds_dt2c.h
@@ -322,73 +322,6 @@ extern uint32_t gl_mds_checksum;
 
 /*  */
 /*  */
-/*MCM to MDTM   */
-/*  */
-/*  */
-
-/*

[devel] [PATCH 9/9] mds: Add TIPC buffer overflow for mdstest [#1960]

2019-08-14 Thread Minh Chau

From: Tran Thuan 

---
 src/mds/apitest/mdstest.c  |   5 +-
 src/mds/apitest/mdstipc.h  |   6 +-
 src/mds/apitest/mdstipc_api.c  | 237 +
 src/mds/apitest/mdstipc_conf.c |  19 ++--
 4 files changed, 253 insertions(+), 14 deletions(-)

diff --git a/src/mds/apitest/mdstest.c b/src/mds/apitest/mdstest.c
index bf6e173..a381aad 100644
--- a/src/mds/apitest/mdstest.c
+++ b/src/mds/apitest/mdstest.c
@@ -84,12 +84,13 @@ int main(int argc, char **argv)
return 0;
}
 
-   if (mds_startup() != 0) {
+   if ((suite != 27) && (mds_startup() != 0)) {
printf("Fail to start mds agents\n");
return 1;
}
 
int rc = test_run(suite, tcase);
-   mds_shutdown();
+   if (suite != 27)
+   mds_shutdown();
return rc;
 }
diff --git a/src/mds/apitest/mdstipc.h b/src/mds/apitest/mdstipc.h
index 07cf556..2bd44b4 100644
--- a/src/mds/apitest/mdstipc.h
+++ b/src/mds/apitest/mdstipc.h
@@ -20,6 +20,7 @@
 
 #include "base/ncssysf_tsk.h"
 #include "base/ncssysf_def.h"
+#include "base/ncsencdec_pub.h"
 
 typedef struct tet_task {
   NCS_OS_CB entry;
@@ -163,9 +164,6 @@ int gl_DEC_FLAT_CB_FAIL;
 int gl_RECEIVE_CB_FAIL;
 int gl_COPY_CB_FAIL;
 
-uint32_t ncs_encode_16bit(uint8_t **, uint32_t);
-uint16_t ncs_decode_16bit(uint8_t **);
-
 uint32_t tet_mds_svc_callback(NCSMDS_CALLBACK_INFO *);
 /**MDS call back routines */
 uint32_t tet_mds_cb_cpy(NCSMDS_CALLBACK_INFO *);
@@ -528,5 +526,7 @@ uint32_t mds_send_get_redack(MDS_HDL mds_hdl, MDS_SVC_ID svc_id,
 uint32_t mds_send_redrsp_getack(MDS_HDL mds_hdl, MDS_SVC_ID svc_id,
 int64_t time_to_wait, TET_MDS_MSG *response);
 uint32_t tet_sync_point(void);
+int mds_startup(void);
+int mds_shutdown(void);
 
 #endif  // MDS_APITEST_MDSTIPC_H_
diff --git a/src/mds/apitest/mdstipc_api.c b/src/mds/apitest/mdstipc_api.c
index b5547e3..8057284 100644
--- a/src/mds/apitest/mdstipc_api.c
+++ b/src/mds/apitest/mdstipc_api.c
@@ -27,6 +27,7 @@
 #include "base/ncs_main_papi.h"
 #include "mdstipc.h"
 #include "base/ncssysf_tmr.h"
+#include "base/osaf_poll.h"
 
 #define MSG_SIZE MDS_DIRECT_BUF_MAXSIZE
 static MDS_CLIENT_MSG_FORMAT_VER gl_set_msg_fmt_ver;
@@ -13104,6 +13105,234 @@ void tet_create_default_PWE_VDEST_tp()
test_validate(FAIL, 0);
 }
 
+void tet_sender(char *send_buff, uint32_t buff_len, int msg_count)
+{
+   int live = 100; // sender live max 100s
+   TET_MDS_MSG *mesg;
+   mesg = (TET_MDS_MSG *)malloc(sizeof(TET_MDS_MSG));
+   memset(mesg, 0, sizeof(TET_MDS_MSG));
+
+   printf("\nStarted Sender (pid:%d) svc_id=%d\n",
+   (int)getpid(), NCSMDS_SVC_ID_INTERNAL_MIN);
+   if (adest_get_handle() != NCSCC_RC_SUCCESS) {
+   printf("\n: Sender FAIL to get adest handle\n");
+   exit(1);
+   }
+
+   if (mds_service_install(gl_tet_adest.mds_pwe1_hdl,
+   NCSMDS_SVC_ID_INTERNAL_MIN, 1,
+   NCSMDS_SCOPE_NONE, false, false) != NCSCC_RC_SUCCESS) {
+   printf("\nSender FAIL to install the service\n");
+   exit(1);
+   }
+
+   MDS_SVC_ID svcids[] = {NCSMDS_SVC_ID_EXTERNAL_MIN};
+   if (mds_service_subscribe(
+   gl_tet_adest.mds_pwe1_hdl, NCSMDS_SVC_ID_INTERNAL_MIN,
+   NCSMDS_SCOPE_INTRANODE, 1, svcids) != NCSCC_RC_SUCCESS) {
+   printf("\nSender FAIL to subscribe receiver\n");
+   exit(1);
+   }
+
+   while(!gl_tet_adest.svc[0].svcevt[0].dest && live-- > 0) {
+   printf("\nSender is waiting for receiver UP\n");
+   sleep(1);
+   }
+   printf("\nSender got last event=%d of svc_id=%d dest=<%llx>\n",
+   gl_tet_adest.svc[0].svcevt[0].event,
+   gl_tet_adest.svc[0].svcevt[0].svc_id,
+   gl_tet_adest.svc[0].svcevt[0].dest);
+
+   // wait for receiver subscribe sender
+   // otherwise, receiver won't detect loss message
+   sleep(1);
+
+   uint32_t offset = 0;
+   uint32_t msg_len = buff_len / msg_count;
+   for (int i = 1; i <= msg_count; i++) {
+   memcpy(mesg->send_data, &send_buff[offset], msg_len);
+   mesg->send_len = msg_len;
+   if (mds_just_send(gl_tet_adest.mds_pwe1_hdl,
+ NCSMDS_SVC_ID_INTERNAL_MIN,
+ NCSMDS_SVC_ID_EXTERNAL_MIN,
+ gl_tet_adest.svc[0].svcevt[0].dest,
+ MDS_SEND_PRIORITY_HIGH,
+ mesg) != NCSCC_RC_SUCCESS) {
+   printf("\nSender FAIL send message\n");
+   exit(1);
+   } else {
+   printf("\nSender SENT message %d successfully\n", i);
+   }
+   offset

[devel] [PATCH 6/9] mds: Implement kRcvBuffOverflow state [#1960]

2019-08-14 Thread Minh Chau

This patch implements the kRcvBuffOverflow state machine as
described in README file.
---
 src/mds/mds_tipc_fctrl_intf.cc   |   6 +-
 src/mds/mds_tipc_fctrl_msg.h |   1 +
 src/mds/mds_tipc_fctrl_portid.cc | 137 ++-
 src/mds/mds_tipc_fctrl_portid.h  |   5 +-
 4 files changed, 131 insertions(+), 18 deletions(-)

diff --git a/src/mds/mds_tipc_fctrl_intf.cc b/src/mds/mds_tipc_fctrl_intf.cc
index c2d0922..397114e 100644
--- a/src/mds/mds_tipc_fctrl_intf.cc
+++ b/src/mds/mds_tipc_fctrl_intf.cc
@@ -285,14 +285,16 @@ uint32_t mds_tipc_fctrl_trysend(const uint8_t *buffer, uint16_t len,
 rc = NCSCC_RC_FAILURE;
   } else {
 if (portid->state_ != TipcPortId::State::kDisabled) {
-  portid->Queue(buffer, len);
+  bool sendable = portid->ReceiveCapable(len);
+  portid->Queue(buffer, len, sendable);
   // start txprob timer for the first msg sent out
   // do not start for other states
-  if (portid->state_ == TipcPortId::State::kStartup) {
+  if (sendable && portid->state_ == TipcPortId::State::kStartup) {
 txprob_timer.Start(kBaseTimerInt, tmr_exp_cbk);
 m_MDS_LOG_DBG("FCTRL: Start txprob");
 portid->state_ = TipcPortId::State::kTxProb;
   }
+  if (sendable == false) rc = NCSCC_RC_FAILURE;
 }
   }
 
diff --git a/src/mds/mds_tipc_fctrl_msg.h b/src/mds/mds_tipc_fctrl_msg.h
index 69f8048..e6b9662 100644
--- a/src/mds/mds_tipc_fctrl_msg.h
+++ b/src/mds/mds_tipc_fctrl_msg.h
@@ -110,6 +110,7 @@ class DataMessage: public BaseMessage {
   uint8_t* msg_data_{nullptr};
   uint8_t snd_type_{0};
 
+  bool is_sent_{true};
   DataMessage() {}
   virtual ~DataMessage();
   void Decode(uint8_t *msg) override;
diff --git a/src/mds/mds_tipc_fctrl_portid.cc b/src/mds/mds_tipc_fctrl_portid.cc
index 84ecee9..e762290 100644
--- a/src/mds/mds_tipc_fctrl_portid.cc
+++ b/src/mds/mds_tipc_fctrl_portid.cc
@@ -82,6 +82,23 @@ uint64_t MessageQueue::Erase(uint16_t fseq_from, uint16_t fseq_to) {
   return msg_len;
 }
 
+DataMessage* MessageQueue::FirstUnsent() {
+  for (auto it = queue_.begin(); it != queue_.end(); ++it) {
+DataMessage *m = *it;
+if (m->is_sent_ == false) {
+  return m;
+}
+  }
+  return nullptr;
+}
+
+void MessageQueue::MarkUnsentFrom(uint16_t fseq) {
+  for (auto it = queue_.begin(); it != queue_.end(); ++it) {
+DataMessage *m = *it;
+if (m->header_.fseq_ >= fseq) m->is_sent_ = false;
+  }
+}
+
 void MessageQueue::Clear() {
   while (queue_.empty() == false) {
 DataMessage* msg = queue_.front();
@@ -99,7 +116,8 @@ TipcPortId::TipcPortId(struct tipc_portid id, int sock, uint16_t chksize,
 TipcPortId::~TipcPortId() {
   // Fake a TmrChunkAck event to ack all received messages
   ReceiveTmrChunkAck();
-  // clear all msg in sndqueue_
+  // flush all unsent msg in sndqueue_
+  FlushData();
   sndqueue_.Clear();
 }
 
@@ -109,6 +127,24 @@ uint64_t TipcPortId::GetUniqueId(struct tipc_portid id) {
   return uid;
 }
 
+void TipcPortId::FlushData() {
+  DataMessage* msg = nullptr;
+  do {
+// find the lowest sequence unsent yet
+msg = sndqueue_.FirstUnsent();
+if (msg != nullptr) {
+  Send(msg->msg_data_, msg->header_.msg_len_);
+  msg->is_sent_ = true;
+  m_MDS_LOG_DBG("FCTRL: [me] --> [node:%x, ref:%u], "
+  "FlushData[mseq:%u, mfrag:%u, fseq:%u], "
+  "sndwnd[acked:%u, send:%u, nacked:%" PRIu64 "]",
+  id_.node, id_.ref,
+  msg->header_.mseq_, msg->header_.mfrag_, msg->header_.fseq_,
+  sndwnd_.acked_, sndwnd_.send_, sndwnd_.nacked_space_);
+}
+  } while (msg != nullptr);
+}
+
 uint32_t TipcPortId::Send(uint8_t* data, uint16_t length) {
   struct sockaddr_tipc server_addr;
   ssize_t send_len = 0;
@@ -130,29 +166,49 @@ uint32_t TipcPortId::Send(uint8_t* data, uint16_t length) {
   return rc;
 }
 
-uint32_t TipcPortId::Queue(const uint8_t* data, uint16_t length) {
+uint32_t TipcPortId::Queue(const uint8_t* data, uint16_t length,
+bool is_sent) {
   uint32_t rc = NCSCC_RC_SUCCESS;
 
   DataMessage *msg = new DataMessage;
   msg->header_.Decode(const_cast<uint8_t*>(data));
   msg->Decode(const_cast<uint8_t*>(data));
   msg->msg_data_ = new uint8_t[length];
+  msg->is_sent_ = is_sent;
   memcpy(msg->msg_data_, data, length);
   sndqueue_.Queue(msg);
-  ++sndwnd_.send_;
-  sndwnd_.nacked_space_ += length;
-  m_MDS_LOG_DBG("FCTRL: [me] --> [node:%x, ref:%u], "
-  "SndData[mseq:%u, mfrag:%u, fseq:%u, len:%u], "
-  "sndwnd[acked:%u, send:%u, nacked:%" PRIu64 "]",
-  id_.node, id_.ref,
-  msg->header_.mseq_, msg->header_.mfrag_, msg->header_.fseq_, length,
-  sndwnd_.acked_, sndwnd_.send_, sndwnd_.nacked_space_);
-
+  if (is_sent) {
+++sndwnd_.send_;
+sndwnd_.nacked_space_ += length;
+m_MDS_LOG_DBG("FCTRL: [me] --> [node:%x, ref:%u], "
+"SndData[mseq:%u, mfrag:%u, fseq:%u, len:%u], "
+"sndwnd[acked:%u, send:%u, nacked:%" PRIu64 "]",
+id_.node, id_.ref,
+msg->header_.mseq_,

[devel] [PATCH 7/9] mds: Add configurable parameters [#1960]

2019-08-14 Thread Minh Chau

This patch makes the solution of TIPC buffer overflow configurable,
as well as the ack timeout/ack size.
For example:
The service config file can export the following environment variables

export MDS_TIPC_FCTRL_ENABLED=1
export MDS_TIPC_FCTRL_ACKTIMEOUT=1000
export MDS_TIPC_FCTRL_ACKSIZE=1

If MDS_TIPC_FCTRL_ACKTIMEOUT, MDS_TIPC_FCTRL_ACKSIZE are not specified,
the default values are used.
---
 src/mds/mds_dt_tipc.c  | 19 ---
 src/mds/mds_tipc_fctrl_intf.cc | 25 +++--
 src/mds/mds_tipc_fctrl_intf.h  |  3 ++-
 3 files changed, 41 insertions(+), 6 deletions(-)

diff --git a/src/mds/mds_dt_tipc.c b/src/mds/mds_dt_tipc.c
index fef1c50..1b6c3f8 100644
--- a/src/mds/mds_dt_tipc.c
+++ b/src/mds/mds_dt_tipc.c
@@ -342,9 +342,22 @@ uint32_t mdtm_tipc_init(NODE_ID nodeid, uint32_t *mds_tipc_ref)
}
 
/* Create flow control tasks if enabled*/
-   gl_mds_pro_ver = MDS_PROT_FCTRL;
-   mds_tipc_fctrl_initialize(tipc_cb.BSRsock, port_id,
-   (uint64_t)optval, tipc_mcast_enabled);
+   char* ptr;
+   if ((ptr = getenv("MDS_TIPC_FCTRL_ENABLED")) != NULL) {
+   if (atoi(ptr) == 1) {
+   gl_mds_pro_ver = MDS_PROT_FCTRL;
+   int ackto = -1;
+   int acksize = -1;
+   if ((ptr = getenv("MDS_TIPC_FCTRL_ACKTIMEOUT")) != NULL) {
+   ackto = atoi(ptr);
+   }
+   if ((ptr = getenv("MDS_TIPC_FCTRL_ACKSIZE")) != NULL) {
+   acksize = atoi(ptr);
+   }
+   mds_tipc_fctrl_initialize(tipc_cb.BSRsock, port_id, (uint64_t)optval,
+   ackto, acksize, tipc_mcast_enabled);
+   }
+   }
 
/* Create a task to receive the events and data */
if (mdtm_create_rcv_task(tipc_cb.hdle_mdtm) != NCSCC_RC_SUCCESS) {
diff --git a/src/mds/mds_tipc_fctrl_intf.cc b/src/mds/mds_tipc_fctrl_intf.cc
index 397114e..8949937 100644
--- a/src/mds/mds_tipc_fctrl_intf.cc
+++ b/src/mds/mds_tipc_fctrl_intf.cc
@@ -40,6 +40,9 @@ using mds::ChunkAck;
 using mds::HeaderMessage;
 
 namespace {
+// flow control enabled/disabled
+bool is_fctrl_enabled = false;
+
 // multicast/broadcast enabled
 // todo: to be removed if flow control support it
 bool is_mcast_enabled = true;
@@ -225,7 +228,8 @@ uint32_t create_ncs_task(void *task_hdl) {
 }  // end local namespace
 
 uint32_t mds_tipc_fctrl_initialize(int dgramsock, struct tipc_portid id,
-uint64_t rcv_buf_size, bool mcast_enabled) {
+uint64_t rcv_buf_size, int32_t ackto, int32_t acksize,
+bool mcast_enabled) {
   if (create_ncs_task(&p_task_hdl) !=
   NCSCC_RC_SUCCESS) {
 m_MDS_LOG_ERR("FCTRL: Start of the Created Task-failed:\n");
@@ -234,8 +238,10 @@ uint32_t mds_tipc_fctrl_initialize(int dgramsock, struct tipc_portid id,
   data_sock_fd = dgramsock;
   snd_rcv_portid = id;
   sock_buf_size = rcv_buf_size;
+  is_fctrl_enabled = true;
   is_mcast_enabled = mcast_enabled;
-
+  if (ackto != -1) kChunkAckTimeout = ackto;
+  if (acksize != -1) kChunkAckSize = acksize;
   m_MDS_LOG_NOTIFY("FCTRL: Initialize [node:%x, ref:%u]",
   id.node, id.ref);
 
@@ -243,6 +249,7 @@ uint32_t mds_tipc_fctrl_initialize(int dgramsock, struct tipc_portid id,
 }
 
 uint32_t mds_tipc_fctrl_shutdown(void) {
+  if (is_fctrl_enabled == false) return NCSCC_RC_SUCCESS;
   if (ncs_task_release(p_task_hdl) != NCSCC_RC_SUCCESS) {
 m_MDS_LOG_ERR("FCTRL: Stop of the Created Task-failed:\n");
   }
@@ -251,6 +258,8 @@ uint32_t mds_tipc_fctrl_shutdown(void) {
 
 uint32_t mds_tipc_fctrl_sndqueue_capable(struct tipc_portid id, uint16_t len,
   uint16_t* next_seq) {
+  if (is_fctrl_enabled == false) return NCSCC_RC_SUCCESS;
+
   uint32_t rc = NCSCC_RC_SUCCESS;
 
   portid_map_mutex.lock();
@@ -274,6 +283,8 @@ uint32_t mds_tipc_fctrl_sndqueue_capable(struct tipc_portid id, uint16_t len,
 
 uint32_t mds_tipc_fctrl_trysend(const uint8_t *buffer, uint16_t len,
 struct tipc_portid id) {
+  if (is_fctrl_enabled == false) return NCSCC_RC_SUCCESS;
+
   uint32_t rc = NCSCC_RC_SUCCESS;
 
   portid_map_mutex.lock();
@@ -304,6 +315,8 @@ uint32_t mds_tipc_fctrl_trysend(const uint8_t *buffer, uint16_t len,
 }
 
 uint32_t mds_tipc_fctrl_portid_up(struct tipc_portid id, uint32_t type) {
+  if (is_fctrl_enabled == false) return NCSCC_RC_SUCCESS;
+
   MDS_SVC_ID svc_id = (uint16_t)(type & MDS_EVENT_MASK_FOR_SVCID);
 
   portid_map_mutex.lock();
@@ -328,6 +341,8 @@ uint32_t mds_tipc_fctrl_portid_up(struct tipc_portid id, uint32_t type) {
 }
 
 uint32_t mds_tipc_fctrl_portid_down(struct tipc_portid id, uint32_t type) {
+  if (is_fctrl_enabled == false) return NCSCC_RC_SUCCESS;
+
   MDS_SVC_ID svc_id = (uint16_t)(type & MDS_EVENT_MASK_FOR_SVCID);
 
   portid_map_mutex.lock();
@@ -345,6 +360,8 @@ uint32_t mds_tipc_fctrl_portid_down(struct tipc_portid id, uint32_t type) {
 }

[devel] [PATCH 8/9] mds: Apply serial number arithmetic for sequence counter [#1960]

2019-08-14 Thread Minh Chau

This patch applies serial number arithmetic, as defined in RFC 1982, to
the flow control sequence number.

This is only a temporary patch. A proper implementation could be made in
/base as a template that also supports other types, e.g. uint32, so that
mds can reuse it from /base.
---
 src/mds/mds_tipc_fctrl_portid.cc | 53 +--
 src/mds/mds_tipc_fctrl_portid.h  | 77 
 2 files changed, 97 insertions(+), 33 deletions(-)
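
For readers unfamiliar with RFC 1982, the ordering rule this patch relies
on can be sketched as follows. This is a minimal illustration, not the
Seq16 class added in mds_tipc_fctrl_portid.h (its body is not shown in
this message). The member names and the operator set are assumptions
chosen to resemble a subset of the call sites in the diff.

```cpp
#include <cstdint>

// Illustrative 16-bit serial number (RFC 1982, SERIAL_BITS = 16).
// Hypothetical stand-in for the patch's Seq16; not the actual class.
class Seq16 {
 public:
  explicit Seq16(uint16_t v = 0) : value_(v) {}
  uint16_t v() const { return value_; }

  // Addition and increment wrap modulo 2^16.
  Seq16 operator+(Seq16 rhs) const {
    return Seq16(static_cast<uint16_t>(value_ + rhs.value_));
  }
  Seq16& operator++() {
    ++value_;
    return *this;
  }

  bool operator==(Seq16 rhs) const { return value_ == rhs.value_; }

  // a < b when b is ahead of a by less than half the number space.
  // When the distance is exactly 2^15 the order is undefined in RFC 1982;
  // this sketch then reports false in both directions.
  bool operator<(Seq16 rhs) const {
    uint16_t diff = static_cast<uint16_t>(rhs.value_ - value_);
    return diff != 0 && diff < 0x8000;
  }
  bool operator<=(Seq16 rhs) const { return *this == rhs || *this < rhs; }
  bool operator>=(Seq16 rhs) const { return rhs <= *this; }

 private:
  uint16_t value_;
};
```

With this ordering, Seq16(65535) < Seq16(0) holds, so a counter that wraps
from 65535 to 0 is still seen as moving forward, which a plain uint16_t
comparison would get wrong.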

diff --git a/src/mds/mds_tipc_fctrl_portid.cc b/src/mds/mds_tipc_fctrl_portid.cc
index e762290..365d72f 100644
--- a/src/mds/mds_tipc_fctrl_portid.cc
+++ b/src/mds/mds_tipc_fctrl_portid.cc
@@ -66,12 +66,12 @@ DataMessage* MessageQueue::Find(uint32_t mseq, uint16_t mfrag) {
   return nullptr;
 }
 
-uint64_t MessageQueue::Erase(uint16_t fseq_from, uint16_t fseq_to) {
+uint64_t MessageQueue::Erase(Seq16 fseq_from, Seq16 fseq_to) {
   uint64_t msg_len = 0;
   for (auto it = queue_.begin(); it != queue_.end();) {
 DataMessage *m = *it;
-if (fseq_from <= m->header_.fseq_ &&
-m->header_.fseq_ <= fseq_to) {
+if (fseq_from <= Seq16(m->header_.fseq_) &&
+Seq16(m->header_.fseq_) <= fseq_to) {
   msg_len += m->header_.msg_len_;
   it = queue_.erase(it);
   delete m;
@@ -92,10 +92,10 @@ DataMessage* MessageQueue::FirstUnsent() {
   return nullptr;
 }
 
-void MessageQueue::MarkUnsentFrom(uint16_t fseq) {
+void MessageQueue::MarkUnsentFrom(Seq16 fseq) {
   for (auto it = queue_.begin(); it != queue_.end(); ++it) {
 DataMessage *m = *it;
-if (m->header_.fseq_ >= fseq) m->is_sent_ = false;
+if (Seq16(m->header_.fseq_) >= fseq) m->is_sent_ = false;
   }
 }
 
@@ -140,7 +140,7 @@ void TipcPortId::FlushData() {
   "sndwnd[acked:%u, send:%u, nacked:%" PRIu64 "]",
   id_.node, id_.ref,
   msg->header_.mseq_, msg->header_.mfrag_, msg->header_.fseq_,
-  sndwnd_.acked_, sndwnd_.send_, sndwnd_.nacked_space_);
+  sndwnd_.acked_.v(), sndwnd_.send_.v(), sndwnd_.nacked_space_);
 }
   } while (msg != nullptr);
 }
@@ -185,7 +185,7 @@ uint32_t TipcPortId::Queue(const uint8_t* data, uint16_t length,
 "sndwnd[acked:%u, send:%u, nacked:%" PRIu64 "]",
 id_.node, id_.ref,
 msg->header_.mseq_, msg->header_.mfrag_, msg->header_.fseq_, length,
-sndwnd_.acked_, sndwnd_.send_, sndwnd_.nacked_space_);
+sndwnd_.acked_.v(), sndwnd_.send_.v(), sndwnd_.nacked_space_);
   } else {
 ++sndwnd_.send_;
 m_MDS_LOG_DBG("FCTRL: [me] --> [node:%x, ref:%u], "
@@ -193,7 +193,7 @@ uint32_t TipcPortId::Queue(const uint8_t* data, uint16_t length,
 "sndwnd[acked:%u, send:%u, nacked:%" PRIu64 "]",
 id_.node, id_.ref,
 msg->header_.mseq_, msg->header_.mfrag_, msg->header_.fseq_, length,
-sndwnd_.acked_, sndwnd_.send_, sndwnd_.nacked_space_);
+sndwnd_.acked_.v(), sndwnd_.send_.v(), sndwnd_.nacked_space_);
   }
   return rc;
 }
@@ -248,13 +248,13 @@ uint32_t TipcPortId::ReceiveData(uint32_t mseq, uint16_t mfrag,
 txprob_cnt_, (uint8_t)state_);
   }
   // update receiver sequence window
-  if (rcvwnd_.acked_ < fseq && rcvwnd_.rcv_ + 1 == fseq) {
+  if (rcvwnd_.acked_ < Seq16(fseq) && rcvwnd_.rcv_ + Seq16(1) == Seq16(fseq)) {
 m_MDS_LOG_DBG("FCTRL: [me] <-- [node:%x, ref:%u], "
 "RcvData[mseq:%u, mfrag:%u, fseq:%u], "
 "rcvwnd[acked:%u, rcv:%u, nacked:%" PRIu64 "]",
 id_.node, id_.ref,
 mseq, mfrag, fseq,
-rcvwnd_.acked_, rcvwnd_.rcv_, rcvwnd_.nacked_space_);
+rcvwnd_.acked_.v(), rcvwnd_.rcv_.v(), rcvwnd_.nacked_space_);
 
 ++rcvwnd_.rcv_;
 if (rcvwnd_.rcv_ - rcvwnd_.acked_ >= chunk_size_) {
@@ -279,7 +279,7 @@ uint32_t TipcPortId::ReceiveData(uint32_t mseq, uint16_t mfrag,
 // It is not used for now, so ignore it.
 
 // check for transmission error
-if (rcvwnd_.rcv_ + 1 < fseq) {
+if (rcvwnd_.rcv_ + Seq16(1) < Seq16(fseq)) {
   if (rcvwnd_.rcv_ == 0 && rcvwnd_.acked_ == 0) {
 // peer does not realize that this portid reset
 m_MDS_LOG_ERR("FCTRL: [me] <-- [node:%x, ref:%u], "
@@ -288,7 +288,7 @@ uint32_t TipcPortId::ReceiveData(uint32_t mseq, uint16_t mfrag,
 "Warning[portid reset]",
 id_.node, id_.ref,
 mseq, mfrag, fseq,
-rcvwnd_.acked_, rcvwnd_.rcv_, rcvwnd_.nacked_space_);
+rcvwnd_.acked_.v(), rcvwnd_.rcv_.v(), rcvwnd_.nacked_space_);
 
 rcvwnd_.rcv_ = fseq;
   } else {
@@ -300,10 +300,10 @@ uint32_t TipcPortId::ReceiveData(uint32_t mseq, uint16_t mfrag,
 "Error[msg loss]",
 id_.node, id_.ref,
 mseq, mfrag, fseq,
-rcvwnd_.acked_, rcvwnd_.rcv_, rcvwnd_.nacked_space_);
+rcvwnd_.acked_.v(), rcvwnd_.rcv_.v(), rcvwnd_.nacked_space_);
   }
 }
-if (fseq <= rcvwnd_.acked_) {
+if (Seq16(fseq) <= rcvwnd_.acked_) {
   rc = NCSCC_RC_FAILURE;
   // unexpected retransmission

[devel] [PATCH 4/9] mds: Add timeout for ack message [#1960]

2019-08-14 Thread Minh Chau

If the ack size is configured to be greater than 1, the receiver needs a
timeout after which it sends the ack message back to the sender.
The ack message timeout reuses the poll timeout of the flow control
thread to keep mds lightweight (instead of adding timer threads).
---
 src/mds/mds_tipc_fctrl_intf.cc   | 33 ++---
 src/mds/mds_tipc_fctrl_msg.h |  6 ++
 src/mds/mds_tipc_fctrl_portid.cc | 15 +++
 src/mds/mds_tipc_fctrl_portid.h  |  1 +
 4 files changed, 52 insertions(+), 3 deletions(-)
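
The idea of using the poll timeout as the ack timer can be sketched in
isolation as below. This is a simplified illustration under assumptions:
PollOnce and its callbacks are hypothetical names, and the real loop in
process_all_events() additionally reads events from a mailbox and holds
portid_map_mutex.

```cpp
#include <poll.h>
#include <unistd.h>

#include <functional>

// Hypothetical helper: waits on fd for up to timeout_ms milliseconds.
// Calls on_event when the fd is readable and on_timeout when poll()
// expires without any event. Returns the raw poll() result
// (>0 event, 0 timeout, <0 error).
int PollOnce(int fd, int timeout_ms, const std::function<void()>& on_event,
             const std::function<void()>& on_timeout) {
  struct pollfd pfd = {};
  pfd.fd = fd;
  pfd.events = POLLIN;
  int res = poll(&pfd, 1, timeout_ms);
  if (res > 0 && (pfd.revents & POLLIN)) {
    on_event();    // e.g. process a flow event
  } else if (res == 0) {
    on_timeout();  // e.g. send pending chunk acks for all portids
  }
  return res;
}
```

A timer built this way only fires after a quiet interval of timeout_ms,
because every readable event restarts the wait. That fits an ack timer,
which is only needed when too few messages arrive to fill a chunk.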

diff --git a/src/mds/mds_tipc_fctrl_intf.cc b/src/mds/mds_tipc_fctrl_intf.cc
index 91b9107..bd0a8f6 100644
--- a/src/mds/mds_tipc_fctrl_intf.cc
+++ b/src/mds/mds_tipc_fctrl_intf.cc
@@ -66,7 +66,8 @@ std::map portid_map;
 std::mutex portid_map_mutex;
 
 // chunk ack parameters
-// todo: The chunk ack size should be configurable
+// todo: The chunk ack timeout and chunk ack size should be configurable
+int kChunkAckTimeout = 1000;  // in milliseconds
 uint16_t kChunkAckSize = 3;
 
 TipcPortId* portid_lookup(struct tipc_portid id) {
@@ -75,6 +76,15 @@ TipcPortId* portid_lookup(struct tipc_portid id) {
   return portid_map[uid];
 }
 
+void process_timer_event(const Event evt) {
+  for (auto i : portid_map) {
+TipcPortId* portid = i.second;
+if (evt.type_ == Event::Type::kEvtTmrChunkAck) {
+  portid->ReceiveTmrChunkAck();
+}
+  }
+}
+
 uint32_t process_flow_event(const Event evt) {
   uint32_t rc = NCSCC_RC_SUCCESS;
   TipcPortId *portid = portid_lookup(evt.id_);
@@ -110,7 +120,7 @@ uint32_t process_flow_event(const Event evt) {
 uint32_t process_all_events(void) {
   enum { FD_FCTRL = 0, NUM_FDS };
 
-  int poll_tmo = MDTM_TIPC_POLL_TIMEOUT;
+  int poll_tmo = kChunkAckTimeout;
   while (true) {
 int pollres;
 struct pollfd pfd[NUM_FDS] = {{0}};
@@ -135,11 +145,24 @@ uint32_t process_all_events(void) {
 if (evt == nullptr) continue;
 
 portid_map_mutex.lock();
-process_flow_event(*evt);
+
+if (evt->IsTimerEvent()) {
+  process_timer_event(*evt);
+}
+if (evt->IsFlowEvent()) {
+  process_flow_event(*evt);
+}
+
 delete evt;
 portid_map_mutex.unlock();
   }
 }
+// timeout, scan all portid and send ack msgs
+if (pollres == 0) {
+  portid_map_mutex.lock();
+  process_timer_event(Event(Event::Type::kEvtTmrChunkAck));
+  portid_map_mutex.unlock();
+}
   }  /* while */
   return NCSCC_RC_SUCCESS;
 }
@@ -368,6 +391,10 @@ uint32_t mds_tipc_fctrl_rcv_data(uint8_t *buffer, uint16_t len,
   portid_map_mutex.lock();
   uint32_t rc = process_flow_event(Event(Event::Type::kEvtRcvData,
   id, data.svc_id_, header.mseq_, header.mfrag_, header.fseq_));
+  if (rc == NCSCC_RC_CONTINUE) {
+process_timer_event(Event(Event::Type::kEvtTmrChunkAck));
+rc = NCSCC_RC_SUCCESS;
+  }
   portid_map_mutex.unlock();
   return rc;
 }
diff --git a/src/mds/mds_tipc_fctrl_msg.h b/src/mds/mds_tipc_fctrl_msg.h
index 677f256..8e6a874 100644
--- a/src/mds/mds_tipc_fctrl_msg.h
+++ b/src/mds/mds_tipc_fctrl_msg.h
@@ -44,6 +44,8 @@ class Event {
// selective data msgs (not supported)
 kEvtDropData,  // event reported from tipc that a message is not
// delivered
+kEvtTmrAll,
+kEvtTmrChunkAck,  // event to send the chunk ack
   };
   NCS_IPC_MSG next_{0};
   Type type_;
@@ -68,6 +70,10 @@ class Event {
 fseq_(f_seg_num), chunk_size_(chunk_size) {
 type_ = type;
   }
+  bool IsTimerEvent() { return (type_ > Type::kEvtTmrAll); }
+  bool IsFlowEvent() {
+return (Type::kEvtDataFlowAll < type_ && type_ < Type::kEvtTmrAll);
+  }
 };
 
 class BaseMessage {
diff --git a/src/mds/mds_tipc_fctrl_portid.cc b/src/mds/mds_tipc_fctrl_portid.cc
index 24d13ee..64115d5 100644
--- a/src/mds/mds_tipc_fctrl_portid.cc
+++ b/src/mds/mds_tipc_fctrl_portid.cc
@@ -67,6 +67,8 @@ TipcPortId::TipcPortId(struct tipc_portid id, int sock, uint16_t chksize,
 }
 
 TipcPortId::~TipcPortId() {
+  // Fake a TmrChunkAck event to ack all received messages
+  ReceiveTmrChunkAck();
   // clear all msg in sndqueue_
   sndqueue_.Clear();
 }
@@ -156,6 +158,7 @@ uint32_t TipcPortId::ReceiveData(uint32_t mseq, uint16_t mfrag,
   // send ack for @chunk_size_ msgs starting from fseq
   SendChunkAck(fseq, svc_id, chunk_size_);
   rcvwnd_.acked_ = rcvwnd_.rcv_;
+  rc = NCSCC_RC_CONTINUE;
 }
   } else {
 // todo: update rcvwnd_.nacked_space_.
@@ -258,4 +261,16 @@ void TipcPortId::ReceiveNack(uint32_t mseq, uint16_t mfrag,
   }
 }
 
+void TipcPortId::ReceiveTmrChunkAck() {
+  uint16_t chksize = rcvwnd_.rcv_ - rcvwnd_.acked_;
+  if (chksize > 0) {
+m_MDS_LOG_DBG("FCTRL: [node:%x, ref:%u], "
+"ChkAckExp",
+id_.node, id_.ref);
+// send ack for @chksize msgs starting from rcvwnd_.rcv_
+SendChunkAck(rcvwnd_.rcv_, 0, chksize);
+rcvwnd_.acked_ =

Re: [devel] [PATCH 0/2] Review Request for fmd: improve failover response time [#3008]

2019-02-18 Thread minh . chau

Hi Gary,

Ack for code review. A few other places that call
opensaf_quick_reboot can still be visited later.

Thanks,
Minh
> Summary: fmd: improve failover response time V2 [#3008]
> Review request for Ticket(s): 3008
> Peer Reviewer(s): Hans, Minh
> Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
> Affected branch(es): develop
> Development branch: ticket-3008
> Base revision: 5766361568498f8a496d87d8daafe9bffbd75ed9
> Personal repository: git://git.code.sf.net/u/userid-2226215/review
>
> 
> Impacted area       Impact y/n
>
>  Docs                    n
>  Build system            n
>  RPM/packaging           n
>  Configuration files     n
>  Startup scripts         n
>  SAF services            y
>  OpenSAF services        n
>  Core libraries          n
>  Samples                 n
>  Tests                   n
>  Other                   n
>
>
> Comments (indicate scope for each "y" above):
> -
>
> revision 8ccffc2cd9cd117578227e9cd49421e5c578fec6
> Author:   Gary Lee 
> Date: Tue, 19 Feb 2019 14:57:53 +1100
>
> rded: do not send SUCCESS to main thread [#3008]
>
> do not send RDE_MSG_ACTIVE_PROMOTION_SUCCESS to
> main thread if lock cannot be obtained
>
>
>
> revision 28e17d107f4a079155e03d9f875a3c0262ea19f5
> Author:   Gary Lee 
> Date: Tue, 19 Feb 2019 14:57:53 +1100
>
> fmd: improve failover response time [#3008]
>
> Improve failover response time if split brain prevention is enabled
> but FMS_TAKEOVER_PRIORITISE_PARTITION_SIZE is set to 0.
>
> Also, return immediately if node promotion fails to avoid
> sending active role to RDA.
>
>
>
> Complete diffstat:
> --
>  src/fm/fmd/fm_rda.cc | 14 +-
>  src/rde/rded/role.cc |  2 ++
>  2 files changed, 11 insertions(+), 5 deletions(-)
>
>
> Testing Commands:
> -
> *** LIST THE COMMAND LINE TOOLS/STEPS TO TEST YOUR CHANGES ***
>
>
> Testing, Expected Results:
> --
> *** PASTE COMMAND OUTPUTS / TEST RESULTS ***
>
>
> Conditions of Submission:
> -
> *** HOW MANY DAYS BEFORE PUSHING, CONSENSUS ETC ***
>
>
> Arch      Built     Started    Linux distro
> -------------------------------------------
> mips        n          n
> mips64      n          n
> x86         n          n
> x86_64      y          y
> powerpc     n          n
> powerpc64   n          n
>
>
> Reviewer Checklist:
> ---
> [Submitters: make sure that your review doesn't trigger any checkmarks!]
>
>
> Your checkin has not passed review because (see checked entries):
>
> ___ Your RR template is generally incomplete; it has too many blank
> entries
> that need proper data filled in.
>
> ___ You have failed to nominate the proper persons for review and push.
>
> ___ Your patches do not have proper short+long header
>
> ___ You have grammar/spelling in your header that is unacceptable.
>
> ___ You have exceeded a sensible line length in your
> headers/comments/text.
>
> ___ You have failed to put in a proper Trac Ticket # into your commits.
>
> ___ You have incorrectly put/left internal data in your comments/files
> (i.e. internal bug tracking tool IDs, product names etc)
>
> ___ You have not given any evidence of testing beyond basic build tests.
> Demonstrate some level of runtime or other sanity testing.
>
> ___ You have ^M present in some of your files. These have to be removed.
>
> ___ You have needlessly changed whitespace or added whitespace crimes
> like trailing spaces, or spaces before tabs.
>
> ___ You have mixed real technical changes with whitespace and other
> cosmetic code cleanup changes. These have to be separate commits.
>
> ___ You need to refactor your submission into logical chunks; there is
> too much content in a single commit.
>
> ___ You have extraneous garbage in your review (merge commits etc)
>
> ___ You have giant attachments which should never have been sent;
> Instead you should place your content in a public tree to be pulled.
>
> ___ You have too many commits attached to an e-mail; resend as threaded
> commits, or place in a public tree for a pull.
>
> ___ You have resent this content multiple times without a clear indication
> of what has changed between each re-send.
>
> ___ You have failed to adequately and individually address all of the
> comments and change requests that were proposed in the initial review.
>
> ___ You have a misconfigured ~/.gitconfig file (i.e. user.name, user.email
> etc)
>
> ___ Your computer has a badly configured date and time, confusing
> the threaded patch review.
>
> ___ Your changes affect IPC mechanism, and you don't present any results
> for in-service upgradability test.
>
> ___ Your changes affect user manual and documentation, your patch series
> do not contain the patch that updates the Doxygen manual.
>
>

[devel] [PATCH 0/1] Review Request for osaf: Call opensaf_quick_reboot if failed to set active role in consensus [#3001]

2019-02-15 Thread Minh Chau

Summary: osaf: Call opensaf_quick_reboot if failed to set active role in consensus [#3001]
Review request for Ticket(s): 3001
Peer Reviewer(s): Hans, Gary, Vu
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3001
Base revision: 497b55530e1562b88522e3e2a6f4c5dd21fb4f50
Personal repository: git://git.code.sf.net/u/minh-chau/review


Impacted area       Impact y/n

 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
-
*** EXPLAIN/COMMENT THE PATCH SERIES HERE ***

revision a053a51ea1116dc2bb4342fa2ddf5827877b43cf
Author: Minh Chau 
Date:   Fri, 15 Feb 2019 20:33:30 +1100

osaf: Call opensaf_quick_reboot if failed to set active role in consensus [#3001]



Complete diffstat:
--
 src/fm/fmd/fm_rda.cc | 4 ++--
 src/rde/rded/rde_main.cc | 8 +++-
 src/rde/rded/role.cc | 8 
 3 files changed, 9 insertions(+), 11 deletions(-)


Testing Commands:
-
*** LIST THE COMMAND LINE TOOLS/STEPS TO TEST YOUR CHANGES ***


Testing, Expected Results:
--
*** PASTE COMMAND OUTPUTS / TEST RESULTS ***


Conditions of Submission:
-
ack from reviewers


Arch      Built     Started    Linux distro
-------------------------------------------
mips        n          n
mips64      n          n
x86         n          n
x86_64      y          y
powerpc     n          n
powerpc64   n          n


Reviewer Checklist:
---
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
(i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
too much content in a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
comments and change requests that were proposed in the initial review.

___ You have a misconfigured ~/.gitconfig file (i.e. user.name, user.email etc)

___ Your computer has a badly configured date and time, confusing
the threaded patch review.

___ Your changes affect IPC mechanism, and you don't present any results
for in-service upgradability test.

___ Your changes affect user manual and documentation, your patch series
do not contain the patch that updates the Doxygen manual.




[devel] [PATCH 1/1] osaf: Call opensaf_quick_reboot if failed to set active role in consensus [#3001]

2019-02-15 Thread Minh Chau

---
 src/fm/fmd/fm_rda.cc | 4 ++--
 src/rde/rded/rde_main.cc | 8 +++-
 src/rde/rded/role.cc | 8 
 3 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/src/fm/fmd/fm_rda.cc b/src/fm/fmd/fm_rda.cc
index 028bfa3..0aa5a3d 100644
--- a/src/fm/fmd/fm_rda.cc
+++ b/src/fm/fmd/fm_rda.cc
@@ -97,8 +97,8 @@ uint32_t fm_rda_set_role(FM_CB *fm_cb, PCS_RDA_ROLE role) {
 rc = consensus_service.PromoteThisNode(true, fm_cb->cluster_size);
 if (rc != SA_AIS_OK && rc != SA_AIS_ERR_EXIST) {
   LOG_ER("Unable to set active controller in consensus service");
-  opensaf_reboot(0, nullptr,
- "Unable to set active controller in consensus service");
+  opensaf_quick_reboot("Unable to set active controller "
+  "in consensus service");
 } else if (rc == SA_AIS_ERR_EXIST) {
   // @todo if we don't reboot, we don't seem to recover from this. Can we
   // improve?
diff --git a/src/rde/rded/rde_main.cc b/src/rde/rded/rde_main.cc
index bb17133..3487f0b 100644
--- a/src/rde/rded/rde_main.cc
+++ b/src/rde/rded/rde_main.cc
@@ -203,9 +203,8 @@ static void handle_mbx_event() {
 if (state == Consensus::TakeoverState::ACCEPTED) {
   LOG_NO("Accepted takeover request");
   if (consensus_service.IsRemoteFencingEnabled() == false) {
-opensaf_reboot(0, nullptr,
-   "Another controller is taking over the active role. "
-   "Rebooting this node");
+opensaf_quick_reboot("Another controller is taking over "
+"the active role. Rebooting this node");
   }
 } else if (state == Consensus::TakeoverState::UNDEFINED) {
   bool fencing_required = true;
@@ -233,8 +232,7 @@ static void handle_mbx_event() {
   if (fencing_required == true) {
 LOG_NO("Lost connectivity to consensus service");
 if (consensus_service.IsRemoteFencingEnabled() == false) {
-opensaf_reboot(0, nullptr,
-   "Lost connectivity to consensus service. "
+opensaf_quick_reboot("Lost connectivity to consensus service. "
"Rebooting this node");
 }
   }
diff --git a/src/rde/rded/role.cc b/src/rde/rded/role.cc
index 499f7c8..b2b9b49 100644
--- a/src/rde/rded/role.cc
+++ b/src/rde/rded/role.cc
@@ -112,8 +112,8 @@ void Role::PromoteNode(const uint64_t cluster_size,
 promotion_pending = true;
   } else if (rc != SA_AIS_OK) {
 LOG_ER("Unable to set active controller in consensus service");
-opensaf_reboot(0, nullptr,
-   "Unable to set active controller in consensus service");
+opensaf_quick_reboot("Unable to set active controller "
+"in consensus service");
   }
 
   RDE_CONTROL_BLOCK* cb = rde_get_control_block();
@@ -133,8 +133,8 @@ void Role::PromoteNode(const uint64_t cluster_size,
   rc = consensus_service.PromoteThisNode(true, cluster_size);
   if (rc == SA_AIS_ERR_EXIST) {
 LOG_ER("Unable to set active controller in consensus service");
-opensaf_reboot(0, nullptr,
-   "Unable to set active controller in consensus service");
+opensaf_quick_reboot("Unable to set active controller in "
+"consensus service");
   }
   std::this_thread::sleep_for(std::chrono::seconds(1));
 }
-- 
2.7.4




[devel] [PATCH 1/1] amfd: Only start clm track for 2N Opensaf SU in failover [#2980]

2018-12-09 Thread Minh Chau

---
 src/amf/amfd/sg_2n_fsm.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/amf/amfd/sg_2n_fsm.cc b/src/amf/amfd/sg_2n_fsm.cc
index a218786..91ffc63 100644
--- a/src/amf/amfd/sg_2n_fsm.cc
+++ b/src/amf/amfd/sg_2n_fsm.cc
@@ -1784,7 +1784,8 @@ uint32_t SG_2N::susi_success_sg_realign(AVD_SU *su, AVD_SU_SI_REL *susi,
   }
 
   if ((state == SA_AMF_HA_ACTIVE) &&
-  (cb->node_id_avd == su->su_on_node->node_info.nodeId)) {
+  (cb->node_id_avd == su->su_on_node->node_info.nodeId) &&
+  (su->sg_of_su->sg_ncs_spec == true)) {
 /* This is as a result of failover, start CLM tracking*/
 if (avd_clm_track_start(cb) == SA_AIS_ERR_TRY_AGAIN)
   Fifo::queue(new ClmTrackStart());
-- 
2.7.4




[devel] [PATCH 0/1] Review Request for amfd: Only start clm track for 2N Opensaf SU in failover [#2980]

2018-12-09 Thread Minh Chau

Summary: amfd: Only start clm track for 2N Opensaf SU in failover [#2980]
Review request for Ticket(s): 2980
Peer Reviewer(s): Hans, Nagu, Gary
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-2980
Base revision: 9a730d22b0580e6e3c54fd3a4fd5bb4cf82c
Personal repository: git://git.code.sf.net/u/minh-chau/review


Impacted area       Impact y/n

 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
-
*** EXPLAIN/COMMENT THE PATCH SERIES HERE ***

revision cb9e0e0e88e8dab750c75ab4eae44f8f952bd88f
Author: Minh Chau 
Date:   Mon, 10 Dec 2018 16:47:00 +1100

amfd: Only start clm track for 2N Opensaf SU in failover [#2980]



Complete diffstat:
--
 src/amf/amfd/sg_2n_fsm.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)


Testing Commands:
-
*** LIST THE COMMAND LINE TOOLS/STEPS TO TEST YOUR CHANGES ***


Testing, Expected Results:
--
*** PASTE COMMAND OUTPUTS / TEST RESULTS ***


Conditions of Submission:
-
*** HOW MANY DAYS BEFORE PUSHING, CONSENSUS ETC ***


Arch      Built     Started    Linux distro
-------------------------------------------
mips        n          n
mips64      n          n
x86         n          n
x86_64      y          y
powerpc     n          n
powerpc64   n          n





[devel] [PATCH 1/1] amfd: Fix misordered and dropped item in job queue [#2981]

2018-12-09 Thread Minh Chau

---
 src/amf/amfd/imm.cc  | 54 ++--
 src/amf/amfd/imm.h   |  2 --
 src/amf/amfd/role.cc |  2 +-
 3 files changed, 19 insertions(+), 39 deletions(-)
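
The scan pattern that the diff below corrects in Fifo::executeAll() and
Fifo::remove() can be sketched on a plain std::deque. This is a
simplified illustration under assumptions: the Job struct, its integer id
and type fields, and RemoveByType are hypothetical stand-ins for the Job*
pointers and AvdJobTypeT used in amfd. The key point is that the check
for "back at the first rotated job" happens before the job is rotated
again, so the remaining jobs keep their original order.

```cpp
#include <deque>

struct Job {
  int id;    // unique id, stands in for the Job* pointer identity
  int type;  // stands in for AvdJobTypeT
};

// Removes every job of the given type while keeping the relative order of
// the remaining jobs. Non-matching jobs are rotated to the back; the scan
// stops, without rotating, once the first rotated job is at the front.
void RemoveByType(std::deque<Job>& queue, int type) {
  bool have_first = false;
  int first_id = 0;
  while (!queue.empty()) {
    Job job = queue.front();
    if (job.type == type) {
      queue.pop_front();  // matching job: drop it
      continue;
    }
    if (have_first && job.id == first_id) break;  // full pass completed
    if (!have_first) {
      have_first = true;
      first_id = job.id;
    }
    // push back
    queue.pop_front();
    queue.push_back(job);
  }
}
```

Rotating before the check, as the removed lines did, moves the first job
to the back one extra time before the loop exits, which leaves the queue
in a different order than it started.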

diff --git a/src/amf/amfd/imm.cc b/src/amf/amfd/imm.cc
index 82d2b13..d917b0d 100644
--- a/src/amf/amfd/imm.cc
+++ b/src/amf/amfd/imm.cc
@@ -456,16 +456,19 @@ AvdJobDequeueResultT Fifo::executeAll(AVD_CL_CB *cb, AvdJobTypeT job_type) {
   if (ret != JOB_EXECUTED)
 break;
 } else {
-  // push back
-  ajob = Fifo::dequeue();
-  Fifo::queue(ajob);
-
   // check if we have gone through all jobs of queue
   if (firstjob == nullptr) {
 firstjob = ajob;
+// push back
+ajob = Fifo::dequeue();
+Fifo::queue(ajob);
   } else {
-if (firstjob == ajob)
-  break;
+if (firstjob == ajob) break;
+else {
+  // push back
+  ajob = Fifo::dequeue();
+  Fifo::queue(ajob);
+}
   }
 }
   }
@@ -476,54 +479,33 @@ AvdJobDequeueResultT Fifo::executeAll(AVD_CL_CB *cb, AvdJobTypeT job_type) {
 }
 
 void Fifo::remove(const AVD_CL_CB *cb, AvdJobTypeT job_type) {
-
   Job *ajob, *firstjob;
 
   TRACE_ENTER();
   firstjob = nullptr;
-
   while ((ajob = peek()) != nullptr) {
 if (ajob->getJobType() == job_type) {
   delete Fifo::dequeue();
 } else {
-  // push back
-  ajob = Fifo::dequeue();
-  Fifo::queue(ajob);
-
   // check if we have gone through all jobs of queue
   if (firstjob == nullptr) {
 firstjob = ajob;
+// push back
+ajob = Fifo::dequeue();
+Fifo::queue(ajob);
   } else {
-if (firstjob == ajob)
-  break;
+if (firstjob == ajob) break;
+else {
+  // push back
+  ajob = Fifo::dequeue();
+  Fifo::queue(ajob);
+}
   }
 }
   }
-
   TRACE_LEAVE();
 }
 
-AvdJobDequeueResultT Fifo::executeAdminResp(AVD_CL_CB *cb) {
-  Job *ajob;
-  AvdJobDequeueResultT ret = JOB_EXECUTED;
-
-  TRACE_ENTER();
-
-  while ((ajob = peek()) != nullptr) {
-if (dynamic_cast(ajob) != nullptr) {
-  ret = ajob->exec(cb);
-} else {
-  ajob = dequeue();
-  delete ajob;
-  ret = JOB_EXECUTED;
-}
-  }
-
-  TRACE_LEAVE2("%d", ret);
-
-  return ret;
-}
-
 //
 void Fifo::empty() {
   Job *ajob;
diff --git a/src/amf/amfd/imm.h b/src/amf/amfd/imm.h
index 3cfc207..670c691 100644
--- a/src/amf/amfd/imm.h
+++ b/src/amf/amfd/imm.h
@@ -169,8 +169,6 @@ class Fifo {
   AvdJobTypeT job_type = JOB_TYPE_ANY);
   static void remove(const AVD_CL_CB *cb,
   AvdJobTypeT job_type = JOB_TYPE_ANY);
-  static AvdJobDequeueResultT executeAdminResp(AVD_CL_CB *cb);
-
   static void empty();
 
   static uint32_t size();
diff --git a/src/amf/amfd/role.cc b/src/amf/amfd/role.cc
index 42f77f8..15b0458 100644
--- a/src/amf/amfd/role.cc
+++ b/src/amf/amfd/role.cc
@@ -802,7 +802,7 @@ try_again:
   /* Execute admin op jobs before calling saImmOiImplementerClear to avoid
* SA_AIS_ERR_TIMEOUT
*/
-  Fifo::executeAdminResp(cb);
+  Fifo::executeAll(cb, JOB_TYPE_IMM);
 
   /* Take mutex here to sync with imm reinit thread.*/
   osaf_mutex_lock_ordie(_reinit_mutex);
-- 
2.7.4




[devel] [PATCH 0/1] Review Request for amfd: Fix misordered and dropped item in job queue [#2981]

2018-12-09 Thread Minh Chau

Summary: amfd: Fix misordered and dropped item in job queue [#2981]
Review request for Ticket(s): 2981
Peer Reviewer(s): Gary, Hans, Nagu
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-2981
Base revision: 9a730d22b0580e6e3c54fd3a4fd5bb4cf82c
Personal repository: git://git.code.sf.net/u/minh-chau/review


Impacted area       Impact y/n

 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
-
*** EXPLAIN/COMMENT THE PATCH SERIES HERE ***

revision 5c4da227a3dd03935ef3d44c1f89aa0438083a41
Author: Minh Chau 
Date:   Mon, 10 Dec 2018 16:36:26 +1100

amfd: Fix misordered and dropped item in job queue [#2981]



Complete diffstat:
--
 src/amf/amfd/imm.cc  | 54 ++--
 src/amf/amfd/imm.h   |  2 --
 src/amf/amfd/role.cc |  2 +-
 3 files changed, 19 insertions(+), 39 deletions(-)


Testing Commands:
-
*** LIST THE COMMAND LINE TOOLS/STEPS TO TEST YOUR CHANGES ***


Testing, Expected Results:
--
*** PASTE COMMAND OUTPUTS / TEST RESULTS ***


Conditions of Submission:
-
*** HOW MANY DAYS BEFORE PUSHING, CONSENSUS ETC ***


Arch      Built     Started    Linux distro
-------------------------------------------
mips        n          n
mips64      n          n
x86         n          n
x86_64      y          y
powerpc     n          n
powerpc64   n          n


Reviewer Checklist:
---
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
(i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
too much content into a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
comments and change requests that were proposed in the initial review.

___ You have a misconfigured ~/.gitconfig file (i.e. user.name, user.email etc)

___ Your computer has a badly configured date and time, confusing the
threaded patch review.

___ Your changes affect IPC mechanism, and you don't present any results
for in-service upgradability test.

___ Your changes affect user manual and documentation, your patch series
do not contain the patch that updates the Doxygen manual.




[devel] [PATCH 0/1] Review Request for amfd: Update the assignment counters after restore absent assignment from imm [#2977]

2018-12-02 Thread Minh Chau

Summary: amfd: Update the assignment counters after restore absent assignment from imm [#2977]
Review request for Ticket(s): 2977
Peer Reviewer(s): Hans, Nagu, Gary
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-2977
Base revision: 9a730d22b0580e6e3c54fd3a4fd5bb4cf82c
Personal repository: git://git.code.sf.net/u/minh-chau/review


Impacted area        Impact y/n
--------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
-
*** EXPLAIN/COMMENT THE PATCH SERIES HERE ***

revision fc51aca18db8060be0e6577e2a23339826a58693
Author: Minh Chau 
Date:   Mon, 3 Dec 2018 13:54:37 +1100

amfd: Update the assignment counters after restore absent assignment from imm [#2977]

AMF performs headless recovery by syncing the assignments from the AMFND(s) and
re-creating them in AMFD's db and in IMM. In the next step, AMFD compares the
assignment objects from IMM with those from the AMFND(s) to find the ongoing
assignments that were left over before headless and fails them over; the
assignment states/counters are also restored in this step. If all payloads come
back from headless without a network split having occurred (legacy headless),
the IMM db in all payloads should be consistent, so AMFD creates the IMM
assignments normally. But if the payloads come back from headless after a
network split, IMM is often busy at the time AMFD creates the synced
assignments in IMM. The assignment object creation is then left pending in the
queue and executed later, so AMFD misses restoring the assignment states and
counters of the synced assignments when it compares IMM against the AMFND(s).
In legacy headless, by contrast, the assignment objects are still in IMM when
both SCs go down, so AMFD does not miss the counter updates even if IMM is busy.

The patch moves the counter update to after the absent assignments have been
restored from IMM.
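
The reordering described above can be illustrated with a minimal sketch. It is
not the code from siass.cc: the Susi struct, the RestoreCounters() function and
the "counted" flag are hypothetical stand-ins, assumed only to show why a
second pass over the whole db catches assignments whose IMM object is still
queued for creation.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for an SU-SI assignment record in AMFD.
struct Susi {
  std::string si;
  bool absent;   // true when the record was recreated from IMM only
  bool counted;  // true once the assignment counter was restored for it
};

// Pass 1: recreate assignments that exist only in IMM and mark them absent.
// Pass 2: walk the whole db and restore counters for every present SUSI, so
// a SUSI whose IMM object creation is still pending is no longer skipped.
inline int RestoreCounters(std::vector<Susi>& db,
                           const std::vector<std::string>& imm_only_sis) {
  for (const auto& si : imm_only_sis) {
    db.push_back(Susi{si, true, false});
  }
  int restored = 0;
  for (auto& susi : db) {
    if (susi.absent) continue;  // absent SUSIs are handled separately
    susi.counted = true;
    ++restored;
  }
  return restored;
}

// Usage example: two synced assignments, one assignment known only to IMM.
inline void ExampleRestore() {
  std::vector<Susi> db = {{"SI1", false, false}, {"SI2", false, false}};
  int restored = RestoreCounters(db, {"SI3"});
  assert(restored == 2);
  assert(db.size() == 3 && db[2].absent && !db[2].counted);
}
```

The point of the sketch is only the ordering: the counter pass no longer
depends on which objects the IMM search happened to return.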



Complete diffstat:
--
 src/amf/amfd/siass.cc | 67 +--
 1 file changed, 38 insertions(+), 29 deletions(-)


Testing Commands:
-
*** LIST THE COMMAND LINE TOOLS/STEPS TO TEST YOUR CHANGES ***


Testing, Expected Results:
--
*** PASTE COMMAND OUTPUTS / TEST RESULTS ***


Conditions of Submission:
-
*** HOW MANY DAYS BEFORE PUSHING, CONSENSUS ETC ***


Arch        Built   Started   Linux distro
-------------------------------------------
mips        n       n
mips64      n       n
x86         n       n
x86_64      y       y
powerpc     n       n
powerpc64   n       n



[devel] [PATCH 1/1] amfd: Update the assignment counters after restore absent assignment from imm [#2977]

2018-12-02 Thread Minh Chau

AMF performs headless recovery by syncing the assignments from the AMFND(s) and
re-creating them in AMFD's db and in IMM. In the next step, AMFD compares the
assignment objects from IMM with those from the AMFND(s) to find the ongoing
assignments that were left over before headless and fails them over; the
assignment states/counters are also restored in this step. If all payloads come
back from headless without a network split having occurred (legacy headless),
the IMM db in all payloads should be consistent, so AMFD creates the IMM
assignments normally. But if the payloads come back from headless after a
network split, IMM is often busy at the time AMFD creates the synced
assignments in IMM. The assignment object creation is then left pending in the
queue and executed later, so AMFD misses restoring the assignment states and
counters of the synced assignments when it compares IMM against the AMFND(s).
In legacy headless, by contrast, the assignment objects are still in IMM when
both SCs go down, so AMFD does not miss the counter updates even if IMM is busy.

The patch moves the counter update to after the absent assignments have been
restored from IMM.
---
 src/amf/amfd/siass.cc | 67 +--
 1 file changed, 38 insertions(+), 29 deletions(-)

diff --git a/src/amf/amfd/siass.cc b/src/amf/amfd/siass.cc
index ffde7b1..8a2d217 100644
--- a/src/amf/amfd/siass.cc
+++ b/src/amf/amfd/siass.cc
@@ -264,14 +264,48 @@ void avd_susi_read_headless_cached_rta(AVD_CL_CB *cb) {
   }
 
 #endif
+} else {  // For ABSENT SUSI
+  TRACE("Check absent SUSI, ha_state:'%u', fsm_state:'%u'", imm_ha_state,
+imm_susi_fsm);
+  if (avd_susi_validate_absent_assignment(su, si,
+  imm_ha_state, imm_susi_fsm) == false) {
+avd_saImmOiRtObjectDelete(Amf::to_string());
+continue;
+  }
+  absent_susi = avd_susi_create(avd_cb, si, su, imm_ha_state, false,
+  AVSV_SUSI_ACT_BASE);
+  // Restore the fsm of this absent SUSI, which is used to determine
+  // whether a SU should be added in SG's SUOperationList
+  // Memorize it in temporary var @absent
+  // The fsm of this SUSI will be changed to AVD_SU_SI_STATE_ABSENT
+  // after restoring SUOperationList
+  absent_susi->fsm = imm_susi_fsm;
+  absent_susi->absent = true;
+  if (absent_susi->si->saAmfSIAdminState == SA_AMF_ADMIN_LOCKED ||
+  absent_susi->si->saAmfSIAdminState == SA_AMF_ADMIN_SHUTTING_DOWN) {
+if (absent_susi->fsm == AVD_SU_SI_STATE_MODIFY &&
+(absent_susi->state == SA_AMF_HA_QUIESCED ||
+absent_susi->state == SA_AMF_HA_QUIESCING)) {
+  m_AVD_SET_SG_ADMIN_SI(cb, si);
+}
+  }
+}
+  }
+  (void)immutil_saImmOmSearchFinalize(searchHandle);
+
+  // Update all PRESENT SUSI, in case that a SUSI is missed to update because
+  // it is not present in IMM
+  for (const auto &value : *su_db) {
+AVD_SU *su = value.second;
+susi = su->list_of_susi;
+while (susi != nullptr && susi->absent == false) {
+  AVD_SI *si = susi->si;
   // validate SUSI assignments that are over assigned
   if (avd_susi_validate_excessive_assignment(susi) == true) {
 susi->fsm = AVD_SU_SI_STATE_EXCESSIVE;
   }
-
   // Checkpoint to add this SUSI
   m_AVSV_SEND_CKPT_UPDT_ASYNC_ADD(avd_cb, susi, AVSV_CKPT_AVD_SI_ASS);
-
   // restore assignment counter
   if (susi->fsm == AVD_SU_SI_STATE_ASGN ||
   susi->fsm == AVD_SU_SI_STATE_ASGND ||
@@ -296,36 +330,11 @@ void avd_susi_read_headless_cached_rta(AVD_CL_CB *cb) {
   // only restore if not done
   if (susi->su->su_on_node->admin_ng == nullptr)
 avd_ng_restore_headless_states(cb, susi);
-} else {  // For ABSENT SUSI
-  TRACE("Check absent SUSI, ha_state:'%u', fsm_state:'%u'", imm_ha_state,
-imm_susi_fsm);
-  if (avd_susi_validate_absent_assignment(su, si,
-  imm_ha_state, imm_susi_fsm) == false) {
-avd_saImmOiRtObjectDelete(Amf::to_string());
-continue;
-  }
-  absent_susi = avd_susi_create(avd_cb, si, su, imm_ha_state, false,
-  AVSV_SUSI_ACT_BASE);
-  // Restore the fsm of this absent SUSI, which is used to determine
-  // whether a SU should be added in SG's SUOperationList
-  // Memorize it in temporary var @absent
-  // The fsm of this SUSI will be changed to AVD_SU_SI_STATE_ABSENT
-  // after restoring SUOperationList
-  absent_susi->fsm = imm_susi_fsm;
-  absent_susi->absent = true;
-  if (absent_susi->si->saAmfSIAdminState == SA_AMF_ADMIN_LOCKED ||
-  absent_susi->si->saAmfSIAdminState == SA_AMF_ADMIN_SHUTTING_DOWN) {
-if (absent_susi->fsm == AVD_SU_SI_STATE_MODIFY &&
-(absent_susi->state == SA_AMF_HA_QUIESCED ||
-absent_susi->state == SA_AMF_HA_QUIESCING)) {
-  m_AVD_SET_SG_ADMIN_SI(cb, si);
-}
-  }
+
+  susi = susi->su_next;
 }
   }
 
-

[devel] [PATCH 0/1] Review Request for amfd: Do not create absent assignment if number of assignment exceeds [#2968]

2018-11-21 Thread Minh Chau

Summary: amfd: Do not create absent assignment if number of assignment exceeds [#2968]
Review request for Ticket(s): 2968
Peer Reviewer(s): Hans, Nagu, Gary
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-2968
Base revision: c43ae9d97d169cc4a3b57da14ed9191dca8dfba5
Personal repository: git://git.code.sf.net/u/minh-chau/review


Impacted area        Impact y/n
--------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
-
*** EXPLAIN/COMMENT THE PATCH SERIES HERE ***

revision 72ff674dea7f796b8d957ed7ecaccb0d404f6360
Author: Minh Chau 
Date:   Thu, 22 Nov 2018 10:19:58 +1100

amfd: Do not create absent assignment if number of assignment exceeds [#2968]



Complete diffstat:
--
 src/amf/amfd/sg_2n_fsm.cc | 2 +-
 src/amf/amfd/siass.cc | 3 +++
 2 files changed, 4 insertions(+), 1 deletion(-)


Testing Commands:
-
*** LIST THE COMMAND LINE TOOLS/STEPS TO TEST YOUR CHANGES ***


Testing, Expected Results:
--
*** PASTE COMMAND OUTPUTS / TEST RESULTS ***


Conditions of Submission:
-
*** HOW MANY DAYS BEFORE PUSHING, CONSENSUS ETC ***


Arch        Built   Started   Linux distro
-------------------------------------------
mips        n       n
mips64      n       n
x86         n       n
x86_64      y       y
powerpc     n       n
powerpc64   n       n



[devel] [PATCH 1/1] amfd: Do not create absent assignment if number of assignment exceeds [#2968]

2018-11-21 Thread Minh Chau

---
 src/amf/amfd/sg_2n_fsm.cc | 2 +-
 src/amf/amfd/siass.cc | 3 +++
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/src/amf/amfd/sg_2n_fsm.cc b/src/amf/amfd/sg_2n_fsm.cc
index 72edf9d..a218786 100644
--- a/src/amf/amfd/sg_2n_fsm.cc
+++ b/src/amf/amfd/sg_2n_fsm.cc
@@ -127,7 +127,7 @@ AVD_SU_SI_STATE avd_su_fsm_state_determine(AVD_SU *su) {
   absent_flag, excessive_flag);
   if (absent_flag == true) {
 fsm_state = AVD_SU_SI_STATE_ABSENT;
-  } if (excessive_flag == true) {
+  } else if (excessive_flag == true) {
 fsm_state = AVD_SU_SI_STATE_EXCESSIVE;
   } else if (true == modify_flag) {
 /* Rule 1. => If any one of the SUSI is Mod, then SU will be said to be
diff --git a/src/amf/amfd/siass.cc b/src/amf/amfd/siass.cc
index c3f04a6..ffde7b1 100644
--- a/src/amf/amfd/siass.cc
+++ b/src/amf/amfd/siass.cc
@@ -348,6 +348,9 @@ bool avd_susi_validate_absent_assignment(AVD_SU *su, AVD_SI *si,
   su->sg_of_su->any_assignment_assigned() == false) {
 goto done;
   }
+  // Skip if any excessive assignment exists
+  if (su->sg_of_su->any_assignment_excessive())
+goto done;
   // Support: 2N, NoRed, NwayActive. Not support: NpM, Nway
   if (su->sg_of_su->sg_redundancy_model == SA_AMF_NPM_REDUNDANCY_MODEL ||
   su->sg_of_su->sg_redundancy_model == SA_AMF_N_WAY_REDUNDANCY_MODEL) {
-- 
2.7.4




Re: [devel] [PATCH 1/1] amfd: set node failover state correctly on standby [#2963]

2018-11-15 Thread minh . chau

Hi, Ack (code review). Thanks/Minh

> ---
>  src/amf/amfd/node_state_machine.cc | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/amf/amfd/node_state_machine.cc b/src/amf/amfd/node_state_machine.cc
> index 478ad2a48..c5d86d33c 100644
> --- a/src/amf/amfd/node_state_machine.cc
> +++ b/src/amf/amfd/node_state_machine.cc
> @@ -80,7 +80,7 @@ void NodeStateMachine::SetState(uint32_t state) {
>state_ = std::make_shared(this);
>break;
>  case NodeState::kEnd:
> -  state_ = std::make_shared(this);
> +  state_ = std::make_shared(this);
>cb_->failover_list.erase(node_id_);
>break;
>  default:
> --
> 2.17.1
>
>





[devel] [PATCH 1/1] amfd: Give assignment for pre-instantiated su after the node joins cluster [#2960]

2018-11-13 Thread Minh Chau

If unlock-in/unlock is issued on the SU before the node joins the cluster, the
SU is instantiated and in unlocked state. However, when the node completes
joining the cluster, amfd assumes that all application SUs are uninstantiated
and starts the instantiation, so the already instantiated/unlocked SU is never
given its assignments.

The patch handles this case by giving assignments to the instantiated/unlocked
SU.
---
 src/amf/amfd/sgproc.cc | 26 ++
 1 file changed, 22 insertions(+), 4 deletions(-)

diff --git a/src/amf/amfd/sgproc.cc b/src/amf/amfd/sgproc.cc
index 8513132..a429bdf 100644
--- a/src/amf/amfd/sgproc.cc
+++ b/src/amf/amfd/sgproc.cc
@@ -1846,10 +1846,28 @@ void avd_sg_app_node_su_inst_func(AVD_CL_CB *cb, AVD_AVND *avnd) {
 
   } else if (cb->init_state == AVD_APP_STATE) {
     for (const auto &i_su : avnd->list_of_su) {
-  if ((i_su->term_state == false) &&
-  (i_su->saAmfSUPresenceState == SA_AMF_PRESENCE_UNINSTANTIATED)) {
-/* Look at the SG and do the instantiations. */
-avd_sg_app_su_inst_func(cb, i_su->sg_of_su);
+  if (i_su->term_state == false) {
+/* If SU is UNINSTANTIATED, look at the SG and do the instantiations.
+ * If SU is INSTANTIATED but still OUT_OF_SERVICE, which can happen if
+ * the SU is unlocked during cluster instantiation, give it assignments.
+ * For any other presenceState, amfd will continue from the intermediate state
+ */
+if (i_su->saAmfSUPresenceState == SA_AMF_PRESENCE_UNINSTANTIATED) {
+  avd_sg_app_su_inst_func(cb, i_su->sg_of_su);
+} else if (i_su->saAmfSUPresenceState ==
+SA_AMF_PRESENCE_INSTANTIATED) {
+  if (i_su->is_in_service() && i_su->sg_of_su->sg_ncs_spec == false &&
+  i_su->saAmfSuReadinessState == SA_AMF_READINESS_OUT_OF_SERVICE) {
+i_su->set_readiness_state(SA_AMF_READINESS_IN_SERVICE);
+if (i_su->sg_of_su->su_insvc(cb, i_su) == NCSCC_RC_FAILURE) {
+  LOG_ER("%s:%d %s", __FUNCTION__, __LINE__, i_su->name.c_str());
+  i_su->set_readiness_state(SA_AMF_READINESS_OUT_OF_SERVICE);
+}
+  }
+} else {
+  LOG_WA("SU'%s' has unexpected saAmfSUPresenceState:'%d'",
+  i_su->name.c_str(), i_su->saAmfSUPresenceState);
+}
   }
 }
   }
-- 
2.7.4




[devel] [PATCH 0/1] Review Request for amfd: Give assignment for pre-instantiated su after the node joins cluster [#2960]

2018-11-13 Thread Minh Chau

Summary: amfd: Give assignment for pre-instantiated su after the node joins cluster [#2960]
Review request for Ticket(s): 2960
Peer Reviewer(s): Hans, Nagu, Gary
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-2960
Base revision: 5bb2174a323a97f626ce354d553a1dc4d1673899
Personal repository: git://git.code.sf.net/u/minh-chau/review


Impacted area        Impact y/n
--------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
-
*** EXPLAIN/COMMENT THE PATCH SERIES HERE ***

revision b24fdd8fb03de9bb966a7cab36c95870ce507cb6
Author: Minh Chau 
Date:   Wed, 14 Nov 2018 15:21:17 +1100

amfd: Give assignment for pre-instantiated su after the node joins cluster [#2960]

If unlock-in/unlock is issued on the SU before the node joins the cluster, the
SU is instantiated and in unlocked state. However, when the node completes
joining the cluster, amfd assumes that all application SUs are uninstantiated
and starts the instantiation, so the already instantiated/unlocked SU is never
given its assignments.

The patch handles this case by giving assignments to the instantiated/unlocked
SU.
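
The decision taken per SU when the node has joined can be sketched roughly as
follows. This is a simplified model, not the code in sgproc.cc: the enums, the
Su struct and its "eligible" flag (standing in for the is_in_service() and
non-middleware checks in the patch) are assumptions made for illustration, and
the rollback on su_insvc() failure is left out.

```cpp
#include <cassert>

// Hypothetical simplified states; the real code uses the SA_AMF_* enums.
enum class Presence { kUninstantiated, kInstantiating, kInstantiated };
enum class Readiness { kOutOfService, kInService };
enum class Action { kInstantiateSg, kAssign, kNone, kWarn };

struct Su {
  bool term_state;
  bool eligible;  // assumed stand-in for the in-service eligibility checks
  Presence presence;
  Readiness readiness;
};

// Decide what to do with one application SU after its node joined.
inline Action OnNodeJoined(Su& su) {
  if (su.term_state) return Action::kNone;
  switch (su.presence) {
    case Presence::kUninstantiated:
      return Action::kInstantiateSg;  // previous behaviour, unchanged
    case Presence::kInstantiated:
      // New case: the SU was instantiated before the node joined.
      if (su.eligible && su.readiness == Readiness::kOutOfService) {
        su.readiness = Readiness::kInService;
        return Action::kAssign;
      }
      return Action::kNone;
    default:
      return Action::kWarn;  // unexpected presence state, only logged
  }
}

// Usage example: a pre-instantiated, unlocked SU now gets assignments.
inline void ExampleOnNodeJoined() {
  Su su{false, true, Presence::kInstantiated, Readiness::kOutOfService};
  assert(OnNodeJoined(su) == Action::kAssign);
  assert(su.readiness == Readiness::kInService);
}
```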



Complete diffstat:
--
 src/amf/amfd/sgproc.cc | 26 ++
 1 file changed, 22 insertions(+), 4 deletions(-)


Testing Commands:
-
*** LIST THE COMMAND LINE TOOLS/STEPS TO TEST YOUR CHANGES ***


Testing, Expected Results:
--
*** PASTE COMMAND OUTPUTS / TEST RESULTS ***


Conditions of Submission:
-
*** HOW MANY DAYS BEFORE PUSHING, CONSENSUS ETC ***


Arch        Built   Started   Linux distro
-------------------------------------------
mips        n       n
mips64      n       n
x86         n       n
x86_64      y       y
powerpc     n       n
powerpc64   n       n



[devel] [PATCH 1/1] mds: Send NCSMDS_DOWN with vdest if there is no any adest [#2941]

2018-10-24 Thread Minh Chau

If a split brain happens and the network merges back, a few mds events arrive
at the payloads at that point in time: the SVC UP from the other controller,
and the SVC DOWN from the services on both controllers, caused by the reboot
that split brain detection triggers.
In the ticket description, the first partition includes SC1 and PL3, and the
second partition includes SC2, PL4 and PL5. The amfnd on PL3 misses
NCSMDS_DOWN with vdest in the scenario below:

- SVC up event from the other amfd (on SC2)
- SVC down event from amfd (SC1). This is the active adest from mds-PL3's
view, so the await_active timer is started, but no NCSMDS_DOWN with vdest is
sent because the adest on SC2 exists.
- SVC down event from amfd (SC2). This is a different adest from the active one.

Because the payloads reside in different partitions, they do not share the
same view of the active adest at mds level. When both SCs go down due to split
brain detection, the same SVC down events reach all payloads, but because
their views differ, the payloads in one partition behave differently from
those in the other.

The patch adds an additional condition to send NCSMDS_DOWN if no adest
remains.
---
 src/mds/mds_c_api.c | 80 ++---
 1 file changed, 46 insertions(+), 34 deletions(-)

diff --git a/src/mds/mds_c_api.c b/src/mds/mds_c_api.c
index f5ba318..73849cc 100644
--- a/src/mds/mds_c_api.c
+++ b/src/mds/mds_c_api.c
@@ -3644,13 +3644,58 @@ uint32_t mds_mcm_svc_down(PW_ENV_ID pwe_id, MDS_SVC_ID svc_id, V_DEST_RL role,
local_svc_hdl, svc_id, vdest_id,
_adest, _running,
_result_info, true);
-
+   m_MDS_LOG_INFO("MCM:API: svc_down: "
+ "active_adest:%lu", active_adest);
/* First delete the entry */
mds_subtn_res_tbl_del(
local_svc_hdl, svc_id, vdest_id, adest,
vdest_policy, svc_sub_part_ver,
archword_type);
 
+   MDS_SUBSCRIPTION_RESULTS_INFO *s_info = NULL;
+   bool adest_exists = false;
+
+   /* if no adest remains for this svc
+* send MDS_DOWN
+*/
+   status = mds_subtn_res_tbl_getnext_any(
+   local_svc_hdl, svc_id,
+   &s_info);
+
+   while (status != NCSCC_RC_FAILURE) {
+   if (s_info->key.vdest_id !=
+   m_VDEST_ID_FOR_ADEST_ENTRY) {
+   adest_exists = true;
+   break;
+   }
+
+   status = mds_subtn_res_tbl_getnext_any(
+   local_svc_hdl, svc_id, &s_info);
+   }
+
+   if (active_adest != adest
+ && vdest_policy == NCS_VDEST_TYPE_MxN
+   && adest_exists == false) {
+   m_MDS_LOG_INFO("MCM:API: svc_down : "
+   "svc_id = %s(%d) on DEST id = %d "
+   "got NO_ACTIVE for svc_id = %s(%d) "
+"on Vdest id = %d Adest = %s, rem_svc_pvt_ver=%d",
+   get_svc_names(
+   m_MDS_GET_SVC_ID_FROM_SVC_HDL(local_svc_hdl)),
+   m_MDS_GET_SVC_ID_FROM_SVC_HDL(
+   local_svc_hdl),
+   m_MDS_GET_VDEST_ID_FROM_SVC_HDL(
+   local_svc_hdl),
+   get_svc_names(svc_id), svc_id,
+   vdest_id,
+   log_subtn_result_info->sub_adest_details,
+   svc_sub_part_ver);
+   status = mds_mcm_user_event_callback(
+ local_svc_hdl, pwe_id, svc_id,
+ role, vdest_id, 0, NCSMDS_DOWN,
+   svc_sub_part_ver, archword_type);
+   }
+
if (active_adest == adest) {
if (vdest_policy ==
NCS_VDEST_TYPE_MxN) {
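
The extra condition added by the hunk above can be restated as a small,
self-contained predicate. This is only a sketch under assumptions: SubEntry,
Policy, ShouldSendDown() and the kAdestEntry marker value are hypothetical
names chosen for illustration, and the vector stands in for the subscription
result table as it looks after the entry of the departed adest was deleted.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Assumed marker for rows that are plain adest entries (illustrative value).
constexpr uint16_t kAdestEntry = 0xffff;

enum class Policy { kMxN, kNWay };

// Hypothetical row of the subscription result table.
struct SubEntry {
  uint16_t vdest_id;
  uint64_t adest;
};

// Mirrors the patch condition: send NCSMDS_DOWN when the departed adest was
// not the active one, the vdest policy is MxN, and the scan of the remaining
// rows finds no row whose vdest_id differs from the adest-entry marker.
inline bool ShouldSendDown(const std::vector<SubEntry>& remaining,
                           uint64_t active_adest, uint64_t down_adest,
                           Policy policy) {
  bool adest_exists = false;
  for (const auto& entry : remaining) {
    if (entry.vdest_id != kAdestEntry) {
      adest_exists = true;
      break;
    }
  }
  return active_adest != down_adest && policy == Policy::kMxN &&
         !adest_exists;
}

// Usage example: only a plain adest row is left, so DOWN is sent.
inline void ExampleShouldSendDown() {
  std::vector<SubEntry> only_adest_rows = {{kAdestEntry, 0x2010f}};
  assert(ShouldSendDown(only_adest_rows, 0x2020f, 0x2010f, Policy::kMxN));
}
```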

[devel] [PATCH 2/2] amfd: Remove sending node reboot in 2N SG for duplicated assignment [#2929]

2018-10-15 Thread Minh Chau

The first part of #2929 introduced the EXCESSIVE susi fsm state. It also
handles duplicated 2N assignments, so that the node that has duplicated
assignments is rebooted.
This patch removes the sending of node reboot in avd_sg_2n_act_susi();
otherwise amfd would send multiple node reboot orders to the same node. This
patch also checks for duplicated QUIESCED assignments.
---
 src/amf/amfd/sg_2n_fsm.cc | 34 +++---
 1 file changed, 7 insertions(+), 27 deletions(-)

diff --git a/src/amf/amfd/sg_2n_fsm.cc b/src/amf/amfd/sg_2n_fsm.cc
index f919291..72edf9d 100644
--- a/src/amf/amfd/sg_2n_fsm.cc
+++ b/src/amf/amfd/sg_2n_fsm.cc
@@ -598,41 +598,21 @@ static AVD_SU_SI_REL *avd_sg_2n_act_susi(AVD_CL_CB *cb, AVD_SG *sg,
standby. */
 if ((SA_AMF_HA_QUIESCED == avd_su_state_determine(su_1)) &&
 (SA_AMF_HA_QUIESCED == avd_su_state_determine(su_2))) {
-  osafassert(a_susi_1->su == s_susi_2->su);
-  osafassert(a_susi_2->su == s_susi_1->su);
+  if (a_susi_1->su != s_susi_2->su || a_susi_2->su != s_susi_1->su) {
+// Duplicate 2N quiesced assignments found
+LOG_WA("Duplicate 2N quiesced assignments exist in '%s' and '%s'",
+  s_susi_1->su->name.c_str(), s_susi_2->su->name.c_str());
+  }
 } else {
   if (a_susi_1->su != a_susi_2->su) {
 // Duplicate 2N active assignments found, probably after split brain
-// Reboot both nodes hosting the SUs to recover
-
-LOG_EM("Duplicate 2N active assignments in '%s' and '%s'",
+LOG_WA("Duplicate 2N active assignments exist in '%s' and '%s'",
   a_susi_1->su->name.c_str(), a_susi_2->su->name.c_str());
 
-LOG_EM("Sending node reboot order to '%s'",
-  a_susi_1->su->su_on_node->name.c_str());
-avd_d2n_reboot_snd(a_susi_1->su->su_on_node);
-
-if (a_susi_1->su->su_on_node != a_susi_2->su->su_on_node) {
-  LOG_EM("Sending node reboot order to '%s'",
-a_susi_2->su->su_on_node->name.c_str());
-  avd_d2n_reboot_snd(a_susi_2->su->su_on_node);
-}
   } else if (s_susi_1->su != s_susi_2->su) {
 // Duplicate 2N standby assignments found
-// Reboot both nodes hosting the SUs to recover
-
-LOG_EM("Duplicate 2N standby assignments in '%s' and '%s'",
+LOG_WA("Duplicate 2N standby assignments exist in '%s' and '%s'",
   s_susi_1->su->name.c_str(), s_susi_2->su->name.c_str());
-
-LOG_EM("Sending node reboot order to '%s'",
-  s_susi_1->su->su_on_node->name.c_str());
-avd_d2n_reboot_snd(s_susi_1->su->su_on_node);
-
-if (s_susi_1->su->su_on_node != s_susi_2->su->su_on_node) {
-  LOG_EM("Sending node reboot order to '%s'",
-s_susi_2->su->su_on_node->name.c_str());
-  avd_d2n_reboot_snd(s_susi_2->su->su_on_node);
-}
   }
 }
 a_susi = a_susi_1;
-- 
2.7.4




[devel] [PATCH 0/2] Review Request for amf: Handle the excessive assigments for 2N, Nway Active, NoRed due to split brain V2 [#2929]

2018-10-15 Thread Minh Chau

Summary: amf: Add new susi fsm EXCESSIVE state to handle excessive assignment due to splitbrain V2 [#2929]
Review request for Ticket(s): 2929
Peer Reviewer(s): Hans, Gary, Nagu
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-2929
Base revision: 9442e2bfc9c883a10bca1a88816da9ff6fda2921
Personal repository: git://git.code.sf.net/u/minh-chau/review


Impacted area        Impact y/n
--------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
-
*** EXPLAIN/COMMENT THE PATCH SERIES HERE ***

revision 1677c8129803c69411a69d20f03ee881cc28adb7
Author: Minh Chau 
Date:   Tue, 16 Oct 2018 08:58:17 +1100

amfd: Remove sending node reboot in 2N SG for duplicated assignment [#2929]

The first part of #2929 introduced the EXCESSIVE susi fsm state. It also
handles duplicated 2N assignments, so that the node that has duplicated
assignments is rebooted.
This patch removes the sending of node reboot in avd_sg_2n_act_susi();
otherwise amfd would send multiple node reboot orders to the same node. This
patch also checks for duplicated QUIESCED assignments.
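
After this change the 2N check only detects and reports duplicates; the reboot
is left to the EXCESSIVE handling. A minimal sketch of such a detect-only
check, assuming a simplified view of two SIs of the same 2N SG (SiAssignment,
Duplicate and FindDuplicate() are hypothetical names, not amfd symbols):

```cpp
#include <cassert>
#include <string>

// Hypothetical view: for one SI, the SU holding the active assignment and
// the SU holding the standby assignment.
struct SiAssignment {
  std::string active_su;
  std::string standby_su;
};

enum class Duplicate { kNone, kActive, kStandby };

// In a healthy 2N SG all SIs are active on one SU and standby on another.
// A mismatch between two SIs is only reported; no reboot order is sent here.
inline Duplicate FindDuplicate(const SiAssignment& a, const SiAssignment& b) {
  if (a.active_su != b.active_su) return Duplicate::kActive;
  if (a.standby_su != b.standby_su) return Duplicate::kStandby;
  return Duplicate::kNone;
}

// Usage example.
inline void ExampleFindDuplicate() {
  assert(FindDuplicate({"SU1", "SU2"}, {"SU1", "SU2"}) == Duplicate::kNone);
  assert(FindDuplicate({"SU1", "SU2"}, {"SU3", "SU2"}) == Duplicate::kActive);
}
```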



revision 8bdce6a88fc339f20f4dc71d6695044a8997
Author: Minh Chau 
Date:   Tue, 16 Oct 2018 08:56:48 +1100

amf: Add new susi fsm EXCESSIVE state to handle excessive assignment due to splitbrain V2 [#2929]

Once split brain happens, there are multiple partitions, and AMF continues to
give assignments to the spare SUs in each partition. When the network merges,
these partitions join into one cluster and the SU assignments become excessive.

This patch adds a new susi fsm state, EXCESSIVE, which marks the excessive
assignments that AMF detects after multiple partitions join.
For 2N SG: if any excessive assignment exists, the node hosting the SU that has
the 2N assignment is rebooted.
For NWay Active, NoRed: only the excessive assignment is removed.
For NpM, Nway: not supported.
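
The per-model handling listed above amounts to a small policy table. The
sketch below only restates that list; RedModel, Recovery and RecoveryFor() are
hypothetical names used for illustration and do not appear in amfd.

```cpp
#include <cassert>

// Hypothetical enums naming the redundancy models and recovery actions.
enum class RedModel { k2N, kNWayActive, kNoRed, kNpM, kNWay };
enum class Recovery { kRebootNode, kRemoveExcessive, kUnsupported };

// Map a redundancy model to the recovery applied to EXCESSIVE assignments.
inline Recovery RecoveryFor(RedModel model) {
  switch (model) {
    case RedModel::k2N:
      return Recovery::kRebootNode;
    case RedModel::kNWayActive:
    case RedModel::kNoRed:
      return Recovery::kRemoveExcessive;
    case RedModel::kNpM:
    case RedModel::kNWay:
      return Recovery::kUnsupported;
  }
  return Recovery::kUnsupported;  // not reached for valid enum values
}

// Usage example.
inline void ExampleRecoveryFor() {
  assert(RecoveryFor(RedModel::k2N) == Recovery::kRebootNode);
  assert(RecoveryFor(RedModel::kNoRed) == Recovery::kRemoveExcessive);
}
```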



Complete diffstat:
--
 src/amf/amfd/cluster.cc|  5 
 src/amf/amfd/sg.cc | 33 +
 src/amf/amfd/sg.h  |  9 +-
 src/amf/amfd/sg_2n_fsm.cc  | 46 +++---
 src/amf/amfd/sg_nored_fsm.cc   | 19 
 src/amf/amfd/sg_nwayact_fsm.cc | 19 
 src/amf/amfd/sgproc.cc | 34 +-
 src/amf/amfd/si.cc | 11 +++
 src/amf/amfd/si.h  |  1 +
 src/amf/amfd/siass.cc  | 65 ++
 src/amf/amfd/susi.h|  5 +++-
 11 files changed, 191 insertions(+), 56 deletions(-)


Testing Commands:
-----------------
Repeat the tests described in #2926, #2920, #2929


Testing, Expected Results:
--------------------------
For 2N: Any node involved in excessive 2N assignments will be rebooted
For NWayActive, NoRed: Only the excessive assignment is removed


Conditions of Submission:
-------------------------
ack


Arch      Built     Started    Linux distro
-------------------------------------------
mips        n          n
mips64      n          n
x86         n          n
x86_64      y          y
powerpc     n          n
powerpc64   n          n


Reviewer Checklist:
---
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
(i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
too much content into a single commit.

___ You have extraneous garbage in your review (merge commits etc)

__

[devel] [PATCH 1/2] amf: Add new susi fsm EXCESSIVE state to handle excessive assignment due to splitbrain V2 [#2929]

2018-10-15 Thread Minh Chau

Once splitbrain happens, we have multiple partitions, and AMF continues
assigning to the spare SUs in each partition. When the network merges, these
partitions join into one cluster and the SU assignments become excessive.

This patch adds a new susi fsm EXCESSIVE state, which marks the excessive
assignments that AMF detects after multiple partitions join.
For 2N SG: If any excessive assignment exists, the node hosting the SU that
has the 2N assignment is rebooted.
For NWay Active, NoRed: Remove the excessive assignment only.
For NpM, NWay: Not supported.
---
 src/amf/amfd/cluster.cc|  5 
 src/amf/amfd/sg.cc | 33 +
 src/amf/amfd/sg.h  |  9 +-
 src/amf/amfd/sg_2n_fsm.cc  | 12 ++--
 src/amf/amfd/sg_nored_fsm.cc   | 19 
 src/amf/amfd/sg_nwayact_fsm.cc | 19 
 src/amf/amfd/sgproc.cc | 34 +-
 src/amf/amfd/si.cc | 11 +++
 src/amf/amfd/si.h  |  1 +
 src/amf/amfd/siass.cc  | 65 ++
 src/amf/amfd/susi.h|  5 +++-
 11 files changed, 184 insertions(+), 29 deletions(-)

diff --git a/src/amf/amfd/cluster.cc b/src/amf/amfd/cluster.cc
index 83fd47d..07d9b5a 100644
--- a/src/amf/amfd/cluster.cc
+++ b/src/amf/amfd/cluster.cc
@@ -109,6 +109,11 @@ void avd_cluster_tmr_init_evh(AVD_CL_CB *cb, AVD_EVT *evt) {
   continue;
 }
 
+if (i_sg->any_assignment_excessive()) {
+  i_sg->failover_excessive_assignment();
+  continue;
+}
+
 while (i_sg->any_assignment_absent()) {
   // failover with ABSENT SUSI, which had already been removed during
   // headless, until all ABSENT SUSI(s) are failovered successfully
diff --git a/src/amf/amfd/sg.cc b/src/amf/amfd/sg.cc
index f973e3a..fa011ec 100644
--- a/src/amf/amfd/sg.cc
+++ b/src/amf/amfd/sg.cc
@@ -2332,6 +2332,39 @@ bool AVD_SG::any_assignment_absent() {
   return pending;
 }
 
+bool AVD_SG::any_assignment_excessive() {
+  bool pending = false;
+  TRACE_ENTER2("SG:'%s'", name.c_str());
+  for (const auto &su : list_of_su) {
+    if (su->any_susi_fsm_in(AVD_SU_SI_STATE_EXCESSIVE)) {
+      pending = true;
+      break;
+    }
+  }
+  TRACE_LEAVE();
+  return pending;
+}
+
+/*
+ * Going through all SU of this SG, if any SU has over assigned,
+ * reboot the node that hosts the SU.
+ */
+void AVD_SG::failover_excessive_assignment() {
+  TRACE_ENTER2("SG:'%s'", name.c_str());
+  for (const auto &su : list_of_su) {
+    if (su->list_of_susi != nullptr) {
+      if (su->saAmfSuReadinessState == SA_AMF_READINESS_IN_SERVICE) {
+        LOG_EM("Duplicated assignment SU '%s'", su->name.c_str());
+        LOG_EM("Sending node reboot order to '%s'",
+               su->su_on_node->name.c_str());
+        su->set_readiness_state(SA_AMF_READINESS_OUT_OF_SERVICE);
+        avd_d2n_reboot_snd(su->su_on_node);
+      }
+    }
+  }
+  TRACE_LEAVE();
+}
+
 bool AVD_SG::any_assignment_in_progress() {
   bool pending = false;
   TRACE_ENTER2("SG:'%s'", name.c_str());
diff --git a/src/amf/amfd/sg.h b/src/amf/amfd/sg.h
index 81595a2..9d2b7c4 100644
--- a/src/amf/amfd/sg.h
+++ b/src/amf/amfd/sg.h
@@ -431,10 +431,16 @@ class AVD_SG {
* @return
*/
   bool is_sg_serviceable_outside_ng(const AVD_AMF_NG *ng);
+  /*
+   * Failover the excessive assignment
+   */
+  virtual void failover_excessive_assignment();
+
   SaAisErrorT check_sg_stability();
   bool any_assignment_in_progress();
   bool any_assignment_absent();
   bool any_assignment_assigned();
+  bool any_assignment_excessive();
   void failover_absent_assignment();
   bool ng_using_saAmfSGAdminState;
   bool headless_validation;
@@ -517,7 +523,7 @@ class SG_NORED : public AVD_SG {
struct avd_su_si_rel_tag *susi, AVSV_SUSI_ACT act,
SaAmfHAStateT state);
   void ng_admin(AVD_SU *su, AVD_AMF_NG *ng);
-
+  void failover_excessive_assignment();
  private:
   AVD_SU *assign_sis_to_sus();
 };
@@ -580,6 +586,7 @@ class SG_NACV : public AVD_SG {
struct avd_su_si_rel_tag *susi, AVSV_SUSI_ACT act,
SaAmfHAStateT state);
   void ng_admin(AVD_SU *su, AVD_AMF_NG *ng);
+  void failover_excessive_assignment();
 };
 
 /**
diff --git a/src/amf/amfd/sg_2n_fsm.cc b/src/amf/amfd/sg_2n_fsm.cc
index af8a4cc..f919291 100644
--- a/src/amf/amfd/sg_2n_fsm.cc
+++ b/src/amf/amfd/sg_2n_fsm.cc
@@ -78,6 +78,7 @@ AVD_SU_SI_STATE avd_su_fsm_state_determine(AVD_SU *su) {
   bool assigning_flag = false, assigned_flag = false, modify_flag = false,
unassingned_flag = false;
   bool absent_flag = false;
+  bool excessive_flag = false;
   AVD_SU_SI_STATE fsm_state = AVD_SU_SI_STATE_ABSENT;
 
   TRACE_ENTER2("SU '%s'", su->name.c_str());
@@ -109,6 +110,10 @@ AVD_SU_SI_STATE avd_su_fsm_state_determine(AVD_SU *su) {
   absent_flag = true;
   TRACE("Absent su'%s', si'%s'", temp_susi->su->name.c_str(),
 temp_susi->si->name.c_str());
+
