[devel] [PATCH 1/1] imm: fix incorrect delay start sync [#3246]

2020-12-24 Thread thuan.tran
- Lost nodes were counted incorrectly, which could make a new coordinator
postpone sync while waiting for more nodes than the cluster contains.
- Count lost nodes correctly by tracking a set of lost node IDs.
---
 src/imm/immnd/ImmModel.cc  | 14 ++++++++++++++
 src/imm/immnd/immnd_evt.c  |  4 ++--
 src/imm/immnd/immnd_init.h |  4 ++++
 3 files changed, 20 insertions(+), 2 deletions(-)
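
To illustrate the idea in the commit message above: counting a lost node
once per node ID, rather than once per discard message, can be sketched as
below. This is a minimal standalone sketch under stated assumptions, not the
OpenSAF code. LostNodeTracker and its method names are hypothetical, and the
sketch derives the count from the set size instead of keeping a separate
mLostNodes counter next to the set as the patch does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>

// Hypothetical stand-in for the lost-node bookkeeping (not OpenSAF code).
class LostNodeTracker {
 public:
  // Record a discarded node. Returns true only the first time a given
  // node id is seen, so a repeated discard message is not counted twice.
  bool Discard(std::uint32_t node_id) { return lost_.insert(node_id).second; }

  // A node that sends a sync request is no longer considered lost.
  void SyncRequested(std::uint32_t node_id) { lost_.erase(node_id); }

  // Forget all lost nodes, for example when a sync starts.
  void Reset() { lost_.clear(); }

  // Number of distinct nodes currently considered lost.
  std::size_t Count() const { return lost_.size(); }

 private:
  std::set<std::uint32_t> lost_;
};
```

With a plain counter, two discard messages for the same node would leave the
count at 2; with the set, the second Discard() call returns false and the
count stays at 1, so it cannot exceed the number of distinct nodes.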

diff --git a/src/imm/immnd/ImmModel.cc b/src/imm/immnd/ImmModel.cc
index 631597b8a..00d7f4794 100644
--- a/src/imm/immnd/ImmModel.cc
+++ b/src/imm/immnd/ImmModel.cc
@@ -524,6 +524,7 @@ typedef std::map MissingParentsMap;
 
 // Local variables
 
+static std::set<SaUint32T> sDiscardNodeSet;
 static ClassMap sClassMap;
 static AdminOwnerVector sOwnerVector;
 static CcbVector sCcbVector;
@@ -1364,12 +1365,25 @@ void immModel_getCcbIdsForOrigCon(IMMND_CB* cb, SaUint32T deadCon,
   osafassert(ix == (*arrSize));
 }
 
+void immModel_resetDiscardNodes(IMMND_CB* cb) {
+  cb->mLostNodes = 0;
+  sDiscardNodeSet.clear();
+}
+
+void immModel_eraseDiscardNode(SaUint32T nodeId) {
+  sDiscardNodeSet.erase(nodeId);
+}
+
 void immModel_discardNode(IMMND_CB* cb, SaUint32T nodeId, SaUint32T* arrSize,
   SaUint32T** ccbIdArr, SaUint32T* globArrSize,
   SaUint32T** globccbIdArr) {
   ConnVector cv, gv;
   ConnVector::iterator cvi, gvi;
   unsigned int ix = 0;
+  if (sDiscardNodeSet.find(nodeId) == sDiscardNodeSet.end()) {
+    sDiscardNodeSet.insert(nodeId);
+    cb->mLostNodes++;
+  }
   ImmModel::instance(&cb->immModel)
       ->discardNode(nodeId, cv, gv, cb->mIsCoord, false);
   *arrSize = (SaUint32T)cv.size();
diff --git a/src/imm/immnd/immnd_evt.c b/src/imm/immnd/immnd_evt.c
index dfef6c0a5..af8f5876a 100644
--- a/src/imm/immnd/immnd_evt.c
+++ b/src/imm/immnd/immnd_evt.c
@@ -10321,7 +10321,7 @@ static uint32_t immnd_evt_proc_start_sync(IMMND_CB *cb, IMMND_EVT *evt,
   Nodes. This is mostly relevant for "standby"
   i.e. the non-coord immnd which is on an SC.
 */
-   cb->mLostNodes = 0;
+   immModel_resetDiscardNodes(cb);
}
}
immModel_prepareForSync(cb, cb->mSync);
@@ -10488,6 +10488,7 @@ static uint32_t immnd_evt_proc_sync_req(IMMND_CB *cb, IMMND_EVT *evt,
cb->mSyncRequested = true;
if (cb->mLostNodes > 0) {
cb->mLostNodes--;
+   immModel_eraseDiscardNode(evt->info.ctrl.nodeId);
}
/*osafassert(cb->mRulingEpoch == evt->info.ctrl.rulingEpoch); */
TRACE_2("At COORD: My Ruling Epoch:%u Cenral Ruling Epoch:%u",
@@ -10989,7 +10990,6 @@ static void immnd_evt_proc_discard_node(IMMND_CB *cb, IMMND_EVT *evt,
/* We should remember the nodeId/pid pair to avoid a redundant message
   causing a newly reattached node being discarded.
 */
-   cb->mLostNodes++;
immModel_discardNode(cb, evt->info.ctrl.nodeId, , ,
 , );
if (globArrSize) {
diff --git a/src/imm/immnd/immnd_init.h b/src/imm/immnd/immnd_init.h
index 9a3f70072..0732f43f0 100644
--- a/src/imm/immnd/immnd_init.h
+++ b/src/imm/immnd/immnd_init.h
@@ -154,6 +154,10 @@ bool immModel_ccbAbort(IMMND_CB *cb, SaUint32T ccbId, SaUint32T *arrSize,
 void immModel_getCcbIdsForOrigCon(IMMND_CB *cb, SaUint32T origConn,
   SaUint32T *arrSize, SaUint32T **ccbIdArr);
 
+void immModel_resetDiscardNodes(IMMND_CB* cb);
+
+void immModel_eraseDiscardNode(SaUint32T nodeId);
+
 void immModel_discardNode(IMMND_CB *cb, SaUint32T nodeId, SaUint32T *arrSize,
   SaUint32T **ccbIdArr, SaUint32T *globArrSize,
   SaUint32T **globccbIdArr);
-- 
2.25.1



___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


[devel] [PATCH 0/1] Review Request for imm: fix incorrect delay start sync [#3246]

2020-12-24 Thread thuan.tran
Summary: imm: fix incorrect delay start sync [#3246]
Review request for Ticket(s): 3246
Peer Reviewer(s): *** LIST THE TECH REVIEWER(S) / MAINTAINER(S) HERE ***
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3246
Base revision: a83912c4bd648c3b4d00fb163137d322ba4e6009
Personal repository: git://git.code.sf.net/u/thuantr/review


Impacted area       Impact y/n
--------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
-
N/A

revision fb07d97a678c245e0d0ac0ce02bfde5bcce7b176
Author: thuan.tran 
Date:   Thu, 24 Dec 2020 15:42:52 +0700

imm: fix incorrect delay start sync [#3246]

- Lost nodes were counted incorrectly, which could make a new coordinator
postpone sync while waiting for more nodes than the cluster contains.
- Count lost nodes correctly by tracking a set of lost node IDs.



Complete diffstat:
------------------
 src/imm/immnd/ImmModel.cc  | 14 ++++++++++++++
 src/imm/immnd/immnd_evt.c  |  4 ++--
 src/imm/immnd/immnd_init.h |  4 ++++
 3 files changed, 20 insertions(+), 2 deletions(-)


Testing Commands:
-
N/A

Testing, Expected Results:
--
N/A

Conditions of Submission:
-
ACK by reviewers

Arch        Built   Started   Linux distro
-------------------------------------------
mips        n       n
mips64      n       n
x86         n       n
x86_64      y       y
powerpc     n       n
powerpc64   n       n


Reviewer Checklist:
---
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
(i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
too much content in a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
comments and change requests that were proposed in the initial review.

___ You have a misconfigured ~/.gitconfig file (i.e. user.name, user.email etc)

___ Your computer has a badly configured date and time, confusing
the threaded patch review.

___ Your changes affect IPC mechanism, and you don't present any results
for in-service upgradability test.

___ Your changes affect the user manual and documentation, but your patch
series does not contain the patch that updates the Doxygen manual.





[devel] [PATCH 0/2] Review Request for ntf: do not use trace in signal handler [#3245] V2

2020-12-20 Thread thuan.tran
Summary: ntf: do not use trace in signal handler [#3245]
Review request for Ticket(s): 3245
Peer Reviewer(s): *** LIST THE TECH REVIEWER(S) / MAINTAINER(S) HERE ***
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3245
Base revision: 501241653d25bc2beffad7a25ea6a281d66c0c6f
Personal repository: git://git.code.sf.net/u/thuantr/review


Impacted area       Impact y/n
--------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
-
N/A

revision a83912c4bd648c3b4d00fb163137d322ba4e6009
Author: thuan.tran 
Date:   Mon, 21 Dec 2020 11:00:29 +0700

log: do not use trace in signal handler [#3245]

Do not use trace in the signal handler.
It can apparently cause the main thread to spin or deadlock.



revision 8247f906ee170c636e23a4dd6982de33e13f6663
Author: thuan.tran 
Date:   Mon, 21 Dec 2020 09:40:38 +0700

ntf: do not use trace in signal handler [#3245]

Do not use trace in the signal handler.
It can apparently cause the main thread to spin or deadlock.



Complete diffstat:
------------------
 src/log/logd/lgs_main.cc | 5 ++++-
 src/ntf/ntfd/ntfs_main.c | 5 ++++-
 2 files changed, 8 insertions(+), 2 deletions(-)


Testing Commands:
-
N/A

Testing, Expected Results:
--
N/A

Conditions of Submission:
-
ACK by reviewers

Arch        Built   Started   Linux distro
-------------------------------------------
mips        n       n
mips64      n       n
x86         n       n
x86_64      y       y
powerpc     n       n
powerpc64   n       n




[devel] [PATCH 1/2] ntf: do not use trace in signal handler [#3245]

2020-12-20 Thread thuan.tran
Do not use trace in the signal handler.
It can apparently cause the main thread to spin or deadlock.
---
 src/ntf/ntfd/ntfs_main.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)
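
The rationale in the commit message can be illustrated with a minimal
sketch. The handler below only saves errno and calls write(2), which POSIX
lists as async-signal-safe, and leaves any tracing to the main thread. The
pipe is an assumed stand-in for the selection object used by the daemons;
g_wake_fds, InstallUsr1Handler and ConsumeWakeup are hypothetical names, not
OpenSAF APIs.

```cpp
#include <cassert>
#include <cerrno>
#include <signal.h>
#include <unistd.h>

// Hypothetical self-pipe: [0] is read by the main loop, [1] is written by
// the signal handler. It stands in for a real selection object.
static int g_wake_fds[2] = {-1, -1};

// Signal handler restricted to async-signal-safe work. No tracing or
// logging here: stdio-style functions may take locks that the interrupted
// thread already holds.
static void OnUsr1(int /*sig*/) {
  const int saved_errno = errno;
  const char byte = 1;
  ssize_t rc = write(g_wake_fds[1], &byte, 1);
  (void)rc;  // nothing safe to do on failure
  errno = saved_errno;
}

// Create the pipe and install the handler. Returns false on failure.
bool InstallUsr1Handler() {
  if (pipe(g_wake_fds) != 0) return false;
  struct sigaction sa = {};
  sa.sa_handler = OnUsr1;
  sigemptyset(&sa.sa_mask);
  sa.sa_flags = SA_RESTART;
  return sigaction(SIGUSR1, &sa, nullptr) == 0;
}

// Called from the main thread: blocks until the handler has signalled.
// After it returns true, tracing or logging is safe again.
bool ConsumeWakeup() {
  char byte = 0;
  return read(g_wake_fds[0], &byte, 1) == 1;
}
```

The main loop would call ConsumeWakeup() (or poll the read end) and emit the
"Got USR1 signal" trace there, outside signal context.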

diff --git a/src/ntf/ntfd/ntfs_main.c b/src/ntf/ntfd/ntfs_main.c
index 55ecc9e51..d74ddc926 100644
--- a/src/ntf/ntfd/ntfs_main.c
+++ b/src/ntf/ntfd/ntfs_main.c
@@ -137,7 +137,10 @@ static void sigusr1_handler(int sig)
(void)sig;
signal(SIGUSR1, SIG_IGN);
ncs_sel_obj_ind(_sel_obj);
-   TRACE("Got USR1 signal");
+   /* Do not use trace in the signal handler
+* It can apparently cause the main thread
+* to spin or deadlock. See ticket #3245.
+*/
 }
 
 #if 0
-- 
2.25.1





[devel] [PATCH 2/2] log: do not use trace in signal handler [#3245]

2020-12-20 Thread thuan.tran
Do not use trace in the signal handler.
It can apparently cause the main thread to spin or deadlock.
---
 src/log/logd/lgs_main.cc | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/src/log/logd/lgs_main.cc b/src/log/logd/lgs_main.cc
index b8ed96ec4..ed21382ef 100644
--- a/src/log/logd/lgs_main.cc
+++ b/src/log/logd/lgs_main.cc
@@ -150,7 +150,10 @@ static void sigusr1_handler(int sig) {
   (void)sig;
   signal(SIGUSR1, SIG_IGN);
   ncs_sel_obj_ind(_sel_obj);
-  TRACE("Got USR1 signal");
+  /* Do not use trace in the signal handler
+   * It can apparently cause the main thread
+   * to spin or deadlock. See ticket #3245.
+   */
 }
 
 /**
-- 
2.25.1





[devel] [PATCH 0/1] Review Request for ntf: do not use trace in signal handler [#3245]

2020-12-20 Thread thuan.tran
Summary: ntf: do not use trace in signal handler [#3245]
Review request for Ticket(s): 3245
Peer Reviewer(s): *** LIST THE TECH REVIEWER(S) / MAINTAINER(S) HERE ***
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3245
Base revision: 501241653d25bc2beffad7a25ea6a281d66c0c6f
Personal repository: git://git.code.sf.net/u/thuantr/review


Impacted area       Impact y/n
--------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
-
N/A

revision 8247f906ee170c636e23a4dd6982de33e13f6663
Author: thuan.tran 
Date:   Mon, 21 Dec 2020 09:40:38 +0700

ntf: do not use trace in signal handler [#3245]

Do not use trace in the signal handler.
It can apparently cause the main thread to spin or deadlock.



Complete diffstat:
------------------
 src/ntf/ntfd/ntfs_main.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)


Testing Commands:
-
N/A

Testing, Expected Results:
--
N/A

Conditions of Submission:
-
ACK by reviewers

Arch        Built   Started   Linux distro
-------------------------------------------
mips        n       n
mips64      n       n
x86         n       n
x86_64      y       y
powerpc     n       n
powerpc64   n       n




[devel] [PATCH 1/1] ntf: do not use trace in signal handler [#3245]

2020-12-20 Thread thuan.tran
Do not use trace in the signal handler.
It can apparently cause the main thread to spin or deadlock.
---
 src/ntf/ntfd/ntfs_main.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/src/ntf/ntfd/ntfs_main.c b/src/ntf/ntfd/ntfs_main.c
index 55ecc9e51..d74ddc926 100644
--- a/src/ntf/ntfd/ntfs_main.c
+++ b/src/ntf/ntfd/ntfs_main.c
@@ -137,7 +137,10 @@ static void sigusr1_handler(int sig)
(void)sig;
signal(SIGUSR1, SIG_IGN);
ncs_sel_obj_ind(_sel_obj);
-   TRACE("Got USR1 signal");
+   /* Do not use trace in the signal handler
+* It can apparently cause the main thread
+* to spin or deadlock. See ticket #3245.
+*/
 }
 
 #if 0
-- 
2.25.1





[devel] [PATCH 1/1] amf: fix cluster stuck unhealthy when SCs brutal reboot [#3241]

2020-12-17 Thread thuan.tran
When AMFND sees AMFD UP/NEW_ACTIVE while its AMFD-down state is TRUE, it
should send sync info if any NCS SUs are assigned. After the node_up msg
is acked, resend the buffered headless msgs for NCS SUs.
---
 src/amf/amfnd/avnd_cb.h |  1 +
 src/amf/amfnd/avnd_di.h |  2 +-
 src/amf/amfnd/di.cc | 46 +++--
 src/amf/amfnd/main.cc   |  1 +
 src/amf/amfnd/susm.cc   |  1 +
 src/amf/amfnd/term.cc   |  2 +-
 6 files changed, 40 insertions(+), 13 deletions(-)
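
The effect of the new only_ncs argument can be sketched in isolation. This
is a simplified model under stated assumptions, not the amfnd code:
BufferedMsg, ResendBuffered and the is_ncs field on the message are
hypothetical stand-ins (the patch looks the SU up in cb->sudb and checks
su->is_ncs), and "sending" is reduced to assigning a fresh message id.

```cpp
#include <cassert>
#include <cstdint>
#include <list>
#include <string>
#include <vector>

// Hypothetical buffered message record (not an OpenSAF type).
struct BufferedMsg {
  std::string su_name;
  bool is_ncs;           // true if the message concerns an NCS SU
  std::uint32_t msg_id;  // id used for the most recent send
};

// Resend buffered messages. With only_ncs set, messages for non-NCS SUs
// are skipped and stay buffered for a later, unrestricted resend.
// Returns the ids assigned to the messages that were resent.
std::vector<std::uint32_t> ResendBuffered(std::list<BufferedMsg>& queue,
                                          bool only_ncs,
                                          std::uint32_t& last_msg_id) {
  std::vector<std::uint32_t> sent;
  for (auto& msg : queue) {
    if (only_ncs && !msg.is_ncs) continue;  // leave non-NCS msgs queued
    msg.msg_id = ++last_msg_id;             // fresh id for the resend
    sent.push_back(msg.msg_id);
  }
  return sent;
}
```

In this model, a call with only_ncs = true right after node_up is acked
resends only the NCS entries; a later call with only_ncs = false covers the
whole queue.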

diff --git a/src/amf/amfnd/avnd_cb.h b/src/amf/amfnd/avnd_cb.h
index a2e521359..8af5e5fe1 100644
--- a/src/amf/amfnd/avnd_cb.h
+++ b/src/amf/amfnd/avnd_cb.h
@@ -120,6 +120,7 @@ typedef struct avnd_cb_tag {
   bool reboot_in_progress;
   AVND_SU *failed_su;
   bool cont_reboot_in_progress;
+  bool is_ncs_su_assigned;
 
   /* the duration that amfnd should tolerate absence of any SC */
   SaTimeT scs_absence_max_duration;
diff --git a/src/amf/amfnd/avnd_di.h b/src/amf/amfnd/avnd_di.h
index 9870ad774..f9471aa6b 100644
--- a/src/amf/amfnd/avnd_di.h
+++ b/src/amf/amfnd/avnd_di.h
@@ -46,7 +46,7 @@ void avnd_di_msg_ack_process(struct avnd_cb_tag *, uint32_t);
 void avnd_diq_rec_check_buffered_msg(struct avnd_cb_tag *);
 AVND_DND_MSG_LIST *avnd_diq_rec_add(struct avnd_cb_tag *cb, AVND_MSG *msg);
 void avnd_diq_rec_del(struct avnd_cb_tag *cb, AVND_DND_MSG_LIST *rec);
-void avnd_diq_rec_send_buffered_msg(struct avnd_cb_tag *cb);
+void avnd_diq_rec_send_buffered_msg(struct avnd_cb_tag *cb, bool only_ncs);
 uint32_t avnd_diq_rec_send(struct avnd_cb_tag *cb, AVND_DND_MSG_LIST *rec);
 uint32_t avnd_di_reg_su_rsp_snd(struct avnd_cb_tag *cb,
 const std::string _name, uint32_t ret_code);
diff --git a/src/amf/amfnd/di.cc b/src/amf/amfnd/di.cc
index 5bff12104..20e752146 100644
--- a/src/amf/amfnd/di.cc
+++ b/src/amf/amfnd/di.cc
@@ -543,8 +543,7 @@ void avnd_send_node_up_msg(void) {
   msg.type = AVND_MSG_AVD;
   msg.info.avd->msg_type = AVSV_N2D_NODE_UP_MSG;
   msg.info.avd->msg_info.n2d_node_up.msg_id = ++(cb->snd_msg_id);
-  msg.info.avd->msg_info.n2d_node_up.leds_set =
-  cb->led_state == AVND_LED_STATE_GREEN ? true : false;
+  msg.info.avd->msg_info.n2d_node_up.leds_set = cb->is_ncs_su_assigned;
   osaf_extended_name_alloc(cb->amf_nodeName.c_str(),
>msg_info.n2d_node_up.node_name);
   msg.info.avd->msg_info.n2d_node_up.node_id = cb->node_info.nodeId;
@@ -652,7 +651,7 @@ uint32_t avnd_evt_mds_avd_up_evh(AVND_CB *cb, AVND_EVT *evt) {
  * node_up in both cases but only sync info is sent for recovery
  */
 if (evt->info.mds.i_change == NCSMDS_UP) {
-  if (cb->is_avd_down && cb->led_state == AVND_LED_STATE_GREEN) {
+  if (cb->is_avd_down && cb->is_ncs_su_assigned) {
 avnd_sync_sisu(cb);
 avnd_sync_csicomp(cb);
   }
@@ -665,7 +664,7 @@ uint32_t avnd_evt_mds_avd_up_evh(AVND_CB *cb, AVND_EVT *evt) {
  * only want to send node_up/sync info in case of recovery.
  */
 if (evt->info.mds.i_change == NCSMDS_NEW_ACTIVE && cb->is_avd_down) {
-  if (cb->led_state == AVND_LED_STATE_GREEN) {
+  if (cb->is_ncs_su_assigned) {
 // node_up, sync sisu, compcsi info to AVND for recovery
 avnd_sync_sisu(cb);
 avnd_sync_csicomp(cb);
@@ -1376,6 +1375,12 @@ void avnd_di_msg_ack_process(AVND_CB *cb, uint32_t mid) {
 // then perform last step clean up
 avnd_stop_tmr(cb, >resp_tmr);
 avnd_last_step_clean(cb);
+  } else if (rec->msg.info.avd->msg_type == AVSV_N2D_NODE_UP_MSG) {
+TRACE("msg node_up acked");
+// Resend buffered headless msg for NCS SUs
+if (cb->is_ncs_su_assigned) {
+  avnd_diq_rec_send_buffered_msg(cb, true);
+}
   }
   TRACE("remove msg %u from queue", msg_id);
   avnd_diq_rec_del(cb, rec);
@@ -1541,15 +1546,17 @@ void avnd_diq_rec_del(AVND_CB *cb, AVND_DND_MSG_LIST *rec) {
   Description   : Resend buffered msg
 
   Arguments : cb  - ptr to the AvND control block
+  only_ncs  - only send msg for NCS SUs
 
   Return Values : None.
 
   Notes : None.
 **/
-void avnd_diq_rec_send_buffered_msg(AVND_CB *cb) {
+void avnd_diq_rec_send_buffered_msg(AVND_CB *cb, bool only_ncs) {
   TRACE_ENTER();
   // Resend msgs from queue because amfnd dropped during headless
   // or headless-synchronization
+  std::vector tmp_dnd_list;
 
   for (auto iter = cb->dnd_list.begin(); iter != cb->dnd_list.end();) {
 auto pending_rec = *iter;
@@ -1564,6 +1571,10 @@ void avnd_diq_rec_send_buffered_msg(AVND_CB *cb) {
   // only resend if this SUSI does exist
   AVND_SU *su = cb->sudb.find(Amf::to_string(
   &pending_rec->msg.info.avd->msg_info.n2d_su_si_assign.su_name));
+  if (only_ncs && su && !su->is_ncs) {
+    ++iter;
+    continue;
+  }
   if (su != nullptr && su->si_list.n_nodes > 0) {
 

[devel] [PATCH 0/1] Review Request for amf: fix cluster stuck unhealthy when SCs brutal reboot [#3241] V4

2020-12-17 Thread thuan.tran
Summary: amf: fix cluster stuck unhealthy when SCs brutal reboot [#3241]
Review request for Ticket(s): 3241
Peer Reviewer(s): *** LIST THE TECH REVIEWER(S) / MAINTAINER(S) HERE ***
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3241
Base revision: 4b49a94ce601d0f4a28209e11b7a2766ef294794
Personal repository: git://git.code.sf.net/u/thuantr/review


Impacted area       Impact y/n
--------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
-
N/A

revision 501241653d25bc2beffad7a25ea6a281d66c0c6f
Author: thuan.tran 
Date:   Fri, 18 Dec 2020 10:33:34 +0700

amf: fix cluster stuck unhealthy when SCs brutal reboot [#3241]

When AMFND sees AMFD UP/NEW_ACTIVE while its AMFD-down state is TRUE, it
should send sync info if any NCS SUs are assigned. After the node_up msg
is acked, resend the buffered headless msgs for NCS SUs.



Complete diffstat:
--
 src/amf/amfnd/avnd_cb.h |  1 +
 src/amf/amfnd/avnd_di.h |  2 +-
 src/amf/amfnd/di.cc | 46 +++---
 src/amf/amfnd/main.cc   |  1 +
 src/amf/amfnd/susm.cc   |  1 +
 src/amf/amfnd/term.cc   |  2 +-
 6 files changed, 40 insertions(+), 13 deletions(-)


Testing Commands:
-
N/A

Testing, Expected Results:
--
N/A

Conditions of Submission:
-
ACK by reviewers

Arch        Built   Started   Linux distro
-------------------------------------------
mips        n       n
mips64      n       n
x86         n       n
x86_64      y       y
powerpc     n       n
powerpc64   n       n




[devel] [PATCH 1/1] amf: fix cluster stuck unhealthy when SCs brutal reboot [#3241]

2020-12-16 Thread thuan.tran
When AMFND sees AMFD UP/NEW_ACTIVE while its AMFD-down state is TRUE, it
should send sync info if any NCS SUs are assigned. After the node_up msg
is acked, resend the buffered headless msgs for NCS SUs.
---
 src/amf/amfnd/avnd_di.h |  3 +-
 src/amf/amfnd/di.cc | 70 ++---
 src/amf/amfnd/term.cc   |  2 +-
 3 files changed, 62 insertions(+), 13 deletions(-)

diff --git a/src/amf/amfnd/avnd_di.h b/src/amf/amfnd/avnd_di.h
index 9870ad774..c4ba5a22d 100644
--- a/src/amf/amfnd/avnd_di.h
+++ b/src/amf/amfnd/avnd_di.h
@@ -46,7 +46,7 @@ void avnd_di_msg_ack_process(struct avnd_cb_tag *, uint32_t);
 void avnd_diq_rec_check_buffered_msg(struct avnd_cb_tag *);
 AVND_DND_MSG_LIST *avnd_diq_rec_add(struct avnd_cb_tag *cb, AVND_MSG *msg);
 void avnd_diq_rec_del(struct avnd_cb_tag *cb, AVND_DND_MSG_LIST *rec);
-void avnd_diq_rec_send_buffered_msg(struct avnd_cb_tag *cb);
+void avnd_diq_rec_send_buffered_msg(struct avnd_cb_tag *cb, bool only_ncs);
 uint32_t avnd_diq_rec_send(struct avnd_cb_tag *cb, AVND_DND_MSG_LIST *rec);
 uint32_t avnd_di_reg_su_rsp_snd(struct avnd_cb_tag *cb,
 const std::string _name, uint32_t ret_code);
@@ -56,6 +56,7 @@ uint32_t avnd_di_ack_nack_msg_send(struct avnd_cb_tag *cb, uint32_t rcv_id,
 extern void avnd_di_uns32_upd_send(int class_id, int attr_id,
const std::string , uint32_t value);
 extern uint32_t avnd_di_resend_pg_start_track(struct avnd_cb_tag *);
+bool any_assigned_ncs_su(struct avnd_cb_tag *cb);
 void avnd_sync_sisu(struct avnd_cb_tag *cb);
 void avnd_sync_csicomp(struct avnd_cb_tag *cb);
 
diff --git a/src/amf/amfnd/di.cc b/src/amf/amfnd/di.cc
index 5bff12104..ae56bfc5c 100644
--- a/src/amf/amfnd/di.cc
+++ b/src/amf/amfnd/di.cc
@@ -543,8 +543,7 @@ void avnd_send_node_up_msg(void) {
   msg.type = AVND_MSG_AVD;
   msg.info.avd->msg_type = AVSV_N2D_NODE_UP_MSG;
   msg.info.avd->msg_info.n2d_node_up.msg_id = ++(cb->snd_msg_id);
-  msg.info.avd->msg_info.n2d_node_up.leds_set =
-  cb->led_state == AVND_LED_STATE_GREEN ? true : false;
+  msg.info.avd->msg_info.n2d_node_up.leds_set = any_assigned_ncs_su(cb);
   osaf_extended_name_alloc(cb->amf_nodeName.c_str(),
>msg_info.n2d_node_up.node_name);
   msg.info.avd->msg_info.n2d_node_up.node_id = cb->node_info.nodeId;
@@ -652,7 +651,7 @@ uint32_t avnd_evt_mds_avd_up_evh(AVND_CB *cb, AVND_EVT *evt) {
  * node_up in both cases but only sync info is sent for recovery
  */
 if (evt->info.mds.i_change == NCSMDS_UP) {
-  if (cb->is_avd_down && cb->led_state == AVND_LED_STATE_GREEN) {
+  if (cb->is_avd_down && any_assigned_ncs_su(cb)) {
 avnd_sync_sisu(cb);
 avnd_sync_csicomp(cb);
   }
@@ -665,7 +664,7 @@ uint32_t avnd_evt_mds_avd_up_evh(AVND_CB *cb, AVND_EVT *evt) {
  * only want to send node_up/sync info in case of recovery.
  */
 if (evt->info.mds.i_change == NCSMDS_NEW_ACTIVE && cb->is_avd_down) {
-  if (cb->led_state == AVND_LED_STATE_GREEN) {
+  if (any_assigned_ncs_su(cb)) {
 // node_up, sync sisu, compcsi info to AVND for recovery
 avnd_sync_sisu(cb);
 avnd_sync_csicomp(cb);
@@ -1376,6 +1375,12 @@ void avnd_di_msg_ack_process(AVND_CB *cb, uint32_t mid) {
 // then perform last step clean up
 avnd_stop_tmr(cb, >resp_tmr);
 avnd_last_step_clean(cb);
+  } else if (rec->msg.info.avd->msg_type == AVSV_N2D_NODE_UP_MSG) {
+TRACE("msg node_up acked");
+// Resend buffered headless msg for NCS SUs
+if (any_assigned_ncs_su(cb)) {
+  avnd_diq_rec_send_buffered_msg(cb, true);
+}
   }
   TRACE("remove msg %u from queue", msg_id);
   avnd_diq_rec_del(cb, rec);
@@ -1541,15 +1546,17 @@ void avnd_diq_rec_del(AVND_CB *cb, AVND_DND_MSG_LIST *rec) {
   Description   : Resend buffered msg
 
   Arguments : cb  - ptr to the AvND control block
+  only_ncs  - only send msg for NCS SUs
 
   Return Values : None.
 
   Notes : None.
 **/
-void avnd_diq_rec_send_buffered_msg(AVND_CB *cb) {
+void avnd_diq_rec_send_buffered_msg(AVND_CB *cb, bool only_ncs) {
   TRACE_ENTER();
   // Resend msgs from queue because amfnd dropped during headless
   // or headless-synchronization
+  std::vector tmp_dnd_list;
 
   for (auto iter = cb->dnd_list.begin(); iter != cb->dnd_list.end();) {
 auto pending_rec = *iter;
@@ -1564,6 +1571,10 @@ void avnd_diq_rec_send_buffered_msg(AVND_CB *cb) {
   // only resend if this SUSI does exist
   AVND_SU *su = cb->sudb.find(Amf::to_string(
   &pending_rec->msg.info.avd->msg_info.n2d_su_si_assign.su_name));
+  if (only_ncs && su && !su->is_ncs) {
+    ++iter;
+    continue;
+  }
   if (su != nullptr && su->si_list.n_nodes > 0) {
 pending_rec->msg.info.avd->msg_info.n2d_su_si_assign.msg_id =
 ++(cb->snd_msg_id);
@@ -1586,12 +1597,18 

[devel] [PATCH 0/1] Review Request for amf: fix cluster stuck unhealthy when SCs brutal reboot [#3241] V3

2020-12-16 Thread thuan.tran
Summary: amf: fix cluster stuck unhealthy when SCs brutal reboot [#3241]
Review request for Ticket(s): 3241
Peer Reviewer(s): *** LIST THE TECH REVIEWER(S) / MAINTAINER(S) HERE ***
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3241
Base revision: b132d5d337b9ee1e3232376cd0666a4a49331460
Personal repository: git://git.code.sf.net/u/thuantr/review


Impacted area   Impact y/n

 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
-
N/A

revision 9373614511d264b91a8b972077304a77b88b4797
Author: thuan.tran 
Date:   Wed, 16 Dec 2020 13:29:08 +0700

amf: fix cluster stuck unhealthy when SCs brutal reboot [#3241]

When AMFND sees AMFD UP/NEW_ACTIVE while its AMFD-down state is TRUE,
it should send sync info if any NCS SUs are assigned. After the node_up
msg is acked, it resends the buffered headless msgs for NCS SUs.
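
The resend filter described above can be sketched in isolation. This is
only an illustration under stated assumptions: BufferedMsg and
resend_buffered are hypothetical stand-ins, not the real
AVND_DND_MSG_LIST or avnd_diq_rec_send_buffered_msg, and the queue is
reduced to a plain vector.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a buffered SUSI message; not the real
// AVND_DND_MSG_LIST definition.
struct BufferedMsg {
  std::string su_name;  // SU the buffered message refers to
  bool su_is_ncs;       // true if that SU is an NCS SU
  uint32_t msg_id;      // id assigned when the message is (re)sent
};

// Resend buffered messages, giving each resent message a fresh id.
// With only_ncs == true, messages for non-NCS SUs are skipped and stay
// untouched in the queue. Returns the number of messages resent.
inline int resend_buffered(std::vector<BufferedMsg>& queue, bool only_ncs,
                           uint32_t& next_msg_id) {
  int resent = 0;
  for (auto& msg : queue) {
    if (only_ncs && !msg.su_is_ncs) continue;  // leave non-NCS SUs buffered
    msg.msg_id = ++next_msg_id;
    ++resent;
  }
  return resent;
}
```

Calling resend_buffered(queue, true, id) right after node_up is acked
would touch only the NCS entries; the remaining entries keep their old
msg_id until a call with only_ncs == false.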



Complete diffstat:
--
 src/amf/amfnd/avnd_di.h |  3 ++-
 src/amf/amfnd/di.cc | 70 +
 src/amf/amfnd/term.cc   |  2 +-
 3 files changed, 62 insertions(+), 13 deletions(-)


Testing Commands:
-
N/A

Testing, Expected Results:
--
N/A

Conditions of Submission:
-
ACK by reviewers

Arch      Built     Started    Linux distro
-------------------------------------------
mips        n          n
mips64      n          n
x86         n          n
x86_64      y          y
powerpc     n          n
powerpc64   n          n


Reviewer Checklist:
---
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
(i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
too much content into a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
comments and change requests that were proposed in the initial review.

___ You have a misconfigured ~/.gitconfig file (i.e. user.name, user.email etc)

___ Your computer has a badly configured date and time, confusing the
threaded patch review.

___ Your changes affect IPC mechanism, and you don't present any results
for in-service upgradability test.

___ Your changes affect user manual and documentation, your patch series
do not contain the patch that updates the Doxygen manual.



___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


[devel] [PATCH 0/1] Review Request for imm: fix unexpected immnd restart after headless [#3244]

2020-12-14 Thread thuan.tran
Summary: imm: fix unexpected immnd restart after headless [#3244]
Review request for Ticket(s): 3244
Peer Reviewer(s): *** LIST THE TECH REVIEWER(S) / MAINTAINER(S) HERE ***
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3244
Base revision: b132d5d337b9ee1e3232376cd0666a4a49331460
Personal repository: git://git.code.sf.net/u/thuantr/review


Impacted area   Impact y/n

 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n

NOTE: Patch(es) contain lines longer than 80 characters

Comments (indicate scope for each "y" above):
-
N/A

revision 6450de8b3254e893017d4f59fbaf071c243cee84
Author: thuan.tran 
Date:   Tue, 15 Dec 2020 10:34:15 +0700

imm: fix unexpected immnd restart after headless [#3244]

Check the FEVS counter to avoid an unnecessary IMMND restart to sync
with the Coord after headless.
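
As a rough sketch of the check, assuming simplified types: the structs
and the function name below are hypothetical, and only the compared
fields (isCoord, fevsMsgStart, mIntroduced, highestReceived) come from
the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, reduced view of the intro response from IMMD.
struct IntroRsp {
  bool is_coord;            // mirrors evt->info.ctrl.isCoord
  uint64_t fevs_msg_start;  // mirrors evt->info.ctrl.fevsMsgStart
};

// Hypothetical, reduced view of the IMMND control block.
struct NodeState {
  int introduced;             // mirrors cb->mIntroduced
  uint64_t highest_received;  // mirrors cb->highestReceived
};

// Returns true when the node should exit and restart to sync with the
// coordinator. The value 2 is taken from the patched condition.
inline bool must_restart_to_sync(const NodeState& node, const IntroRsp& rsp) {
  if (node.introduced != 2 || rsp.is_coord) return false;
  // Restart only if the local FEVS count differs from the announced one.
  return rsp.fevs_msg_start != node.highest_received;
}
```

When the two counters are equal the function returns false, which
corresponds to skipping the restart that was unconditional before.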



Complete diffstat:
--
 src/imm/immnd/immnd_evt.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)


Testing Commands:
-
N/A

Testing, Expected Results:
--
N/A

Conditions of Submission:
-
ACK by reviewers

Arch      Built     Started    Linux distro
-------------------------------------------
mips        n          n
mips64      n          n
x86         n          n
x86_64      y          y
powerpc     n          n
powerpc64   n          n







[devel] [PATCH 1/1] imm: fix unexpected immnd restart after headless [#3244]

2020-12-14 Thread thuan.tran
Check the FEVS counter to avoid an unnecessary IMMND restart to sync
with the Coord after headless.
---
 src/imm/immnd/immnd_evt.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/src/imm/immnd/immnd_evt.c b/src/imm/immnd/immnd_evt.c
index 670823a45..dfef6c0a5 100644
--- a/src/imm/immnd/immnd_evt.c
+++ b/src/imm/immnd/immnd_evt.c
@@ -10542,8 +10542,10 @@ static uint32_t immnd_evt_proc_intro_rsp(IMMND_CB *cb, IMMND_EVT *evt,
   oldCanBeCoord);
}
if ((cb->mIntroduced == 2) && (!evt->info.ctrl.isCoord)) {
-   LOG_WA("Restart to sync with Coord! Exit");
-   exit(EXIT_SUCCESS);
+   if (evt->info.ctrl.fevsMsgStart != cb->highestReceived) {
+   LOG_WA("Restart to sync with Coord! Exit");
+   exit(EXIT_SUCCESS);
+   }
}
cb->mIntroduced = 1;
cb->mCanBeCoord = evt->info.ctrl.canBeCoord;
-- 
2.25.1





[devel] [PATCH 1/1] imm: fix amfd crash when multi partitioned clusters rejoin [#3243]

2020-12-13 Thread thuan.tran
- Quick reboot is sometimes not quick, so RDE keeps running and
triggers split-brain detection on another SC. Kill the director
services to avoid impacting other SCs.

- The active IMMD pauses itself if it sees another active IMMD. The
node is then rebooted by RDE or by the split-brain timer of the local
IMMND.

- Improve log messages to avoid confusion between an intro/re-intro
accept and a plain epoch update.
---
 scripts/opensaf_reboot  | 10 ++---
 src/imm/immd/immd_evt.c | 47 ++---
 2 files changed, 37 insertions(+), 20 deletions(-)

diff --git a/scripts/opensaf_reboot b/scripts/opensaf_reboot
index 8e5bd8c40..5fbb1dd54 100644
--- a/scripts/opensaf_reboot
+++ b/scripts/opensaf_reboot
@@ -107,6 +107,13 @@ quick_local_node_reboot()
 {
logger -t "opensaf_reboot" "Do quick local node reboot"
 
+   for service in osafamfnd osafimmnd; do
+   $icmd pkill -STOP $service
+   done
+   for service in osafrded osafamfd osafimmd osaflogd osafntfd osafclmd; do
+   $icmd pkill -KILL $service
+   done
+
$icmd /bin/sh -c "/bin/echo -n 'b' 2> /dev/null > /proc/sysrq-trigger"
ret_code=$?
 
@@ -143,9 +150,6 @@ unset tipc
 # If clm cluster reboot requested argument one and two are set but not used,
 # argument 3 is set to 1, "safe reboot" request.
 if [ "$#" = 0 ]; then
-   $icmd pkill -STOP osafamfnd
-   $icmd pkill -KILL osafamfd
-   $icmd pkill -KILL osafimmd
quick_local_node_reboot
 elif [ "$safe_reboot" = 1 ]; then
opensaf_safe_reboot
diff --git a/src/imm/immd/immd_evt.c b/src/imm/immd/immd_evt.c
index 297761d13..eb579c489 100644
--- a/src/imm/immd/immd_evt.c
+++ b/src/imm/immd/immd_evt.c
@@ -785,13 +785,15 @@ static void immd_kill_node(IMMD_CB *cb, IMMD_IMMND_INFO_NODE *node_info)
 static uint16_t accepted_nodes = 0;
 
 static void immd_accept_node(IMMD_CB *cb, IMMD_IMMND_INFO_NODE *node_info,
-bool doReply, bool knownVeteran, bool check_ex_immd_node_id)
+bool doReply, bool knownVeteran, bool check_ex_immd)
 {
IMMSV_EVT accept_evt;
IMMD_MBCSV_MSG mbcp_msg;
bool isOnController = node_info->isOnController;
bool fsParamMbcp = false;
-   TRACE_ENTER();
+   TRACE_ENTER2(
+   "Accept IMMND %x doReply=%d knownVeteran=%d check_ex_immd=%d",
+   node_info->immnd_key, doReply, knownVeteran, check_ex_immd);
 
memset(&accept_evt, 0, sizeof(IMMSV_EVT));
memset(&mbcp_msg, 0, sizeof(IMMD_MBCSV_MSG));
@@ -799,9 +801,6 @@ static void immd_accept_node(IMMD_CB *cb, IMMD_IMMND_INFO_NODE *node_info,
++accepted_nodes;
}
 
-   LOG_NO(
-   "Accept intro from %x with ex-IMMD %x",
-   node_info->immnd_key, node_info->ex_immd_node_id);
accept_evt.type = IMMSV_EVT_TYPE_IMMND;
accept_evt.info.immnd.type = IMMND_EVT_D2ND_INTRO_RSP;
accept_evt.info.immnd.info.ctrl.nodeId = node_info->immnd_key;
@@ -844,7 +843,7 @@ static void immd_accept_node(IMMD_CB *cb, IMMD_IMMND_INFO_NODE *node_info,
cb->immnd_coord = node_info->immnd_key;
node_info->isCoord = true;
} else if (cb->mScAbsenceAllowed && doReply) {
-   if ((check_ex_immd_node_id) &&
+   if ((check_ex_immd) &&
(cb->node_id == node_info->immnd_key)) {
LOG_NO(
"IMMND re-introduce to IMMD on same this node. "
@@ -897,13 +896,13 @@ static void immd_accept_node(IMMD_CB *cb, IMMD_IMMND_INFO_NODE *node_info,
LOG_NO(
"IMMND coord at %x with ex-IMMD %x",
node_info->immnd_key, node_info->ex_immd_node_id);
-   if (check_ex_immd_node_id && node_info->ex_immd_node_id)
+   if (check_ex_immd && node_info->ex_immd_node_id)
cb->ex_immd_node_id = node_info->ex_immd_node_id;
}
 
mbcp_msg.type = IMMD_A2S_MSG_INTRO_RSP; /* Mbcp intro to SBY. */
mbcp_msg.info.ctrl = accept_evt.info.immnd.info.ctrl;
-   if (check_ex_immd_node_id) {
+   if (check_ex_immd) {
mbcp_msg.type = IMMD_A2S_MSG_INTRO_RSP_2;
mbcp_msg.info.ctrl.ex_immd_node_id = node_info->ex_immd_node_id;
}
@@ -949,10 +948,19 @@ static void immd_accept_node(IMMD_CB *cb, IMMD_IMMND_INFO_NODE *node_info,
.canBeCoord = IMMSV_VETERAN_COORD;
/* Allow all nodes including payloads to be coord */
 
-   if (check_ex_immd_node_id &&
-   !is_on_same_partition_with_coord(cb, node_info)) {
-   LOG_WA("Going to reboot node 0x%x", node_info->immnd_key);
-   accept_evt.info.immnd.info.ctrl.canBeCoord = IMMSV_UNKNOWN;
+  

[devel] [PATCH 0/1] Review Request for imm: fix amfd crash when multi partitioned clusters rejoin [#3243]

2020-12-13 Thread thuan.tran
Summary: imm: fix amfd crash when multi partitioned clusters rejoin [#3243]
Review request for Ticket(s): 3243
Peer Reviewer(s): *** LIST THE TECH REVIEWER(S) / MAINTAINER(S) HERE ***
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3243
Base revision: b132d5d337b9ee1e3232376cd0666a4a49331460
Personal repository: git://git.code.sf.net/u/thuantr/review


Impacted area   Impact y/n

 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n

NOTE: Patch(es) contain lines longer than 80 characters

Comments (indicate scope for each "y" above):
-
N/A

revision 109cb75c68399af613f4c5b9684e5c0d97222483
Author: thuan.tran 
Date:   Mon, 14 Dec 2020 10:45:15 +0700

imm: fix amfd crash when multi partitioned clusters rejoin [#3243]

- Quick reboot is sometimes not quick, so RDE keeps running and
triggers split-brain detection on another SC. Kill the director
services to avoid impacting other SCs.

- The active IMMD pauses itself if it sees another active IMMD. The
node is then rebooted by RDE or by the split-brain timer of the local
IMMND.

- Improve log messages to avoid confusion between an intro/re-intro
accept and a plain epoch update.



Complete diffstat:
--
 scripts/opensaf_reboot  | 10 +++---
 src/imm/immd/immd_evt.c | 47 ++-
 2 files changed, 37 insertions(+), 20 deletions(-)


Testing Commands:
-
N/A

Testing, Expected Results:
--
N/A

Conditions of Submission:
-
ACK by reviewers

Arch      Built     Started    Linux distro
-------------------------------------------
mips        n          n
mips64      n          n
x86         n          n
x86_64      y          y
powerpc     n          n
powerpc64   n          n







[devel] [PATCH 0/1] Review Request for amf: fix cluster stuck unhealthy when SCs brutal reboot [#3241] V2

2020-12-07 Thread thuan.tran
Summary: amf: fix cluster stuck unhealthy when SCs brutal reboot [#3241]
Review request for Ticket(s): 3241
Peer Reviewer(s): *** LIST THE TECH REVIEWER(S) / MAINTAINER(S) HERE ***
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3241
Base revision: a6de68cb96c1423051ea5398856e80f325ef01ad
Personal repository: git://git.code.sf.net/u/thuantr/review


Impacted area   Impact y/n

 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
-
N/A

revision f9c3a7ac2b8cd1093624689482d7c95b6eb4ae10
Author: thuan.tran 
Date:   Tue, 8 Dec 2020 13:35:45 +0700

amf: fix cluster stuck unhealthy when SCs brutal reboot [#3241]

AMFND should send sync state, as it does after headless, if it sees a
new active AMFD on the same node.
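
A minimal sketch of the resulting decision, assuming the caller has
already established that the MDS change is NEW_ACTIVE. LedState and
should_sync_on_new_active are hypothetical names used only for
illustration; the real check lives in avnd_evt_mds_avd_up_evh.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the AVND_LED_STATE_* values.
enum class LedState { kRed, kGreen };

// Decide whether AMFND should send node_up/sync info when a new active
// AMFD appears. Sync is sent only while AMFD is considered down, and
// then either when the LED state is green or when the new active AMFD
// runs on this same node.
inline bool should_sync_on_new_active(bool is_avd_down, LedState led,
                                      uint32_t amfd_node_id,
                                      uint32_t local_node_id) {
  if (!is_avd_down) return false;
  return led == LedState::kGreen || amfd_node_id == local_node_id;
}
```

The second operand of the || is the new part: before the change, a
node whose LED state was not green would not sync even if the new
active AMFD was local.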



Complete diffstat:
--
 src/amf/amfnd/di.cc  | 3 ++-
 src/amf/amfnd/mds.cc | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)


Testing Commands:
-
N/A

Testing, Expected Results:
--
N/A

Conditions of Submission:
-
ACK by reviewers

Arch      Built     Started    Linux distro
-------------------------------------------
mips        n          n
mips64      n          n
x86         n          n
x86_64      y          y
powerpc     n          n
powerpc64   n          n







[devel] [PATCH 1/1] amf: fix cluster stuck unhealthy when SCs brutal reboot [#3241]

2020-12-07 Thread thuan.tran
AMFND should send sync state, as it does after headless, if it sees a
new active AMFD on the same node.
---
 src/amf/amfnd/di.cc  | 3 ++-
 src/amf/amfnd/mds.cc | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/amf/amfnd/di.cc b/src/amf/amfnd/di.cc
index 5bff12104..507684eab 100644
--- a/src/amf/amfnd/di.cc
+++ b/src/amf/amfnd/di.cc
@@ -665,7 +665,8 @@ uint32_t avnd_evt_mds_avd_up_evh(AVND_CB *cb, AVND_EVT *evt) {
  * only want to send node_up/sync info in case of recovery.
  */
 if (evt->info.mds.i_change == NCSMDS_NEW_ACTIVE && cb->is_avd_down) {
-  if (cb->led_state == AVND_LED_STATE_GREEN) {
+  if ((cb->led_state == AVND_LED_STATE_GREEN) ||
+  (evt->info.mds.node_id == cb->node_info.nodeId)) {
 // node_up, sync sisu, compcsi info to AVND for recovery
 avnd_sync_sisu(cb);
 avnd_sync_csicomp(cb);
diff --git a/src/amf/amfnd/mds.cc b/src/amf/amfnd/mds.cc
index 86d207c29..89c75eaeb 100644
--- a/src/amf/amfnd/mds.cc
+++ b/src/amf/amfnd/mds.cc
@@ -531,6 +531,7 @@ uint32_t avnd_mds_svc_evt(AVND_CB *cb, MDS_CALLBACK_SVC_EVENT_INFO *evt_info) {
 evt = avnd_evt_create(cb, AVND_EVT_MDS_AVD_UP, 0, &evt_info->i_dest, 0,
   0, 0);
 evt->info.mds.i_change = evt_info->i_change;
+evt->info.mds.node_id = evt_info->i_node_id;
   }
   break;
 case NCSMDS_UP:
-- 
2.25.1





[devel] [PATCH 1/1] amf: fix cluster stuck unhealthy when SCs brutal reboot [#3241]

2020-12-01 Thread thuan.tran
AMFND should send sync state, as it does after headless, if it sees a
new active AMFD on the same node.
---
 src/amf/amfnd/di.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/amf/amfnd/di.cc b/src/amf/amfnd/di.cc
index 5bff12104..507684eab 100644
--- a/src/amf/amfnd/di.cc
+++ b/src/amf/amfnd/di.cc
@@ -665,7 +665,8 @@ uint32_t avnd_evt_mds_avd_up_evh(AVND_CB *cb, AVND_EVT *evt) {
  * only want to send node_up/sync info in case of recovery.
  */
 if (evt->info.mds.i_change == NCSMDS_NEW_ACTIVE && cb->is_avd_down) {
-  if (cb->led_state == AVND_LED_STATE_GREEN) {
+  if ((cb->led_state == AVND_LED_STATE_GREEN) ||
+  (evt->info.mds.node_id == cb->node_info.nodeId)) {
 // node_up, sync sisu, compcsi info to AVND for recovery
 avnd_sync_sisu(cb);
 avnd_sync_csicomp(cb);
-- 
2.17.1





[devel] [PATCH 0/1] Review Request for amf: fix cluster stuck unhealthy when SCs brutal reboot [#3241]

2020-12-01 Thread thuan.tran
Summary: amf: fix cluster stuck unhealthy when SCs brutal reboot [#3241]
Review request for Ticket(s): 3241
Peer Reviewer(s): *** LIST THE TECH REVIEWER(S) / MAINTAINER(S) HERE ***
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3241
Base revision: c7446e40d15cb1ac75d82c45aeedb460b0ae10e3
Personal repository: git://git.code.sf.net/u/thuantr/review


Impacted area   Impact y/n

 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
-
N/A

revision fe295ee66048e8a9403d47715e0cda538d9c1c97
Author: thuan.tran 
Date:   Tue, 1 Dec 2020 17:09:00 +0700

amf: fix cluster stuck unhealthy when SCs brutal reboot [#3241]

AMFND should send sync state, as it does after headless, if it sees a
new active AMFD on the same node.



Complete diffstat:
--
 src/amf/amfnd/di.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)


Testing Commands:
-
N/A

Testing, Expected Results:
--
N/A

Conditions of Submission:
-
ACK by reviewers

Arch      Built     Started    Linux distro
-------------------------------------------
mips        n          n
mips64      n          n
x86         n          n
x86_64      y          y
powerpc     n          n
powerpc64   n          n







[devel] [PATCH 1/1] amf: fix node reboot when cluster stop by clm admin op [#3240]

2020-12-01 Thread thuan.tran
AMFND should not reboot the node if it sees AMFD crash during shutdown.
---
 src/amf/amfnd/main.cc | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/src/amf/amfnd/main.cc b/src/amf/amfnd/main.cc
index 6d9ee95d4..e0ede1161 100644
--- a/src/amf/amfnd/main.cc
+++ b/src/amf/amfnd/main.cc
@@ -608,12 +608,14 @@ void avnd_main_process(void) {
 }
 
 if (fds[FD_AMFD_FIFO].revents & POLLERR) {
-  LOG_ER("AMFD has unexpectedly crashed. Rebooting node");
-  opensaf_reboot(
-  avnd_cb->node_info.nodeId,
-  osaf_extended_name_borrow(&avnd_cb->node_info.executionEnvironment),
-  "AMFD has unexpectedly crashed. Rebooting node");
-  exit(0);
+  if (!m_AVND_IS_SHUTTING_DOWN(avnd_cb)) {
+LOG_ER("AMFD has unexpectedly crashed. Rebooting node");
+opensaf_reboot(
+avnd_cb->node_info.nodeId,
+osaf_extended_name_borrow(&avnd_cb->node_info.executionEnvironment),
+"AMFD has unexpectedly crashed. Rebooting node");
+exit(0);
+  }
 }
 
 if (avnd_cb->clmHandle && (fds[FD_CLM].revents & POLLIN)) {
-- 
2.17.1





[devel] [PATCH 0/1] Review Request for amf: fix node reboot when cluster stop by clm admin op [#3240]

2020-12-01 Thread thuan.tran
Summary: amf: fix node reboot when cluster stop by clm admin op [#3240]
Review request for Ticket(s): 3240
Peer Reviewer(s): *** LIST THE TECH REVIEWER(S) / MAINTAINER(S) HERE ***
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3240
Base revision: c7446e40d15cb1ac75d82c45aeedb460b0ae10e3
Personal repository: git://git.code.sf.net/u/thuantr/review


Impacted area   Impact y/n

 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
-
N/A

revision d54ee5bb465afc1024b320b5a4271afa181a864a
Author: thuan.tran 
Date:   Tue, 1 Dec 2020 16:51:59 +0700

amf: fix node reboot when cluster stop by clm admin op [#3240]

AMFND should not reboot the node if it sees AMFD crash during shutdown.
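
The guard can be sketched as a pure decision function. FifoErrorAction
and on_amfd_fifo_error are hypothetical names for illustration; in the
patch the shutdown test is the m_AVND_IS_SHUTTING_DOWN macro and the
reboot is done through opensaf_reboot.

```cpp
#include <cassert>

// Hypothetical result type: what AMFND does when the AMFD FIFO reports
// POLLERR (AMFD went away).
enum class FifoErrorAction { kIgnore, kRebootNode };

// During an ordered shutdown the loss of AMFD is expected, so the error
// is ignored. Otherwise it is treated as an unexpected crash and the
// node is rebooted.
inline FifoErrorAction on_amfd_fifo_error(bool shutting_down) {
  return shutting_down ? FifoErrorAction::kIgnore
                       : FifoErrorAction::kRebootNode;
}
```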



Complete diffstat:
--
 src/amf/amfnd/main.cc | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)


Testing Commands:
-
N/A

Testing, Expected Results:
--
N/A

Conditions of Submission:
-
ACK by reviewers

Arch      Built     Started    Linux distro
-------------------------------------------
mips        n          n
mips64      n          n
x86         n          n
x86_64      y          y
powerpc     n          n
powerpc64   n          n





___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


[devel] [PATCH 0/1] Review Request for imm: fix amfd stuck when multi partitioned clusters rejoin [#3237] V4

2020-11-26 Thread thuan.tran
Summary: imm: fix amfd stuck when multi partitioned clusters rejoin [#3237]
Review request for Ticket(s): 3237
Peer Reviewer(s): *** LIST THE TECH REVIEWER(S) / MAINTAINER(S) HERE ***
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3237
Base revision: 4bbe8939ee6be97524724dc9444eea78dcc6a470
Personal repository: git://git.code.sf.net/u/thuantr/review


Impacted area       Impact y/n
--------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n

NOTE: Patch(es) contain lines longer than 80 characters

Comments (indicate scope for each "y" above):
-
N/A

revision 186349b64e20220a161f6fba47ccfc02f70165bf
Author: thuan.tran 
Date:   Thu, 26 Nov 2020 15:32:30 +0700

imm: fix amfd stuck when multi partitioned clusters rejoin [#3237]

- The IMMND coordinator takes longer to sync because it incorrectly
postpones sync to wait for an incorrect number of down nodes.
- An IMMND that is not the new coordinator should restart after its
re-intro is accepted, so that it syncs again with the new coordinator.
- The active IMMD only updates ex-IMMD from the coordinator if that info
exists. It sets ex-IMMD to its own node id when a new coord announces sync.
- Update to the #3228 solution: the active IMMD should not drop a re-intro
from the local IMMND; doing so causes an unexpected IMMND coord to be
selected and the local IMMND to restart unexpectedly later.
- The IMMND on the active IMMD node starts a split-brain detection timer
to reboot the node if it sees another active IMMD, rather than rebooting
immediately, to avoid interfering with the RDE split-brain detection
mechanism.
- Quick reboot is sometimes not quick, so the active IMMD on the node may
affect the newly promoted active node. Stop AMFND and kill AMFD/IMMD
to avoid any impact.
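
The last item swaps SIGSTOP for SIGKILL on osafamfd and osafimmd in scripts/opensaf_reboot. The sketch below only illustrates the general signal semantics behind that change, under stated assumptions: a plain `sleep` stands in for a daemon, `kill` with a PID is used instead of `pkill`, and `demo_stop_vs_kill` is a hypothetical name. It does not reproduce any OpenSAF behaviour.

```shell
#!/bin/sh
# Generic signal semantics only; no OpenSAF process is involved.
# A plain `sleep` stands in for a daemon such as osafimmd.
demo_stop_vs_kill() {
  sleep 30 >/dev/null 2>&1 &
  pid=$!

  # SIGSTOP suspends the process, but it still exists and keeps
  # whatever resources it holds.
  kill -STOP "$pid"
  if kill -0 "$pid" 2>/dev/null; then
    echo "after STOP: process still exists"
  fi

  # SIGKILL terminates it; it takes effect even on a stopped process.
  kill -KILL "$pid"
  wait "$pid" 2>/dev/null || true
  if ! kill -0 "$pid" 2>/dev/null; then
    echo "after KILL: process is gone"
  fi
}

demo_stop_vs_kill
```

A stopped process remains present on the node until the reboot actually happens, whereas a killed one does not, which appears to be the distinction the commit message relies on.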



Complete diffstat:
--
 scripts/opensaf_reboot |   5 +-
 src/imm/immd/immd_evt.c|  19 +--
 src/imm/immd/immd_mds.c|   1 +
 src/imm/immnd/immnd.h  |   1 +
 src/imm/immnd/immnd_cb.h   |   2 +
 src/imm/immnd/immnd_evt.c  | 122 +++--
 src/imm/immnd/immnd_main.c |   2 +
 7 files changed, 98 insertions(+), 54 deletions(-)


Testing Commands:
-
N/A

Testing, Expected Results:
--
N/A

Conditions of Submission:
-
ACK by reviewers

Arch      Built     Started    Linux distro
-------------------------------------------
mips        n          n
mips64      n          n
x86         n          n
x86_64      y          y
powerpc     n          n
powerpc64   n          n



[devel] [PATCH 1/1] imm: fix amfd stuck when multi partitioned clusters rejoin [#3237]

2020-11-26 Thread thuan.tran
- The IMMND coordinator takes longer to sync because it incorrectly
postpones sync to wait for an incorrect number of down nodes.
- An IMMND that is not the new coordinator should restart after its
re-intro is accepted, so that it syncs again with the new coordinator.
- The active IMMD only updates ex-IMMD from the coordinator if that info
exists. It sets ex-IMMD to its own node id when a new coord announces sync.
- Update to the #3228 solution: the active IMMD should not drop a re-intro
from the local IMMND; doing so causes an unexpected IMMND coord to be
selected and the local IMMND to restart unexpectedly later.
- The IMMND on the active IMMD node starts a split-brain detection timer
to reboot the node if it sees another active IMMD, rather than rebooting
immediately, to avoid interfering with the RDE split-brain detection
mechanism.
- Quick reboot is sometimes not quick, so the active IMMD on the node may
affect the newly promoted active node. Stop AMFND and kill AMFD/IMMD
to avoid any impact.
---
 scripts/opensaf_reboot |   5 +-
 src/imm/immd/immd_evt.c|  19 --
 src/imm/immd/immd_mds.c|   1 +
 src/imm/immnd/immnd.h  |   1 +
 src/imm/immnd/immnd_cb.h   |   2 +
 src/imm/immnd/immnd_evt.c  | 122 ++---
 src/imm/immnd/immnd_main.c |   2 +
 7 files changed, 98 insertions(+), 54 deletions(-)

diff --git a/scripts/opensaf_reboot b/scripts/opensaf_reboot
index e2a0ca944..8e5bd8c40 100644
--- a/scripts/opensaf_reboot
+++ b/scripts/opensaf_reboot
@@ -143,8 +143,9 @@ unset tipc
 # If clm cluster reboot requested argument one and two are set but not used,
 # argument 3 is set to 1, "safe reboot" request.
 if [ "$#" = 0 ]; then
-   $icmd pkill -STOP osafamfd
-   $icmd pkill -STOP osafimmd
+   $icmd pkill -STOP osafamfnd
+   $icmd pkill -KILL osafamfd
+   $icmd pkill -KILL osafimmd
quick_local_node_reboot
 elif [ "$safe_reboot" = 1 ]; then
opensaf_safe_reboot
diff --git a/src/imm/immd/immd_evt.c b/src/imm/immd/immd_evt.c
index 51cc8e4f7..297761d13 100644
--- a/src/imm/immd/immd_evt.c
+++ b/src/imm/immd/immd_evt.c
@@ -897,7 +897,8 @@ static void immd_accept_node(IMMD_CB *cb, 
IMMD_IMMND_INFO_NODE *node_info,
LOG_NO(
"IMMND coord at %x with ex-IMMD %x",
node_info->immnd_key, node_info->ex_immd_node_id);
-   cb->ex_immd_node_id = node_info->ex_immd_node_id;
+   if (check_ex_immd_node_id && node_info->ex_immd_node_id)
+   cb->ex_immd_node_id = node_info->ex_immd_node_id;
}
 
mbcp_msg.type = IMMD_A2S_MSG_INTRO_RSP; /* Mbcp intro to SBY. */
@@ -1253,6 +1254,7 @@ static uint32_t immd_evt_proc_immnd_announce_sync(IMMD_CB 
*cb, IMMD_EVT *evt,
   Loop through all nodes */
 
cb->mRulingEpoch++;
+   cb->ex_immd_node_id = cb->node_id;
 
/*Only updates epoch for coord. */
/*node_info->epoch = cb->mRulingEpoch; */
@@ -1691,8 +1693,9 @@ static uint32_t immd_evt_proc_immnd_intro(IMMD_CB *cb, IMMD_EVT *evt,
 
 	immd_immnd_info_node_get(&cb->immnd_tree, &sinfo->dest, &node_info);
 	if (!node_info) {
-		if (evt->info.ctrl_msg.refresh == 3) {
-			LOG_WA("Drop re-intro from old IMMND dest %" PRIu64, sinfo->dest);
+		if ((evt->info.ctrl_msg.refresh == 3) &&
+		    (sinfo->node_id != cb->node_id)) {
+			TRACE("Drop re-intro from old IMMND %x", sinfo->node_id);
 			goto done;
 		}
 		LOG_WA("Node not found dest %" PRIu64
@@ -3308,7 +3311,15 @@ static uint32_t immd_evt_proc_mds_evt(IMMD_CB *cb, 
IMMD_EVT *evt)
mds_info->dest);
goto done;
} else {
-   TRACE_5("IMMND DOWN PROCESS detected by IMMD");
+   if (node_info->immnd_execPid == 0) {
+   TRACE_5(
+   "Ignore IMMND %x DOWN not yet 
accepted intro",
+   node_info->immnd_key);
+   immd_immnd_info_node_delete(cb, 
node_info);
+   goto done;
+   }
+   TRACE_5("IMMND %x DOWN PROCESS detected by 
IMMD",
+   node_info->immnd_key);
immd_process_immnd_down(cb, node_info, true);
}
}
diff --git a/src/imm/immd/immd_mds.c b/src/imm/immd/immd_mds.c
index 7610a45fa..9688b49ad 100644
--- a/src/imm/immd/immd_mds.c
+++ b/src/imm/immd/immd_mds.c
@@ -495,6 +495,7 @@ static uint32_t immd_mds_rcv(IMMD_CB *cb, 
MDS_CALLBACK_RECEIVE_INFO *rcv_info)
pEvt->sinfo.ctxt = rcv_info->i_msg_ctxt;
pEvt->sinfo.dest = rcv_info->i_fr_dest;
pEvt->sinfo.to_svc = rcv_info->i_fr_svc_id;
+   pEvt->sinfo.node_id 

[devel] [PATCH 0/1] Review Request for imm: fix amfd stuck when multi partitioned clusters rejoin [#3237] V3

2020-11-24 Thread thuan.tran
Summary: imm: fix amfd stuck when multi partitioned clusters rejoin [#3237]
Review request for Ticket(s): 3237
Peer Reviewer(s): *** LIST THE TECH REVIEWER(S) / MAINTAINER(S) HERE ***
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3237
Base revision: 4bbe8939ee6be97524724dc9444eea78dcc6a470
Personal repository: git://git.code.sf.net/u/thuantr/review


Impacted area       Impact y/n
--------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            n
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n

NOTE: Patch(es) contain lines longer than 80 characters

Comments (indicate scope for each "y" above):
-
N/A

revision bd69ad64a5c62cb1aa1d5fe6bf9e95a7bec9aeb7
Author: thuan.tran 
Date:   Wed, 25 Nov 2020 11:18:07 +0700

imm: fix amfd stuck when multi partitioned clusters rejoin [#3237]

- The IMMND coordinator takes longer to sync because it incorrectly
postpones sync to wait for an incorrect number of down nodes.
- An IMMND that is not the new coordinator should restart after its
re-intro is accepted, so that it syncs again with the new coordinator.
- The active IMMD only updates ex-IMMD from the coordinator if that info
exists. It sets ex-IMMD to its own node id when a new coord announces sync.
- The IMMND on the active IMMD node starts a split-brain detection timer
to reboot the node if it sees another active IMMD, rather than rebooting
immediately, to avoid interfering with the RDE split-brain detection
mechanism.
- Quick reboot is sometimes not quick, so the active IMMD on the node may
affect the newly promoted active node. Stop AMFND and kill AMFD/IMMD
to avoid any impact.



Complete diffstat:
--
 scripts/opensaf_reboot |   5 +-
 src/imm/immd/immd_evt.c|  16 --
 src/imm/immnd/immnd.h  |   1 +
 src/imm/immnd/immnd_cb.h   |   2 +
 src/imm/immnd/immnd_evt.c  | 122 +++--
 src/imm/immnd/immnd_main.c |   2 +
 6 files changed, 95 insertions(+), 53 deletions(-)


Testing Commands:
-
N/A

Testing, Expected Results:
--
N/A

Conditions of Submission:
-
ACK by reviewers

Arch      Built     Started    Linux distro
-------------------------------------------
mips        n          n
mips64      n          n
x86         n          n
x86_64      y          y
powerpc     n          n
powerpc64   n          n



[devel] [PATCH 1/1] osaf: improve etcd3.plugin work with local etcd server [#3226]

2020-11-19 Thread thuan.tran
---
 src/osaf/consensus/plugins/etcd3.plugin | 48 ++---
 1 file changed, 27 insertions(+), 21 deletions(-)

diff --git a/src/osaf/consensus/plugins/etcd3.plugin b/src/osaf/consensus/plugins/etcd3.plugin
index 6252eedcb..11b426e19 100644
--- a/src/osaf/consensus/plugins/etcd3.plugin
+++ b/src/osaf/consensus/plugins/etcd3.plugin
@@ -33,30 +33,36 @@ export ETCDCTL_API=3
 # returns:
 #   0 - success, <value> is echoed to stdout
 #   1 - invalid param
+#   2 - context deadline exceeded
 #   other - failure
 get() {
   readonly key="$1"
 
-  if output=$(etcdctl $etcd_options --dial-timeout $etcd_timeout get "$directory$key" 2>&1)
-  then
-    key_=$(echo "$output" | tail -n2 | head -n1)
-    value=$(echo "$output" | tail -n1)
-    if [ "$key_" = "$directory$key" ]; then
-      if [ "$key_" = "$value" ]; then
-        # blank value returned
-        echo ""
-        return 0
+  while true; do
+    if output=$(etcdctl $etcd_options --dial-timeout $etcd_timeout get "$directory$key" 2>&1)
+    then
+      key_=$(echo "$output" | tail -n2 | head -n1)
+      value=$(echo "$output" | tail -n1)
+      if [ "$key_" = "$directory$key" ]; then
+        if [ "$key_" = "$value" ]; then
+          # blank value returned
+          echo ""
+          return 0
+        else
+          echo "$value"
+          return 0
+        fi
       else
-        echo "$value"
-        return 0
+        # key missing!
+        return 1
       fi
-    else
-      # key missing!
-      return 1
+    elif echo $etcd_options | grep -q "endpoints=localhost" &&
+         echo $output | grep -q "Error: context deadline exceeded"; then
+      return 2
     fi
-  else
-    return 2
-  fi
+    break
+  done
+  return 3
 }
 
 # set
@@ -136,7 +142,7 @@ create_key() {
     return 0
   fi
 
-  if output=$(etcdctl $etcd_options --dial-timeout $etcd_timeout get "$directory$key" | tail -n1)
+  if output=$(get $key)
   then
     return 1
   else
@@ -243,7 +249,7 @@ lock() {
     return 0
   fi
 
-  current_owner=$(etcdctl $etcd_options --dial-timeout $etcd_timeout get "$directory$keyname" | tail -n1)
+  current_owner=$(get $keyname)
   # see if we already hold the lock
   if [ "$current_owner" = "$owner" ]; then
     return 0
@@ -354,7 +360,7 @@ watch() {
   result=$?
   if [ "$result" -gt 1 ]; then
     # etcd down?
-    if [ "$watch_key" == "$takeover_request" ]; then
+    if [ "$watch_key" == "$takeover_request" ] && [ "$result" -ne 2 ]; then
       hostname=`cat $node_name_file`
       echo "$hostname SC-0 1000 UNDEFINED"
       return 0
@@ -376,7 +382,7 @@ watch() {
     done
   else
     # etcd down?
-    if [ "$watch_key" == "$takeover_request" ]; then
+    if [ "$watch_key" == "$takeover_request" ] && [ "$result" -ne 2 ]; then
       hostname=`cat $node_name_file`
       echo "$hostname SC-0 1000 UNDEFINED"
       return 0
-- 
2.17.1
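
For readers following the patch, the exit-status contract of get() and the guard added to watch() can be sketched in isolation. This is a minimal sketch under stated assumptions, not code from the plugin: `get` below is a stub, `FAKE_STATUS` is a hypothetical variable that lets it run without an etcd server, and `describe_get_status` and `should_fake_takeover` are hypothetical helper names. In the patch itself, status 2 is returned only when `$etcd_options` contains `endpoints=localhost` and the etcdctl output contains "Error: context deadline exceeded".

```shell
#!/bin/sh
# Stub standing in for the plugin's get(); FAKE_STATUS is a hypothetical
# variable so the sketch runs without an etcd server.
get() {
  if [ "${FAKE_STATUS:-0}" -eq 0 ]; then
    echo "some-value"
    return 0
  fi
  return "$FAKE_STATUS"
}

# Translate get()'s exit status into the meanings listed in the patched
# header comment (0, 1, 2, other). The patched code also returns 1 when
# the key is missing, hence the combined label.
describe_get_status() {
  value=$(get "$1")
  status=$?
  case "$status" in
    0) echo "ok:$value" ;;
    1) echo "invalid-or-missing" ;;
    2) echo "deadline-exceeded" ;;
    *) echo "failure" ;;
  esac
}

# Mirror of the guard added to watch(): succeed (exit 0) only when the
# watched key is the takeover request key and the status is not 2.
# Arguments: $1 watched key, $2 takeover request key, $3 status.
should_fake_takeover() {
  [ "$1" = "$2" ] && [ "$3" -ne 2 ]
}
```

With `FAKE_STATUS=2`, `describe_get_status somekey` prints `deadline-exceeded`, and `should_fake_takeover k k 2` returns non-zero, so the synthetic takeover line would not be emitted for that status.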





[devel] [PATCH 0/1] Review Request for osaf: improve etcd3.plugin work with local etcd server [#3226]

2020-11-19 Thread thuan.tran
Summary: osaf: improve etcd3.plugin work with local etcd server [#3226]
Review request for Ticket(s): 3226
Peer Reviewer(s): *** LIST THE TECH REVIEWER(S) / MAINTAINER(S) HERE ***
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3226
Base revision: c4091499e28980c732c8ac4136e10243617ac81d
Personal repository: git://git.code.sf.net/u/thuantr/review


Impacted area       Impact y/n
--------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            n
 OpenSAF services        n
 Core libraries          n
 Samples                 y
 Tests                   n
 Other                   n

NOTE: Patch(es) contain lines longer than 80 characters

Comments (indicate scope for each "y" above):
-
N/A

revision 1774b62183b0187eeda58166e5f5f5ca8c32cf61
Author: thuan.tran 
Date:   Thu, 19 Nov 2020 13:41:40 +0700

osaf: improve etcd3.plugin work with local etcd server [#3226]



Complete diffstat:
--
 src/osaf/consensus/plugins/etcd3.plugin | 48 ++---
 1 file changed, 27 insertions(+), 21 deletions(-)


Testing Commands:
-
N/A

Testing, Expected Results:
--
N/A

Conditions of Submission:
-
ACK by reviewers

Arch      Built     Started    Linux distro
-------------------------------------------
mips        n          n
mips64      n          n
x86         n          n
x86_64      y          y
powerpc     n          n
powerpc64   n          n




[devel] [PATCH 1/1] imm: fix amfd stuck when multi partitioned clusters rejoin [#3237]

2020-11-17 Thread thuan.tran
- The IMMND coordinator takes longer to sync because it incorrectly
postpones sync to wait for an incorrect number of down nodes.
- An IMMND that is not the new coordinator should restart after its
re-intro is accepted, so that it syncs again with the new coordinator.
- The active IMMD only updates ex-IMMD from the coordinator if that info
exists. It sets ex-IMMD to its own node id when a new coord announces sync.
- The IMMND on the active IMMD node starts a split-brain detection timer
to reboot the node if it sees another active IMMD, rather than rebooting
immediately, to avoid interfering with the RDE split-brain detection
mechanism.
- Quick reboot is sometimes not quick, so the active IMMD on the node may
affect the newly promoted active node. Stop AMFND and kill AMFD/IMMD
to avoid any impact.
---
 scripts/opensaf_reboot |  5 +++--
 src/imm/immd/immd_evt.c| 16 +---
 src/imm/immnd/immnd.h  |  1 +
 src/imm/immnd/immnd_cb.h   |  2 ++
 src/imm/immnd/immnd_evt.c  | 37 +
 src/imm/immnd/immnd_main.c |  2 ++
 6 files changed, 50 insertions(+), 13 deletions(-)

diff --git a/scripts/opensaf_reboot b/scripts/opensaf_reboot
index e2a0ca944..8e5bd8c40 100644
--- a/scripts/opensaf_reboot
+++ b/scripts/opensaf_reboot
@@ -143,8 +143,9 @@ unset tipc
 # If clm cluster reboot requested argument one and two are set but not used,
 # argument 3 is set to 1, "safe reboot" request.
 if [ "$#" = 0 ]; then
-   $icmd pkill -STOP osafamfd
-   $icmd pkill -STOP osafimmd
+   $icmd pkill -STOP osafamfnd
+   $icmd pkill -KILL osafamfd
+   $icmd pkill -KILL osafimmd
quick_local_node_reboot
 elif [ "$safe_reboot" = 1 ]; then
opensaf_safe_reboot
diff --git a/src/imm/immd/immd_evt.c b/src/imm/immd/immd_evt.c
index 51cc8e4f7..e5f438c1a 100644
--- a/src/imm/immd/immd_evt.c
+++ b/src/imm/immd/immd_evt.c
@@ -897,7 +897,8 @@ static void immd_accept_node(IMMD_CB *cb, 
IMMD_IMMND_INFO_NODE *node_info,
LOG_NO(
"IMMND coord at %x with ex-IMMD %x",
node_info->immnd_key, node_info->ex_immd_node_id);
-   cb->ex_immd_node_id = node_info->ex_immd_node_id;
+   if (node_info->ex_immd_node_id)
+   cb->ex_immd_node_id = node_info->ex_immd_node_id;
}
 
mbcp_msg.type = IMMD_A2S_MSG_INTRO_RSP; /* Mbcp intro to SBY. */
@@ -1253,6 +1254,7 @@ static uint32_t immd_evt_proc_immnd_announce_sync(IMMD_CB 
*cb, IMMD_EVT *evt,
   Loop through all nodes */
 
cb->mRulingEpoch++;
+   cb->ex_immd_node_id = cb->node_id;
 
/*Only updates epoch for coord. */
/*node_info->epoch = cb->mRulingEpoch; */
@@ -1692,7 +1694,7 @@ static uint32_t immd_evt_proc_immnd_intro(IMMD_CB *cb, IMMD_EVT *evt,
 	immd_immnd_info_node_get(&cb->immnd_tree, &sinfo->dest, &node_info);
 	if (!node_info) {
 		if (evt->info.ctrl_msg.refresh == 3) {
-			LOG_WA("Drop re-intro from old IMMND dest %" PRIu64, sinfo->dest);
+			TRACE("Drop re-intro from old IMMND dest %" PRIu64, sinfo->dest);
 			goto done;
 		}
 		LOG_WA("Node not found dest %" PRIu64
@@ -3308,7 +3310,15 @@ static uint32_t immd_evt_proc_mds_evt(IMMD_CB *cb, 
IMMD_EVT *evt)
mds_info->dest);
goto done;
} else {
-   TRACE_5("IMMND DOWN PROCESS detected by IMMD");
+   if (node_info->immnd_execPid == 0) {
+   TRACE_5(
+   "Ignore IMMND %x DOWN not yet 
accepted intro",
+   node_info->immnd_key);
+   immd_immnd_info_node_delete(cb, 
node_info);
+   goto done;
+   }
+   TRACE_5("IMMND %x DOWN PROCESS detected by 
IMMD",
+   node_info->immnd_key);
immd_process_immnd_down(cb, node_info, true);
}
}
diff --git a/src/imm/immnd/immnd.h b/src/imm/immnd/immnd.h
index 7b0818de7..23edf004b 100644
--- a/src/imm/immnd/immnd.h
+++ b/src/imm/immnd/immnd.h
@@ -33,6 +33,7 @@
 #endif
 
 #include "imm/common/immsv.h"
+#include "base/ncssysf_tmr.h"
 #include "immnd_cb.h"
 #include "immnd_init.h"
 
diff --git a/src/imm/immnd/immnd_cb.h b/src/imm/immnd/immnd_cb.h
index 3dc03d88b..bb3bb8493 100644
--- a/src/imm/immnd/immnd_cb.h
+++ b/src/imm/immnd/immnd_cb.h
@@ -207,6 +207,8 @@ typedef struct immnd_cb_tag {
   clm_init_sel_obj; /* Selection object wait for  clms intialization*/
   bool isClmNodeJoined; /* True => If clm joined the cluster*/
   NCS_PATRICIA_TREE immnd_clm_list; /* IMMND_IMM_CLIENT_NODE - node */
+  tmr_t splitbrain_tmr;
+  bool 

[devel] [PATCH 0/1] Review Request for imm: fix amfd stuck when multi partitioned clusters rejoin [#3237]

2020-11-17 Thread thuan.tran
Summary: imm: fix amfd stuck when multi partitioned clusters rejoin [#3237]
Review request for Ticket(s): 3237
Peer Reviewer(s): *** LIST THE TECH REVIEWER(S) / MAINTAINER(S) HERE ***
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3237
Base revision: c4091499e28980c732c8ac4136e10243617ac81d
Personal repository: git://git.code.sf.net/u/thuantr/review


Impacted area       Impact y/n
--------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n

NOTE: Patch(es) contain lines longer than 80 characters

Comments (indicate scope for each "y" above):
-
N/A

revision 4cadd605c98e2cf7a0e74c63017e82c66042891a
Author: thuan.tran 
Date:   Tue, 17 Nov 2020 16:28:11 +0700

imm: fix amfd stuck when multi partitioned clusters rejoin [#3237]

- The IMMND coordinator takes longer to sync because it incorrectly
postpones sync to wait for an incorrect number of down nodes.
- An IMMND that is not the new coordinator should restart after its
re-intro is accepted, so that it syncs again with the new coordinator.
- The active IMMD only updates ex-IMMD from the coordinator if that info
exists. It sets ex-IMMD to its own node id when a new coord announces sync.
- The IMMND on the active IMMD node starts a split-brain detection timer
to reboot the node if it sees another active IMMD, rather than rebooting
immediately, to avoid interfering with the RDE split-brain detection
mechanism.
- Quick reboot is sometimes not quick, so the active IMMD on the node may
affect the newly promoted active node. Stop AMFND and kill AMFD/IMMD
to avoid any impact.



Complete diffstat:
--
 scripts/opensaf_reboot |  5 +++--
 src/imm/immd/immd_evt.c| 17 ++---
 src/imm/immnd/immnd.h  |  1 +
 src/imm/immnd/immnd_cb.h   |  2 ++
 src/imm/immnd/immnd_evt.c  | 35 +++
 src/imm/immnd/immnd_main.c |  2 ++
 6 files changed, 49 insertions(+), 13 deletions(-)


Testing Commands:
-
N/A

Testing, Expected Results:
--
N/A

Conditions of Submission:
-
ACK by reviewers

Arch      Built     Started    Linux distro
-------------------------------------------
mips        n          n
mips64      n          n
x86         n          n
x86_64      y          y
powerpc     n          n
powerpc64   n          n



[devel] [PATCH 1/1] imm: fix amfd stuck when multi partitioned clusters rejoin [#3237]

2020-11-17 Thread thuan.tran
- The IMMND coordinator takes longer to sync because it incorrectly
postpones sync to wait for an incorrect number of down nodes.
- An IMMND that is not the new coordinator should restart after its
re-intro is accepted, so that it syncs again with the new coordinator.
- The active IMMD only updates ex-IMMD from the coordinator if that info
exists. It sets ex-IMMD to its own node id when a new coord announces sync.
- The IMMND on the active IMMD node starts a split-brain detection timer
to reboot the node if it sees another active IMMD, rather than rebooting
immediately, to avoid interfering with the RDE split-brain detection
mechanism.
- Quick reboot is sometimes not quick, so the active IMMD on the node may
affect the newly promoted active node. Stop AMFND and kill AMFD/IMMD
to avoid any impact.
---
 scripts/opensaf_reboot |  5 +++--
 src/imm/immd/immd_evt.c| 17 ++---
 src/imm/immnd/immnd.h  |  1 +
 src/imm/immnd/immnd_cb.h   |  2 ++
 src/imm/immnd/immnd_evt.c  | 35 +++
 src/imm/immnd/immnd_main.c |  2 ++
 6 files changed, 49 insertions(+), 13 deletions(-)

diff --git a/scripts/opensaf_reboot b/scripts/opensaf_reboot
index e2a0ca944..8e5bd8c40 100644
--- a/scripts/opensaf_reboot
+++ b/scripts/opensaf_reboot
@@ -143,8 +143,9 @@ unset tipc
 # If clm cluster reboot requested argument one and two are set but not used,
 # argument 3 is set to 1, "safe reboot" request.
 if [ "$#" = 0 ]; then
-   $icmd pkill -STOP osafamfd
-   $icmd pkill -STOP osafimmd
+   $icmd pkill -STOP osafamfnd
+   $icmd pkill -KILL osafamfd
+   $icmd pkill -KILL osafimmd
quick_local_node_reboot
 elif [ "$safe_reboot" = 1 ]; then
opensaf_safe_reboot
diff --git a/src/imm/immd/immd_evt.c b/src/imm/immd/immd_evt.c
index 51cc8e4f7..c2a2ca2f9 100644
--- a/src/imm/immd/immd_evt.c
+++ b/src/imm/immd/immd_evt.c
@@ -897,7 +897,8 @@ static void immd_accept_node(IMMD_CB *cb, 
IMMD_IMMND_INFO_NODE *node_info,
LOG_NO(
"IMMND coord at %x with ex-IMMD %x",
node_info->immnd_key, node_info->ex_immd_node_id);
-   cb->ex_immd_node_id = node_info->ex_immd_node_id;
+   if (node_info->ex_immd_node_id)
+   cb->ex_immd_node_id = node_info->ex_immd_node_id;
}
 
mbcp_msg.type = IMMD_A2S_MSG_INTRO_RSP; /* Mbcp intro to SBY. */
@@ -1253,6 +1254,7 @@ static uint32_t immd_evt_proc_immnd_announce_sync(IMMD_CB 
*cb, IMMD_EVT *evt,
   Loop through all nodes */
 
cb->mRulingEpoch++;
+   cb->ex_immd_node_id = cb->node_id;
 
/*Only updates epoch for coord. */
/*node_info->epoch = cb->mRulingEpoch; */
@@ -1692,7 +1694,8 @@ static uint32_t immd_evt_proc_immnd_intro(IMMD_CB *cb, IMMD_EVT *evt,
 	immd_immnd_info_node_get(&cb->immnd_tree, &sinfo->dest, &node_info);
 	if (!node_info) {
 		if (evt->info.ctrl_msg.refresh == 3) {
-			LOG_WA("Drop re-intro from old IMMND dest %" PRIu64, sinfo->dest);
+			LOG_WA("Drop re-intro from old IMMND dest %llx",
+			       (long long unsigned int)sinfo->dest);
 			goto done;
 		}
 		LOG_WA("Node not found dest %" PRIu64
@@ -3308,7 +3311,15 @@ static uint32_t immd_evt_proc_mds_evt(IMMD_CB *cb, 
IMMD_EVT *evt)
mds_info->dest);
goto done;
} else {
-   TRACE_5("IMMND DOWN PROCESS detected by IMMD");
+   if (node_info->immnd_execPid == 0) {
+   TRACE_5(
+   "Ignore IMMND %x DOWN not yet accepted intro",
+   node_info->immnd_key);
+   immd_immnd_info_node_delete(cb, node_info);
+   goto done;
+   }
+   TRACE_5("IMMND %x DOWN PROCESS detected by IMMD",
+   node_info->immnd_key);
immd_process_immnd_down(cb, node_info, true);
}
}
diff --git a/src/imm/immnd/immnd.h b/src/imm/immnd/immnd.h
index 7b0818de7..23edf004b 100644
--- a/src/imm/immnd/immnd.h
+++ b/src/imm/immnd/immnd.h
@@ -33,6 +33,7 @@
 #endif
 
 #include "imm/common/immsv.h"
+#include "base/ncssysf_tmr.h"
 #include "immnd_cb.h"
 #include "immnd_init.h"
 
diff --git a/src/imm/immnd/immnd_cb.h b/src/imm/immnd/immnd_cb.h
index 3dc03d88b..bb3bb8493 100644
--- a/src/imm/immnd/immnd_cb.h
+++ b/src/imm/immnd/immnd_cb.h
@@ -207,6 +207,8 @@ typedef struct immnd_cb_tag {
   clm_init_sel_obj; /* Selection object wait for  clms intialization*/
   bool isClmNodeJoined; /* True => If clm joined the cluster*/
   NCS_PATRICIA_TREE immnd_clm_list; /* IMMND_IMM_CLIENT_NODE - node 

[devel] [PATCH 0/1] Review Request for amf: fix amfd crash in multi partitioned clusters rejoin [#3236]

2020-11-15 Thread thuan.tran
Summary: amf: fix amfd crash in multi partitioned clusters rejoin [#3236]
Review request for Ticket(s): 3236
Peer Reviewer(s): *** LIST THE TECH REVIEWER(S) / MAINTAINER(S) HERE ***
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3236
Base revision: c539684e0f98b90ef65a031bff17864055fbfeb4
Personal repository: git://git.code.sf.net/u/thuantr/review


Impacted area         Impact y/n
--------------------------------
 Docs                     n
 Build system             n
 RPM/packaging            n
 Configuration files      n
 Startup scripts          n
 SAF services             y
 OpenSAF services         n
 Core libraries           n
 Samples                  n
 Tests                    n
 Other                    n


Comments (indicate scope for each "y" above):
-
N/A

revision c4091499e28980c732c8ac4136e10243617ac81d
Author: thuan.tran 
Date:   Mon, 16 Nov 2020 08:17:31 +0700

amf: fix amfd crash in multi partitioned clusters rejoin [#3236]

After headless recovery, AMFD should check whether any sponsor SI is
under failover when deciding whether the SG is stable.



Complete diffstat:
--
 src/amf/amfd/sg.cc |  3 ++-
 src/amf/amfd/sgproc.cc |  3 +--
 src/amf/amfd/su.cc | 19 +++
 src/amf/amfd/su.h  |  1 +
 4 files changed, 23 insertions(+), 3 deletions(-)


Testing Commands:
-
N/A

Testing, Expected Results:
--
N/A

Conditions of Submission:
-
ACK by reviewers

Arch        Built   Started   Linux distro
-------------------------------------------
mips          n        n
mips64        n        n
x86           n        n
x86_64        y        y
powerpc       n        n
powerpc64     n        n


Reviewer Checklist:
---
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
(i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
too much content into a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
comments and change requests that were proposed in the initial review.

___ You have a misconfigured ~/.gitconfig file (i.e. user.name, user.email etc)

___ Your computer has a badly configured date and time, confusing
the threaded patch review.

___ Your changes affect IPC mechanism, and you don't present any results
for in-service upgradability test.

___ Your changes affect user manual and documentation, your patch series
do not contain the patch that updates the Doxygen manual.



___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


[devel] [PATCH 1/1] amf: fix amfd crash in multi partitioned clusters rejoin [#3236]

2020-11-15 Thread thuan.tran
After headless recovery, AMFD should check whether any sponsor SI is
under failover when deciding whether the SG is stable.
---
 src/amf/amfd/sg.cc |  3 ++-
 src/amf/amfd/sgproc.cc |  3 +--
 src/amf/amfd/su.cc | 19 +++
 src/amf/amfd/su.h  |  1 +
 4 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/src/amf/amfd/sg.cc b/src/amf/amfd/sg.cc
index e2c2528c8..6ed585c49 100644
--- a/src/amf/amfd/sg.cc
+++ b/src/amf/amfd/sg.cc
@@ -2405,7 +2405,8 @@ bool AVD_SG::any_assignment_in_progress() {
   for (const auto &su : list_of_su) {
 if (su->any_susi_fsm_in(AVD_SU_SI_STATE_ASGN) ||
 su->any_susi_fsm_in(AVD_SU_SI_STATE_UNASGN) ||
-su->any_susi_fsm_in(AVD_SU_SI_STATE_MODIFY)) {
+su->any_susi_fsm_in(AVD_SU_SI_STATE_MODIFY) ||
+su->any_sponsor_si_under_failover()) {
   pending = true;
   break;
 }
diff --git a/src/amf/amfd/sgproc.cc b/src/amf/amfd/sgproc.cc
index 405e2c45d..7de64f4a8 100644
--- a/src/amf/amfd/sgproc.cc
+++ b/src/amf/amfd/sgproc.cc
@@ -2480,9 +2480,7 @@ uint32_t avd_sg_su_oper_list_add(AVD_CL_CB *cb, AVD_SU *su, bool ckpt,
   }
 
   TRACE("added %s to %s", su->name.c_str(), su->sg_of_su->name.c_str());
-
   su_oper_list.push_back(su);
-
   if (!ckpt) {
 // Update to IMM if headless is enabled
 if (cb->scs_absence_max_duration > 0 && wrt_to_imm) {
@@ -2539,6 +2537,7 @@ uint32_t avd_sg_su_oper_list_del(AVD_CL_CB *cb, AVD_SU *su, bool ckpt,
 goto done;
   }
 
+  TRACE("erased %s to %s", su->name.c_str(), su->sg_of_su->name.c_str());
   su_oper_list.erase(elem);
   if (!ckpt) {
 // Update to IMM if headless is enabled
diff --git a/src/amf/amfd/su.cc b/src/amf/amfd/su.cc
index 5a6c69c33..1fe92c16b 100644
--- a/src/amf/amfd/su.cc
+++ b/src/amf/amfd/su.cc
@@ -2692,6 +2692,25 @@ bool AVD_SU::any_susi_fsm_in(uint32_t check_fsm) {
   TRACE_LEAVE();
   return rc;
 }
+/**
+ * @brief    Checks if sponsor SI under failover
+ * @result   true/false
+ */
+bool AVD_SU::any_sponsor_si_under_failover() {
+  TRACE_ENTER2("SU:'%s'", name.c_str());
+  bool rc = false;
+  for (AVD_SU_SI_REL *susi = list_of_susi; susi && rc == false;
+   susi = susi->su_next) {
+TRACE("SUSI:'%s,%s', si_dep_state:'%d'", susi->su->name.c_str(),
+  susi->si->name.c_str(), susi->si->si_dep_state);
+if (susi->si->si_dep_state == AVD_SI_FAILOVER_UNDER_PROGRESS) {
+  rc = true;
+  TRACE("Found");
+}
+  }
+  TRACE_LEAVE();
+  return rc;
+}
 /**
  * @brief  Verify if SU is stable for admin operation on any higher
level enity like SG, Node and Nodegroup etc.
diff --git a/src/amf/amfd/su.h b/src/amf/amfd/su.h
index f32f3138a..3a6266c7c 100644
--- a/src/amf/amfd/su.h
+++ b/src/amf/amfd/su.h
@@ -146,6 +146,7 @@ class AVD_SU {
   void lock(SaImmOiHandleT immoi_handle, SaInvocationT invocation,
 SaAmfAdminStateT adm_state);
   bool any_susi_fsm_in(uint32_t check_fsm);
+  bool any_sponsor_si_under_failover();
   SaAisErrorT check_su_stability();
   uint32_t curr_num_standby_sis();
   uint32_t curr_num_active_sis();
-- 
2.17.1
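
The stability check added by this patch can be sketched in isolation. The following is a minimal illustration, not the AMFD code: `Si`, `Su`, `SiDepState` and `sg_has_pending_work` are hypothetical stand-ins, and the sketch models only the new "sponsor SI under failover" condition, not the existing SUSI FSM checks.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins for the AMF types; only the fields used here.
enum class SiDepState { kAssigned, kFailoverUnderProgress };

struct Si {
  std::string name;
  SiDepState dep_state;
};

struct Su {
  std::string name;
  std::vector<const Si*> assigned_sis;  // plays the role of list_of_susi

  // True if any SI assigned to this SU is still under failover.
  bool any_si_under_failover() const {
    for (const Si* si : assigned_sis) {
      if (si->dep_state == SiDepState::kFailoverUnderProgress) return true;
    }
    return false;
  }
};

// The SG counts as unstable while any of its SUs reports pending work.
bool sg_has_pending_work(const std::vector<Su>& sus) {
  for (const Su& su : sus) {
    if (su.any_si_under_failover()) return true;
  }
  return false;
}
```

With one SU whose SIs are all assigned, `sg_has_pending_work` returns false; adding an SU that holds an SI in `kFailoverUnderProgress` makes it return true, which corresponds to `any_assignment_in_progress()` reporting the SG as not yet stable.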





[devel] [PATCH 0/1] Review Request for log: fix memleak detected by valgrind [#3234]

2020-11-10 Thread thuan.tran
Summary: log: fix memleak detected by valgrind [#3234]
Review request for Ticket(s): 3234
Peer Reviewer(s): *** LIST THE TECH REVIEWER(S) / MAINTAINER(S) HERE ***
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3234
Base revision: 47d18c7783e29b97e63081062158d537aaf84464
Personal repository: git://git.code.sf.net/u/thuantr/review


Impacted area         Impact y/n
--------------------------------
 Docs                     n
 Build system             n
 RPM/packaging            n
 Configuration files      n
 Startup scripts          n
 SAF services             y
 OpenSAF services         n
 Core libraries           n
 Samples                  n
 Tests                    n
 Other                    n

NOTE: Patch(es) contain lines longer than 80 characters

Comments (indicate scope for each "y" above):
-
N/A

revision 103e6e4aa526948fe4b657090b5147432e811e0b
Author: thuan.tran 
Date:   Wed, 11 Nov 2020 09:44:16 +0700

log: fix memleak detected by valgrind [#3234]



Complete diffstat:
--
 src/log/logd/lgs_mbcsv.cc | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)


Testing Commands:
-
N/A

Testing, Expected Results:
--
N/A

Conditions of Submission:
-
ACK by reviewers

Arch        Built   Started   Linux distro
-------------------------------------------
mips          n        n
mips64        n        n
x86           n        n
x86_64        y        y
powerpc       n        n
powerpc64     n        n




[devel] [PATCH 1/1] log: fix memleak detected by valgrind [#3234]

2020-11-10 Thread thuan.tran
---
 src/log/logd/lgs_mbcsv.cc | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/log/logd/lgs_mbcsv.cc b/src/log/logd/lgs_mbcsv.cc
index a38f7a5d1..ee14da2f0 100644
--- a/src/log/logd/lgs_mbcsv.cc
+++ b/src/log/logd/lgs_mbcsv.cc
@@ -88,7 +88,7 @@ static uint32_t ckpt_encode_cbk_handler(NCS_MBCSV_CB_ARG *cbk_arg);
 static uint32_t ckpt_enc_cold_sync_data(lgs_cb_t *lgs_cb,
 NCS_MBCSV_CB_ARG *cbk_arg,
 bool data_req);
-static uint32_t ckpt_encode_async_update(lgs_cb_t *lgs_cb, EDU_HDL edu_hdl,
+static uint32_t ckpt_encode_async_update(lgs_cb_t *lgs_cb,
  NCS_MBCSV_CB_ARG *cbk_arg);
 static uint32_t ckpt_decode_cold_sync(lgs_cb_t *cb, NCS_MBCSV_CB_ARG *cbk_arg);
 static uint32_t ckpt_peer_info_cbk_handler(NCS_MBCSV_CB_ARG *arg);
@@ -689,7 +689,7 @@ static uint32_t ckpt_encode_cbk_handler(NCS_MBCSV_CB_ARG *cbk_arg) {
   switch (cbk_arg->info.encode.io_msg_type) {
 case NCS_MBCSV_MSG_ASYNC_UPDATE:
   /* Encode async update */
-  if ((rc = ckpt_encode_async_update(lgs_cb, lgs_cb->edu_hdl, cbk_arg)) !=
+  if ((rc = ckpt_encode_async_update(lgs_cb, cbk_arg)) !=
   NCSCC_RC_SUCCESS)
 TRACE("  ckpt_encode_async_update FAILED");
   break;
@@ -1005,7 +1005,7 @@ static uint32_t edu_enc_reg_list(lgs_cb_t *cb, NCS_UBAID *uba) {
  * Notes : None.
  /
 
-static uint32_t ckpt_encode_async_update(lgs_cb_t *lgs_cb, EDU_HDL edu_hdl,
+static uint32_t ckpt_encode_async_update(lgs_cb_t *lgs_cb,
  NCS_MBCSV_CB_ARG *cbk_arg) {
   lgsv_ckpt_msg_v9_t *data_v9 = NULL;
   lgsv_ckpt_msg_v8_t *data_v8 = NULL;
@@ -1064,7 +1064,7 @@ static uint32_t ckpt_encode_async_update(lgs_cb_t *lgs_cb, EDU_HDL edu_hdl,
 return NCSCC_RC_FAILURE;
   }
   /* Encode async record,except publish & subscribe */
-  rc = m_NCS_EDU_EXEC(&edu_hdl, edp_function, &cbk_arg->info.encode.io_uba,
+  rc = m_NCS_EDU_EXEC(&lgs_cb->edu_hdl, edp_function, &cbk_arg->info.encode.io_uba,
   EDP_OP_TYPE_ENC, vdata, );
 
   if (rc != NCSCC_RC_SUCCESS) {
-- 
2.17.1
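
The commit message does not say why passing `EDU_HDL` by value leaked. One plausible reading, offered here as an assumption rather than something the thread confirms, is that state recorded in the by-value copy was lost when the function returned. The sketch below shows only that general C/C++ effect; `Handle`, `register_by_value` and `register_by_pointer` are hypothetical names, not part of the EDU library.

```cpp
#include <cstddef>

// Hypothetical handle that counts how many items were registered through it.
struct Handle {
  std::size_t registered = 0;
};

// By value: the update lands on a private copy and is lost on return.
void register_by_value(Handle h) { ++h.registered; }

// By pointer: the update lands on the caller's handle.
void register_by_pointer(Handle* h) { ++h->registered; }
```

After `register_by_value(h)` the caller's `h.registered` is still 0, while `register_by_pointer(&h)` raises it to 1. If the lost state were heap memory tracked by the handle, the by-value variant would leave it unreachable from the original handle.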





[devel] [PATCH 0/1] Review Request for fm: fix unexpected node reboot [#3230]

2020-11-04 Thread thuan.tran
Summary: fm: fix unexpected node reboot [#3230]
Review request for Ticket(s): 3230
Peer Reviewer(s): *** LIST THE TECH REVIEWER(S) / MAINTAINER(S) HERE ***
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3230
Base revision: 9104a5597ffd0102a25c12bf8909d5060e551e34
Personal repository: git://git.code.sf.net/u/thuantr/review


Impacted area         Impact y/n
--------------------------------
 Docs                     n
 Build system             n
 RPM/packaging            n
 Configuration files      n
 Startup scripts          n
 SAF services             n
 OpenSAF services         y
 Core libraries           n
 Samples                  n
 Tests                    n
 Other                    n


Comments (indicate scope for each "y" above):
-
N/A

revision 8800be5af7eae6b5270c3a5584112227f596cb46
Author: thuan.tran 
Date:   Wed, 4 Nov 2020 17:36:49 +0700

fm: fix unexpected node reboot [#3230]

- Only reboot if the RDE role is not ACTIVE, because the node may
have just been promoted to ACTIVE.



Complete diffstat:
--
 src/fm/fmd/fm_main.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)


Testing Commands:
-
N/A

Testing, Expected Results:
--
N/A

Conditions of Submission:
-
ACK by reviewers

Arch        Built   Started   Linux distro
-------------------------------------------
mips          n        n
mips64        n        n
x86           n        n
x86_64        y        y
powerpc       n        n
powerpc64     n        n




[devel] [PATCH 0/1] Review Request for ckpt: fix ckptnd crash in cpnd_ckpt_sc_cpnd_mdest_del [#3231]

2020-11-04 Thread thuan.tran
Summary: ckpt: fix ckptnd crash in cpnd_ckpt_sc_cpnd_mdest_del [#3231]
Review request for Ticket(s): 3231
Peer Reviewer(s): *** LIST THE TECH REVIEWER(S) / MAINTAINER(S) HERE ***
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3231
Base revision: 9104a5597ffd0102a25c12bf8909d5060e551e34
Personal repository: git://git.code.sf.net/u/thuantr/review


Impacted area         Impact y/n
--------------------------------
 Docs                     n
 Build system             n
 RPM/packaging            n
 Configuration files      n
 Startup scripts          n
 SAF services             n
 OpenSAF services         y
 Core libraries           n
 Samples                  n
 Tests                    n
 Other                    n


Comments (indicate scope for each "y" above):
-
N/A

revision 2bb5f2448b8d25420481190eb09179e5f9b625b9
Author: thuan.tran 
Date:   Wed, 4 Nov 2020 17:32:02 +0700

ckpt: fix ckptnd crash in cpnd_ckpt_sc_cpnd_mdest_del [#3231]

- Correct next pointer in the loop
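
The change to cpnd_proc.c itself is not shown in this message, so the following is only a sketch of the general pattern that "correct next pointer in the loop" usually refers to: when a loop may free the current list node, the link to the next node has to be read before the node is released. `Node` and `remove_matching` are hypothetical and are not the ckptnd data structures.

```cpp
// Hypothetical singly linked list node; not the ckptnd data structure.
struct Node {
  int dest;
  Node* next;
};

// Removes every node whose dest matches and returns the new head.
Node* remove_matching(Node* head, int dest) {
  Node* prev = nullptr;
  Node* cur = head;
  while (cur != nullptr) {
    Node* next = cur->next;  // read the link before cur may be freed
    if (cur->dest == dest) {
      if (prev != nullptr) {
        prev->next = next;
      } else {
        head = next;
      }
      delete cur;
    } else {
      prev = cur;
    }
    cur = next;  // never dereferences a freed node
  }
  return head;
}
```

Advancing with `cur = cur->next` after the `delete` would read freed memory, which is a common cause of crashes in this kind of loop.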



Complete diffstat:
--
 src/ckpt/ckptnd/cpnd_proc.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)


Testing Commands:
-
N/A

Testing, Expected Results:
--
N/A

Conditions of Submission:
-
ACK by reviewers

Arch        Built   Started   Linux distro
-------------------------------------------
mips          n        n
mips64        n        n
x86           n        n
x86_64        y        y
powerpc       n        n
powerpc64     n        n




[devel] [PATCH 1/1] imm: fix no expected reboot when multi partitioned clusters rejoin [#3229]

2020-10-19 Thread thuan.tran
- Let IMMND go headless when it sees two active IMMDs
- Initialize the ex-IMMD node id for IMMD
---
 src/imm/immd/immd_main.c  |  1 +
 src/imm/immnd/immnd_evt.c | 30 ++
 2 files changed, 23 insertions(+), 8 deletions(-)

diff --git a/src/imm/immd/immd_main.c b/src/imm/immd/immd_main.c
index f673d667b..1edca2232 100644
--- a/src/imm/immd/immd_main.c
+++ b/src/imm/immd/immd_main.c
@@ -259,6 +259,7 @@ uint32_t initialize_for_assignment(IMMD_CB *cb, SaAmfHAStateT ha_state)
}
 
 done:
+   cb->ex_immd_node_id = cb->node_id;
TRACE_LEAVE2("rc = %u", rc);
return rc;
 }
diff --git a/src/imm/immnd/immnd_evt.c b/src/imm/immnd/immnd_evt.c
index 714a75ca2..e405d3ce4 100644
--- a/src/imm/immnd/immnd_evt.c
+++ b/src/imm/immnd/immnd_evt.c
@@ -12230,15 +12230,29 @@ static uint32_t immnd_evt_proc_mds_evt(IMMND_CB *cb, IMMND_EVT *evt)
 
/* In multi partitioned clusters rejoin, IMMND may not realize
 * headless due to see IMMDs from different partitions */
-   if ((evt->info.mds_info.change == NCSMDS_RED_UP) &&
+   if ((evt->info.mds_info.change == NCSMDS_DOWN) &&
+   (evt->info.mds_info.svc_id == NCSMDS_SVC_ID_IMMD)) {
+   is_headless = true;
+   cb->immd_node_id = 0;
+   cb->other_immd_id = 0;
+   } else if ((evt->info.mds_info.change == NCSMDS_RED_UP) &&
(evt->info.mds_info.svc_id == NCSMDS_SVC_ID_IMMD) &&
-   (evt->info.mds_info.node_id != cb->immd_node_id) &&
-   (evt->info.mds_info.role == V_DEST_RL_STANDBY) &&
-   (cb->other_immd_id == 0)) {
-   cb->other_immd_id = evt->info.mds_info.node_id;
-   TRACE_2("IMMD RED_UP EVENT %x role=%d ==> ACT:%x SBY:%x",
-   evt->info.mds_info.node_id, evt->info.mds_info.role,
-   cb->immd_node_id, cb->other_immd_id);
+   (evt->info.mds_info.node_id != cb->immd_node_id)) {
+   if ((evt->info.mds_info.role == V_DEST_RL_STANDBY) &&
+   (cb->other_immd_id == 0)) {
+   cb->other_immd_id = evt->info.mds_info.node_id;
+   TRACE_2("IMMD RED_UP EVENT %x role=%d ==> ACT:%x SBY:%x",
+   evt->info.mds_info.node_id, evt->info.mds_info.role,
+   cb->immd_node_id, cb->other_immd_id);
+   } else if ((evt->info.mds_info.role == V_DEST_RL_ACTIVE) &&
+   (cb->immd_node_id != 0) &&
+   (cb->node_id != cb->immd_node_id)) {
+   LOG_WA("See two Active IMMD: %x %x, going to headless",
+   cb->immd_node_id, evt->info.mds_info.node_id);
+   is_headless = true;
+   cb->immd_node_id = 0;
+   cb->other_immd_id = 0;
+   }
} else if ((evt->info.mds_info.change == NCSMDS_RED_DOWN) &&
   (evt->info.mds_info.svc_id == NCSMDS_SVC_ID_IMMD)) {
if (cb->immd_node_id == evt->info.mds_info.node_id)
-- 
2.17.1
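
The decision logic in the immnd_evt.c hunk above can be read as a small state update. The sketch below restates it outside the MDS event handler under simplifying assumptions: the service-id check and the RED_DOWN branch are left out, and `ImmdView`, `Change`, `Role` and `on_immd_event` are hypothetical names introduced for illustration.

```cpp
#include <cstdint>

enum class Change { kDown, kRedUp };
enum class Role { kActive, kStandby };

// Hypothetical view of what one IMMND knows about the IMMDs.
struct ImmdView {
  uint32_t self_node_id;
  uint32_t active_immd;   // mirrors cb->immd_node_id
  uint32_t standby_immd;  // mirrors cb->other_immd_id
};

// Returns true when the node should treat itself as headless.
bool on_immd_event(ImmdView& v, Change change, Role role, uint32_t node_id) {
  if (change == Change::kDown) {
    v.active_immd = 0;
    v.standby_immd = 0;
    return true;
  }
  if (change == Change::kRedUp && node_id != v.active_immd) {
    if (role == Role::kStandby && v.standby_immd == 0) {
      v.standby_immd = node_id;  // first standby seen, remember it
    } else if (role == Role::kActive && v.active_immd != 0 &&
               v.self_node_id != v.active_immd) {
      // A second active IMMD appeared: forget both and go headless.
      v.active_immd = 0;
      v.standby_immd = 0;
      return true;
    }
  }
  return false;
}
```

A standby IMMD coming up is only recorded. A second active IMMD, seen while another active one is already known and is not on the local node, resets both ids and reports headless, which is the behaviour the patch adds for rejoining partitions.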





[devel] [PATCH 0/1] Review Request for imm: fix no expected reboot when multi partitioned clusters rejoin [#3229]

2020-10-19 Thread thuan.tran
Summary: imm: fix no expected reboot when multi partitioned clusters rejoin [#3229]
Review request for Ticket(s): 3229
Peer Reviewer(s): *** LIST THE TECH REVIEWER(S) / MAINTAINER(S) HERE ***
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3229
Base revision: fa78173f280133ceb47224bfbaf9e83b96873fc5
Personal repository: git://git.code.sf.net/u/thuantr/review


Impacted area         Impact y/n
--------------------------------
 Docs                     n
 Build system             n
 RPM/packaging            n
 Configuration files      n
 Startup scripts          n
 SAF services             n
 OpenSAF services         n
 Core libraries           n
 Samples                  n
 Tests                    n
 Other                    n

NOTE: Patch(es) contain lines longer than 80 characters

Comments (indicate scope for each "y" above):
-
N/A

revision e8df0f9a96e368a7f6791ae7d7f1e7607f90226f
Author: thuan.tran 
Date:   Tue, 20 Oct 2020 10:51:20 +0700

imm: fix no expected reboot when multi partitioned clusters rejoin [#3229]

- Let IMMND go headless when it sees two active IMMDs
- Initialize the ex-IMMD node id for IMMD



Complete diffstat:
--
 src/imm/immd/immd_main.c  |  1 +
 src/imm/immnd/immnd_evt.c | 30 ++
 2 files changed, 23 insertions(+), 8 deletions(-)


Testing Commands:
-
N/A

Testing, Expected Results:
--
N/A

Conditions of Submission:
-
ACK by reviewers

Arch        Built   Started   Linux distro
-------------------------------------------
mips          n        n
mips64        n        n
x86           n        n
x86_64        y        y
powerpc       n        n
powerpc64     n        n




[devel] [PATCH 1/1] amf: fix lock SU operation keep return TRY_AGAIN forever [#3227]

2020-10-19 Thread thuan.tran
- si_dep_state is changed to "failover under progress" when the active
dependent SU is locked while a failover of the sponsor SU is also ongoing.
When the sponsor is ready, the new active assignment for the dependent SU
completes, but SG realignment skips most steps because si_dep_state is stale.
Locking an SU under this SG then keeps returning TRY_AGAIN forever.
- Set si_dep_state properly on the new active assignment.
---
 src/amf/amfd/sg_2n_fsm.cc | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/amf/amfd/sg_2n_fsm.cc b/src/amf/amfd/sg_2n_fsm.cc
index 525e30049..e3d970fa8 100644
--- a/src/amf/amfd/sg_2n_fsm.cc
+++ b/src/amf/amfd/sg_2n_fsm.cc
@@ -659,6 +659,8 @@ static bool avd_sg_2n_assign_act_si(AVD_CL_CB *cb, AVD_SG *sg, AVD_SU *su) {
   if (avd_new_assgn_susi(cb, su, i_si, SA_AMF_HA_ACTIVE, false,
  &l_susi) == NCSCC_RC_SUCCESS) {
 l_flag = true;
+if (i_si->si_dep_state == AVD_SI_FAILOVER_UNDER_PROGRESS)
+  avd_sidep_si_dep_state_set(i_si, AVD_SI_ASSIGNED);
   } else {
 LOG_ER("%s:%u: %s", __FILE__, __LINE__, i_si->name.c_str());
   }
-- 
2.17.1
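
The two added lines amount to clearing a stale dependency state once the new active assignment has succeeded. A minimal sketch of that idea follows, with hypothetical names (`Si`, `SiDepState`, `on_active_assignment_done`) standing in for the AMFD types and for `avd_sidep_si_dep_state_set`; `kNoDependency` is an invented placeholder for any other state.

```cpp
// Hypothetical subset of SI dependency states.
enum class SiDepState { kNoDependency, kAssigned, kFailoverUnderProgress };

struct Si {
  SiDepState dep_state;
};

// Called after a new active assignment succeeded (hypothetical helper).
// Only the stale "failover under progress" state is rewritten.
void on_active_assignment_done(Si& si) {
  if (si.dep_state == SiDepState::kFailoverUnderProgress) {
    si.dep_state = SiDepState::kAssigned;
  }
}
```

States other than `kFailoverUnderProgress` are left untouched, which mirrors the guard in the patch.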





[devel] [PATCH 0/1] Review Request for amf: fix lock SU operation keep return TRY_AGAIN forever [#3227] V2

2020-10-19 Thread thuan.tran
Summary: amf: fix lock SU operation keep return TRY_AGAIN forever [#3227]
Review request for Ticket(s): 3227
Peer Reviewer(s): *** LIST THE TECH REVIEWER(S) / MAINTAINER(S) HERE ***
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3227
Base revision: fa78173f280133ceb47224bfbaf9e83b96873fc5
Personal repository: git://git.code.sf.net/u/thuantr/review


Impacted area         Impact y/n
--------------------------------
 Docs                     n
 Build system             n
 RPM/packaging            n
 Configuration files      n
 Startup scripts          n
 SAF services             y
 OpenSAF services         n
 Core libraries           n
 Samples                  n
 Tests                    n
 Other                    n


Comments (indicate scope for each "y" above):
-
N/A

revision fcf3f69bcabbe70cf29b3747e5fbfd406834a13b
Author: thuan.tran 
Date:   Mon, 19 Oct 2020 21:01:45 +0700

amf: fix lock SU operation keep return TRY_AGAIN forever [#3227]

- si_dep_state is changed to "failover under progress" when the active
dependent SU is locked while a failover of the sponsor SU is also ongoing.
When the sponsor is ready, the new active assignment for the dependent SU
completes, but SG realignment skips most steps because si_dep_state is stale.
Locking an SU under this SG then keeps returning TRY_AGAIN forever.
- Set si_dep_state properly on the new active assignment.



Complete diffstat:
--
 src/amf/amfd/sg_2n_fsm.cc | 2 ++
 1 file changed, 2 insertions(+)


Testing Commands:
-
N/A

Testing, Expected Results:
--
N/A

Conditions of Submission:
-
ACK by reviewers

Arch        Built   Started   Linux distro
-------------------------------------------
mips          n        n
mips64        n        n
x86           n        n
x86_64        y        y
powerpc       n        n
powerpc64     n        n


Reviewer Checklist:
---
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
(i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
too much content in a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
comments and change requests that were proposed in the initial review.

___ You have a misconfigured ~/.gitconfig file (i.e. user.name, user.email etc)

___ Your computer has a badly configured date and time, confusing
the threaded patch review.

___ Your changes affect the IPC mechanism, and you don't present any results
for an in-service upgradability test.

___ Your changes affect the user manual and documentation, but your patch
series does not contain the patch that updates the Doxygen manual.



___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


[devel] [PATCH 1/1] imm: drop re-intro message from down IMMND [#3228]

2020-10-16 Thread thuan.tran
IMMD incorrectly selects a fresh IMMND as coordinator
because it accepts a re-intro message from an IMMND that is down.
---
 src/imm/immd/immd_evt.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/src/imm/immd/immd_evt.c b/src/imm/immd/immd_evt.c
index 8d789249d..51cc8e4f7 100644
--- a/src/imm/immd/immd_evt.c
+++ b/src/imm/immd/immd_evt.c
@@ -1691,6 +1691,10 @@ static uint32_t immd_evt_proc_immnd_intro(IMMD_CB *cb, IMMD_EVT *evt,
 
immd_immnd_info_node_get(&cb->immnd_tree, &sinfo->dest, &node_info);
if (!node_info) {
+   if (evt->info.ctrl_msg.refresh == 3) {
+   LOG_WA("Drop re-intro from old IMMND dest %" PRIu64, sinfo->dest);
+   goto done;
+   }
LOG_WA("Node not found dest %" PRIu64
   ", add the missing IMMND node",
   sinfo->dest);
-- 
2.17.1





[devel] [PATCH 0/1] Review Request for imm: drop re-intro message from down IMMND [#3228]

2020-10-16 Thread thuan.tran
Summary: imm: drop re-intro message from down IMMND [#3228]
Review request for Ticket(s): 3228
Peer Reviewer(s): *** LIST THE TECH REVIEWER(S) / MAINTAINER(S) HERE ***
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3228
Base revision: fa78173f280133ceb47224bfbaf9e83b96873fc5
Personal repository: git://git.code.sf.net/u/thuantr/review


Impacted area       Impact y/n
--------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n

NOTE: Patch(es) contain lines longer than 80 characters

Comments (indicate scope for each "y" above):
-
N/A

revision b563ec10ed60e7fdac066c7987763b7f68143382
Author: thuan.tran 
Date:   Fri, 16 Oct 2020 15:08:00 +0700

imm: drop re-intro message from down IMMND [#3228]

IMMD incorrectly selects a fresh IMMND as coordinator
because it accepts a re-intro message from an IMMND that is down.
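
The guard added by this change can be illustrated with a small, self-contained C++ sketch. It is not the IMMD code: `ImmndRegistry`, `IntroAction`, `CheckReIntroRule` and the `epoch` value are hypothetical stand-ins for the IMMND node tree and its bookkeeping. Only the rule itself comes from the patch: a re-intro (`refresh == 3` in the diff) from a destination that is no longer known is dropped instead of re-adding the node.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical outcome of handling one intro message.
enum class IntroAction { kDropped, kAddedMissingNode, kUpdatedKnownNode };

// The patch compares ctrl_msg.refresh against 3 to detect a re-intro.
constexpr int kRefreshReIntro = 3;

// Hypothetical stand-in for the IMMD's table of known IMMND nodes.
struct ImmndRegistry {
  std::unordered_map<std::uint64_t, int> epoch_by_dest;

  IntroAction OnIntro(std::uint64_t dest, int refresh, int epoch) {
    auto it = epoch_by_dest.find(dest);
    if (it == epoch_by_dest.end()) {
      if (refresh == kRefreshReIntro) {
        // Unknown sender that claims to re-introduce itself: treat it as
        // an IMMND that already went down and ignore the message.
        return IntroAction::kDropped;
      }
      // Any other intro from an unknown sender adds the missing node.
      epoch_by_dest.emplace(dest, epoch);
      return IntroAction::kAddedMissingNode;
    }
    it->second = epoch;
    return IntroAction::kUpdatedKnownNode;
  }
};

// Small self-check of the rule above.
inline void CheckReIntroRule() {
  ImmndRegistry registry;
  assert(registry.OnIntro(42, kRefreshReIntro, 7) == IntroAction::kDropped);
  assert(registry.epoch_by_dest.empty());
}
```

In this sketch the dropped message leaves the registry untouched, so a stale sender can never become a candidate for later decisions such as coordinator selection.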



Complete diffstat:
--
 src/imm/immd/immd_evt.c | 4 
 1 file changed, 4 insertions(+)


Testing Commands:
-
N/A

Testing, Expected Results:
--
N/A

Conditions of Submission:
-
ACK by reviewers

Arch        Built   Started   Linux distro
-------------------------------------------
mips          n        n
mips64        n        n
x86           n        n
x86_64        y        y
powerpc       n        n
powerpc64     n        n




[devel] [PATCH 1/1] amf: fix lock SU operation keep return TRY_AGAIN forever [#3227]

2020-10-14 Thread thuan.tran
- si_dep_state is changed to "failover under progress" when an active
dependent SU is locked while failover of the sponsor SU is also in progress.
When the sponsor is ready, the new active assignment for the dependent SU is
done, but SG alignment skips almost all steps because si_dep_state is
incorrect. Locking an SU under this SG then keeps returning TRY_AGAIN forever.
- Set si_dep_state properly on new active assignment.
- Allow lock SU even when the SG tolerance timer is running.
---
 src/amf/amfd/sg_2n_fsm.cc | 2 ++
 src/amf/amfd/su.cc        | 3 ++-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/src/amf/amfd/sg_2n_fsm.cc b/src/amf/amfd/sg_2n_fsm.cc
index 525e30049..e3d970fa8 100644
--- a/src/amf/amfd/sg_2n_fsm.cc
+++ b/src/amf/amfd/sg_2n_fsm.cc
@@ -659,6 +659,8 @@ static bool avd_sg_2n_assign_act_si(AVD_CL_CB *cb, AVD_SG *sg, AVD_SU *su) {
   if (avd_new_assgn_susi(cb, su, i_si, SA_AMF_HA_ACTIVE, false,
  _susi) == NCSCC_RC_SUCCESS) {
 l_flag = true;
+if (i_si->si_dep_state == AVD_SI_FAILOVER_UNDER_PROGRESS)
+  avd_sidep_si_dep_state_set(i_si, AVD_SI_ASSIGNED);
   } else {
 LOG_ER("%s:%u: %s", __FILE__, __LINE__, i_si->name.c_str());
   }
diff --git a/src/amf/amfd/su.cc b/src/amf/amfd/su.cc
index 5a6c69c33..a85f9d1c9 100644
--- a/src/amf/amfd/su.cc
+++ b/src/amf/amfd/su.cc
@@ -1394,7 +1394,8 @@ static void su_admin_op_cb(SaImmOiHandleT immoi_handle,
   }
   /* if Tolerance timer is running for any SI's withing this SG, then return
* SA_AIS_ERR_TRY_AGAIN */
-  if (sg_is_tolerance_timer_running_for_any_si(su->sg_of_su)) {
+  if ((op_id != SA_AMF_ADMIN_LOCK) &&
+  sg_is_tolerance_timer_running_for_any_si(su->sg_of_su)) {
 report_admin_op_error(
 immoi_handle, invocation, SA_AIS_ERR_TRY_AGAIN, nullptr,
 "Tolerance timer is running for some of the SI's in the SG '%s', "
-- 
2.17.1





[devel] [PATCH 0/1] Review Request for amf: fix lock SU operation keep return TRY_AGAIN forever [#3227]

2020-10-14 Thread thuan.tran
Summary: amf: fix lock SU operation keep return TRY_AGAIN forever [#3227]
Review request for Ticket(s): 3227
Peer Reviewer(s): *** LIST THE TECH REVIEWER(S) / MAINTAINER(S) HERE ***
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3227
Base revision: fa78173f280133ceb47224bfbaf9e83b96873fc5
Personal repository: git://git.code.sf.net/u/thuantr/review


Impacted area       Impact y/n
--------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
-
N/A

revision 0ba00837e4fbd68f677ef499120e3bf25e7dede1
Author: thuan.tran 
Date:   Thu, 15 Oct 2020 07:35:47 +0700

amf: fix lock SU operation keep return TRY_AGAIN forever [#3227]

- si_dep_state is changed to "failover under progress" when an active
dependent SU is locked while failover of the sponsor SU is also in progress.
When the sponsor is ready, the new active assignment for the dependent SU is
done, but SG alignment skips almost all steps because si_dep_state is
incorrect. Locking an SU under this SG then keeps returning TRY_AGAIN forever.
- Set si_dep_state properly on new active assignment.
- Allow lock SU even when the SG tolerance timer is running.
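
The two changes listed above can be pictured with a reduced C++ sketch. All names here (`SiDepState`, `AdminOp`, `AdminVerdict`, `Si`, `OnNewActiveAssignment`, `CheckToleranceTimer`, `CheckLockIsNotBlocked`) are hypothetical simplifications, not the AMF director's types. The sketch only mirrors the two conditions visible in the diff: resetting a stale "failover under progress" state after a successful new active assignment, and exempting lock from the tolerance-timer check.

```cpp
#include <cassert>

// Hypothetical, reduced versions of the AMF types involved.
enum class SiDepState { kAssigned, kFailoverUnderProgress };
enum class AdminOp { kLock, kUnlock };
enum class AdminVerdict { kProceed, kTryAgain };

struct Si {
  SiDepState dep_state;
};

// First change: once a new active assignment succeeds, a leftover
// "failover under progress" state is reset to "assigned".
void OnNewActiveAssignment(Si& si, bool assignment_succeeded) {
  if (assignment_succeeded &&
      si.dep_state == SiDepState::kFailoverUnderProgress) {
    si.dep_state = SiDepState::kAssigned;
  }
}

// Second change: a running tolerance timer answers admin operations with
// TRY_AGAIN, except for lock, which is allowed to proceed.
AdminVerdict CheckToleranceTimer(AdminOp op, bool timer_running) {
  if (op != AdminOp::kLock && timer_running) {
    return AdminVerdict::kTryAgain;
  }
  return AdminVerdict::kProceed;
}

// Small self-check combining both rules.
inline void CheckLockIsNotBlocked() {
  Si si{SiDepState::kFailoverUnderProgress};
  OnNewActiveAssignment(si, true);
  assert(si.dep_state == SiDepState::kAssigned);
  assert(CheckToleranceTimer(AdminOp::kLock, true) == AdminVerdict::kProceed);
}
```

A failed assignment leaves the state unchanged in this sketch, which matches the diff: the reset sits inside the success branch only.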



Complete diffstat:
--
 src/amf/amfd/sg_2n_fsm.cc | 2 ++
 src/amf/amfd/su.cc        | 3 ++-
 2 files changed, 4 insertions(+), 1 deletion(-)


Testing Commands:
-
N/A

Testing, Expected Results:
--
N/A

Conditions of Submission:
-
ACK by reviewers

Arch        Built   Started   Linux distro
-------------------------------------------
mips          n        n
mips64        n        n
x86           n        n
x86_64        y        y
powerpc       n        n
powerpc64     n        n




[devel] [PATCH 1/1] osaf: improve etcd3.plugin work with local etcd server [#3226]

2020-10-14 Thread thuan.tran
---
 src/osaf/consensus/plugins/etcd3.plugin | 44 ++---
 1 file changed, 25 insertions(+), 19 deletions(-)

diff --git a/src/osaf/consensus/plugins/etcd3.plugin b/src/osaf/consensus/plugins/etcd3.plugin
index 6252eedcb..3784fa41d 100644
--- a/src/osaf/consensus/plugins/etcd3.plugin
+++ b/src/osaf/consensus/plugins/etcd3.plugin
@@ -37,26 +37,32 @@ export ETCDCTL_API=3
 get() {
   readonly key="$1"
 
-  if output=$(etcdctl $etcd_options --dial-timeout $etcd_timeout get "$directory$key" 2>&1)
-  then
-key_=$(echo "$output" | tail -n2 | head -n1)
-value=$(echo "$output" | tail -n1)
-if [ "$key_" = "$directory$key" ]; then
-  if [ "$key_" = "$value" ]; then
-# blank value returned
-echo ""
-return 0
+  while true; do
+if output=$(etcdctl $etcd_options --dial-timeout $etcd_timeout get "$directory$key" 2>&1)
+then
+  key_=$(echo "$output" | tail -n2 | head -n1)
+  value=$(echo "$output" | tail -n1)
+  if [ "$key_" = "$directory$key" ]; then
+if [ "$key_" = "$value" ]; then
+  # blank value returned
+  echo ""
+  return 0
+else
+  echo "$value"
+  return 0
+fi
   else
-echo "$value"
-return 0
+# key missing!
+return 1
   fi
-else
-  # key missing!
-  return 1
+elif echo $etcd_options | grep -q "endpoints=localhost" &&
+ echo $output | grep -q "Error: context deadline exceeded"; then
+  sleep 1
+  continue
 fi
-  else
-return 2
-  fi
+break
+  done
+  return 2
 }
 
 # set
@@ -136,7 +142,7 @@ create_key() {
 return 0
   fi
 
-  if output=$(etcdctl $etcd_options --dial-timeout $etcd_timeout get "$directory$key" | tail -n1)
+  if output=$(get $key)
   then
 return 1
   else
@@ -243,7 +249,7 @@ lock() {
 return 0
   fi
 
-  current_owner=$(etcdctl $etcd_options --dial-timeout $etcd_timeout get "$directory$keyname" | tail -n1)
+  current_owner=$(get $keyname)
   # see if we already hold the lock
   if [ "$current_owner" = "$owner" ]; then
 return 0
-- 
2.17.1





[devel] [PATCH 0/1] Review Request for osaf: improve etcd3.plugin work with local etcd server [#3226]

2020-10-14 Thread thuan.tran
Summary: osaf: improve etcd3.plugin work with local etcd server [#3226]
Review request for Ticket(s): 3226
Peer Reviewer(s): *** LIST THE TECH REVIEWER(S) / MAINTAINER(S) HERE ***
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3226
Base revision: fa78173f280133ceb47224bfbaf9e83b96873fc5
Personal repository: git://git.code.sf.net/u/thuantr/review


Impacted area       Impact y/n
--------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            n
 OpenSAF services        n
 Core libraries          n
 Samples                 y
 Tests                   n
 Other                   n

NOTE: Patch(es) contain lines longer than 80 characters

Comments (indicate scope for each "y" above):
-
N/A

revision 73ae5b2d5fa4bc9d85511933836f8def5b313c38
Author: thuan.tran 
Date:   Thu, 15 Oct 2020 07:37:48 +0700

osaf: improve etcd3.plugin work with local etcd server [#3226]
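
The patch for this ticket makes the plugin's get() retry when the endpoint is localhost and etcdctl reports "context deadline exceeded". The control flow can be sketched in C++ as below. This is an illustration only: `Attempt`, `AttemptResult`, and `GetWithRetry` are hypothetical names, the query is injected as a callback rather than running etcdctl, and `max_attempts` is an assumption added so the sketch terminates (the shell loop in the patch has no such bound). The return codes follow the plugin: 0 for a value, 1 for a missing key, 2 for any other failure.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <string>
#include <thread>

// Hypothetical classification of one `etcdctl get` attempt.
enum class Attempt { kValue, kKeyMissing, kDeadlineExceeded, kOtherError };

struct AttemptResult {
  Attempt kind;
  std::string value;  // only meaningful for Attempt::kValue
};

// Returns 0 = value read (possibly empty), 1 = key missing, 2 = failure.
int GetWithRetry(const std::function<AttemptResult()>& query,
                 bool endpoint_is_localhost, int max_attempts,
                 std::chrono::milliseconds pause, std::string* value_out) {
  assert(max_attempts > 0 && value_out != nullptr);
  for (int attempt = 0; attempt < max_attempts; ++attempt) {
    const AttemptResult result = query();
    switch (result.kind) {
      case Attempt::kValue:
        *value_out = result.value;
        return 0;
      case Attempt::kKeyMissing:
        return 1;
      case Attempt::kDeadlineExceeded:
        // Only a local endpoint is retried; a remote one fails at once.
        if (!endpoint_is_localhost) return 2;
        std::this_thread::sleep_for(pause);
        break;  // leaves the switch; the loop issues the next attempt
      case Attempt::kOtherError:
        return 2;
    }
  }
  return 2;  // retries exhausted (a bound the original loop does not have)
}
```

Injecting the query as a callback keeps the retry policy testable without a running etcd server.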



Complete diffstat:
--
 src/osaf/consensus/plugins/etcd3.plugin | 44 +++--
 1 file changed, 25 insertions(+), 19 deletions(-)


Testing Commands:
-
N/A

Testing, Expected Results:
--
N/A

Conditions of Submission:
-
ACK by reviewers

Arch        Built   Started   Linux distro
-------------------------------------------
mips          n        n
mips64        n        n
x86           n        n
x86_64        y        y
powerpc       n        n
powerpc64     n        n




[devel] [PATCH 0/1] Review Request for smf: improve admin operation from serial to parallel [#3221] V3

2020-09-24 Thread thuan.tran
Summary: smf: improve admin operation from serial to parallel [#3221]
Review request for Ticket(s): 3221
Peer Reviewer(s): *** LIST THE TECH REVIEWER(S) / MAINTAINER(S) HERE ***
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3221
Base revision: ad6888c92012357751bb35ddcf41a255c152e03c
Personal repository: git://git.code.sf.net/u/thuantr/review


Impacted area       Impact y/n
--------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n

NOTE: Patch(es) contain lines longer than 80 characters

Comments (indicate scope for each "y" above):
-
N/A

revision 3e5e87ae9f2e367f90e084dd87a1b23612c8b288
Author: thuan.tran 
Date:   Thu, 24 Sep 2020 15:39:37 +0700

smf: improve admin operation from serial to parallel [#3221]

- In a one-step upgrade that installs new applications, SMF invokes
admin operations (sync mode) serially. This can get stuck because of
SU dependencies, and the upgrade eventually fails when the admin
operation times out.
- Change admin operations from serial to parallel to avoid the SU
dependency issue; this also shortens upgrade time.
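
The wait strategy described above (start every operation first, then poll all of them against one shared deadline) can be sketched in C++ as follows. This is a minimal sketch under stated assumptions, not the SMF implementation: `AsyncOp`, `WaitResult`, and `WaitForAll` are hypothetical names, completion is reported through injected callbacks instead of IMM admin-operation callbacks, and the poll interval is a parameter rather than the fixed one-second sleep used in the patch.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <list>
#include <string>
#include <thread>

// Hypothetical handle for one admin operation that is already in flight.
struct AsyncOp {
  std::string object;               // unit the operation targets
  std::function<bool()> is_done;    // polled until it returns true
  std::function<bool()> succeeded;  // consulted once is_done() is true
};

enum class WaitResult { kAllOk, kFailed, kTimedOut };

// Polls all pending operations until every one has finished, one has
// failed, or the shared deadline has passed.
WaitResult WaitForAll(std::list<AsyncOp> pending,
                      std::chrono::milliseconds timeout,
                      std::chrono::milliseconds poll_interval) {
  assert(poll_interval.count() > 0);  // avoid a busy loop
  const auto deadline = std::chrono::steady_clock::now() + timeout;
  for (;;) {
    for (auto it = pending.begin(); it != pending.end();) {
      if (!it->is_done()) {
        ++it;  // still running, check again on the next round
      } else if (!it->succeeded()) {
        return WaitResult::kFailed;  // first failure ends the wait
      } else {
        it = pending.erase(it);  // finished successfully
      }
    }
    if (pending.empty()) return WaitResult::kAllOk;
    if (std::chrono::steady_clock::now() >= deadline) {
      return WaitResult::kTimedOut;
    }
    std::this_thread::sleep_for(poll_interval);
  }
}
```

Because the deadline is shared, total waiting time is bounded by one timeout regardless of how many units are involved, whereas a serial loop can spend up to one timeout per unit.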



Complete diffstat:
--
 src/smf/smfd/SmfAdminState.cc |  70 +++---
 src/smf/smfd/SmfAdminState.h  |   7 +-
 src/smf/smfd/SmfUtils.cc  | 161 ++
 src/smf/smfd/SmfUtils.h   |  37 ++
 4 files changed, 232 insertions(+), 43 deletions(-)


Testing Commands:
-
N/A

Testing, Expected Results:
--
N/A

Conditions of Submission:
-
ACK by reviewers

Arch        Built   Started   Linux distro
-------------------------------------------
mips          n        n
mips64        n        n
x86           n        n
x86_64        y        y
powerpc       n        n
powerpc64     n        n




[devel] [PATCH 1/1] smf: improve admin operation from serial to parallel [#3221]

2020-09-24 Thread thuan.tran
- In a one-step upgrade that installs new applications, SMF invokes
admin operations (sync mode) serially. This can get stuck because of
SU dependencies, and the upgrade eventually fails when the admin
operation times out.
- Change admin operations from serial to parallel to avoid the SU
dependency issue; this also shortens upgrade time.
---
 src/smf/smfd/SmfAdminState.cc |  70 ---
 src/smf/smfd/SmfAdminState.h  |   7 +-
 src/smf/smfd/SmfUtils.cc  | 161 +++---
 src/smf/smfd/SmfUtils.h   |  37 
 4 files changed, 232 insertions(+), 43 deletions(-)

diff --git a/src/smf/smfd/SmfAdminState.cc b/src/smf/smfd/SmfAdminState.cc
index 473021521..bc2bb1b85 100755
--- a/src/smf/smfd/SmfAdminState.cc
+++ b/src/smf/smfd/SmfAdminState.cc
@@ -511,8 +511,8 @@ bool SmfAdminStateHandler::changeAdminState(SaAmfAdminStateT fromState,
  saf_error(errno_));
 }
   } else if (nodeList_.size() == 1) {
-TRACE("%s: Use serialized for one node", __FUNCTION__);
-rc = adminOperationSerialized(toState, nodeList_);
+TRACE("%s: admin op for one node", __FUNCTION__);
+rc = adminOperationParallel(toState, nodeList_);
 if (rc == false) {
   LOG_NO("%s: setAdminStateNode() Fail %s", __FUNCTION__,
  saf_error(errno_));
@@ -521,9 +521,9 @@ bool SmfAdminStateHandler::changeAdminState(SaAmfAdminStateT fromState,
 
   // Set admin state for SUs
   if ((rc == true) && (!suList_.empty())) {
-TRACE("%s: Use serialized for SUs", __FUNCTION__);
+TRACE("%s: admin op for SUs", __FUNCTION__);
 // Do only if setting admin state for nodes did not fail
-rc = adminOperationSerialized(toState, suList_);
+rc = adminOperationParallel(toState, suList_);
 if (rc == false) {
   LOG_NO("%s: setAdminStateSUs() Fail %s", __FUNCTION__,
  saf_error(errno_));
@@ -576,22 +576,54 @@ bool SmfAdminStateHandler::adminOperationNodeGroup(
 // Set given admin state to all units in the given unitList
 // Return false if Fail. errno_ is set
 //
-bool SmfAdminStateHandler::adminOperationSerialized(
+bool SmfAdminStateHandler::adminOperationParallel(
 SaAmfAdminOperationIdT adminState,
 const std::list _unitList) {
   bool rc = true;
-  errno_ = SA_AIS_OK;
 
   TRACE_ENTER();
 
+  timespec now = base::ReadMonotonicClock();
+  timespec timeout = now + base::NanosToTimespec(smfd_cb->adminOpTimeout);
+
   if (!i_unitList.empty()) {
-for (auto &unit : i_unitList) {
-  rc = adminOperation(adminState, unit.name);
-  if (rc == false) {
-// Failed to set admin state
-break;
+for (auto &unit : i_unitList)
+  adminOperationAsync(adminState, unit.name);
+
+bool adminFail = false;
+while (base::ReadMonotonicClock() < timeout) {
+  auto it = adminAsyncList_.begin();
+  while (it != adminAsyncList_.end()) {
+if ((*it)->isAdminAsyncDone()) {
+  if ((*it)->getAdminAsyncResult() != SA_AIS_OK) {
+LOG_ER("Admin op FAIL %s for %s",
+   saf_error((*it)->getAdminAsyncResult()),
+   (*it)->getAdminAsyncObject().c_str());
+adminFail = true;
+rc = false;
+break;
+  } else {
+delete (*it);
+it = adminAsyncList_.erase(it);
+continue;
+  }
+}
+it++;
   }
+  if (adminFail) break;  // one admin operation fail
+  if (adminAsyncList_.empty()) break;  // all admin operations done OK
+  sleep(1);
 }
+
+for (auto &immUtil : adminAsyncList_) {
+  if (adminFail == false) {
+LOG_ER("Admin op TIMEOUT for %s",
+  immUtil->getAdminAsyncObject().c_str());
+rc = false;
+  }
+  delete immUtil;
+}
+adminAsyncList_.clear();
   }
 
   TRACE_LEAVE();
@@ -667,6 +699,8 @@ SaAmfAdminStateT SmfAdminStateHandler::getAdminState(
   SaAmfAdminStateT adminState = SA_AMF_ADMIN_UNLOCKED;
   if (p_adminState != nullptr) adminState = *p_adminState;
 
+  immutil_saImmOmAccessorFinalize(accessorHandle_);
+
   TRACE_LEAVE();
   return adminState;
 }
@@ -972,3 +1006,17 @@ bool SmfAdminStateHandler::adminOperation(SaAmfAdminOperationIdT adminOperation,
   TRACE_LEAVE2("%s", rc ? "OK" : "FAIL");
   return rc;
 }
+
+///
+/// Set given admin state (Async) to one unit
+///
+void SmfAdminStateHandler::adminOperationAsync(
+  SaAmfAdminOperationIdT adminOperation, const std::string &unitName) {
+  TRACE_ENTER();
+  SmfImmUtils *immUtil = new SmfImmUtils();
+  TRACE("\t Unit name '%s', adminOperation=%d", unitName.c_str(),
+adminOperation);
+  immUtil->callAdminOperationAsync(unitName, adminOperation);
+  adminAsyncList_.push_back(immUtil);
+  TRACE_LEAVE();
+}
diff --git a/src/smf/smfd/SmfAdminState.h b/src/smf/smfd/SmfAdminState.h
index 6d6836df7..da518ce57 100644
--- a/src/smf/smfd/SmfAdminState.h
+++ b/src/smf/smfd/SmfAdminState.h
@@ -84,11 +84,13 @@ class SmfAdminStateHandler {
   bool adminOperationNodeGroup(SaAmfAdminStateT fromState,
  

[devel] [PATCH 0/1] Review Request for smf: improve admin operation from serial to parallel [#3221] V2

2020-09-22 Thread thuan.tran
Summary: smf: improve admin operation from serial to parallel [#3221]
Review request for Ticket(s): 3221
Peer Reviewer(s): *** LIST THE TECH REVIEWER(S) / MAINTAINER(S) HERE ***
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3221
Base revision: ad6888c92012357751bb35ddcf41a255c152e03c
Personal repository: git://git.code.sf.net/u/thuantr/review


Impacted area       Impact y/n
--------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n

NOTE: Patch(es) contain lines longer than 80 characters

Comments (indicate scope for each "y" above):
-
N/A

revision 3b06602d7ebd4c01d0883bf276fbaa01f447305d
Author: thuan.tran 
Date:   Wed, 23 Sep 2020 06:30:35 +0700

smf: improve admin operation from serial to parallel [#3221]

- In a one-step upgrade that installs new applications, SMF invokes
admin operations (sync mode) serially. This can get stuck because of
SU dependencies, and the upgrade eventually fails when the admin
operation times out.
- Change admin operations from serial to parallel to avoid the SU
dependency issue; this also shortens upgrade time.



Complete diffstat:
--
 src/smf/smfd/SmfAdminState.cc |  68 +++---
 src/smf/smfd/SmfAdminState.h  |   7 +-
 src/smf/smfd/SmfUtils.cc  | 161 ++
 src/smf/smfd/SmfUtils.h   |  37 ++
 4 files changed, 230 insertions(+), 43 deletions(-)


Testing Commands:
-
N/A

Testing, Expected Results:
--
N/A

Conditions of Submission:
-
ACK by reviewers

Arch        Built   Started   Linux distro
-------------------------------------------
mips          n        n
mips64        n        n
x86           n        n
x86_64        y        y
powerpc       n        n
powerpc64     n        n




[devel] [PATCH 1/1] smf: improve admin operation from serial to parallel [#3221]

2020-09-22 Thread thuan.tran
- In a one-step upgrade that installs new applications, SMF invokes
admin operations (sync mode) serially. This can get stuck because of
SU dependencies, and the upgrade eventually fails when the admin
operation times out.
- Change admin operations from serial to parallel to avoid the SU
dependency issue; this also shortens upgrade time.
---
 src/smf/smfd/SmfAdminState.cc |  68 +++---
 src/smf/smfd/SmfAdminState.h  |   7 +-
 src/smf/smfd/SmfUtils.cc  | 161 +++---
 src/smf/smfd/SmfUtils.h   |  37 
 4 files changed, 230 insertions(+), 43 deletions(-)

diff --git a/src/smf/smfd/SmfAdminState.cc b/src/smf/smfd/SmfAdminState.cc
index 473021521..3a9cbd490 100755
--- a/src/smf/smfd/SmfAdminState.cc
+++ b/src/smf/smfd/SmfAdminState.cc
@@ -511,8 +511,8 @@ bool SmfAdminStateHandler::changeAdminState(SaAmfAdminStateT fromState,
  saf_error(errno_));
 }
   } else if (nodeList_.size() == 1) {
-TRACE("%s: Use serialized for one node", __FUNCTION__);
-rc = adminOperationSerialized(toState, nodeList_);
+TRACE("%s: admin op for one node", __FUNCTION__);
+rc = adminOperationParallel(toState, nodeList_);
 if (rc == false) {
   LOG_NO("%s: setAdminStateNode() Fail %s", __FUNCTION__,
  saf_error(errno_));
@@ -521,9 +521,9 @@ bool SmfAdminStateHandler::changeAdminState(SaAmfAdminStateT fromState,
 
   // Set admin state for SUs
   if ((rc == true) && (!suList_.empty())) {
-TRACE("%s: Use serialized for SUs", __FUNCTION__);
+TRACE("%s: admin op for SUs", __FUNCTION__);
 // Do only if setting admin state for nodes did not fail
-rc = adminOperationSerialized(toState, suList_);
+rc = adminOperationParallel(toState, suList_);
 if (rc == false) {
   LOG_NO("%s: setAdminStateSUs() Fail %s", __FUNCTION__,
  saf_error(errno_));
@@ -576,22 +576,52 @@ bool SmfAdminStateHandler::adminOperationNodeGroup(
 // Set given admin state to all units in the given unitList
 // Return false if Fail. errno_ is set
 //
-bool SmfAdminStateHandler::adminOperationSerialized(
+bool SmfAdminStateHandler::adminOperationParallel(
 SaAmfAdminOperationIdT adminState,
 const std::list<unitNameAndState> &i_unitList) {
   bool rc = true;
-  errno_ = SA_AIS_OK;
 
   TRACE_ENTER();
 
+  timespec now = base::ReadMonotonicClock();
+  timespec timeout = now + base::NanosToTimespec(smfd_cb->adminOpTimeout);
+
   if (!i_unitList.empty()) {
-for (auto &unit : i_unitList) {
-  rc = adminOperation(adminState, unit.name);
-  if (rc == false) {
-// Failed to set admin state
-break;
+for (auto &unit : i_unitList)
+  adminOperationAsync(adminState, unit.name);
+
+bool adminOngoing = true;
+while (base::ReadMonotonicClock() < timeout) {
+  adminOngoing = false;
+  for (auto &immUtil : adminAsyncList_) {
+if (immUtil->isAdminAsyncDone()) {
+  if (immUtil->getAdminAsyncResult() != SA_AIS_OK) {
+LOG_ER("Admin op FAIL %s for %s",
+   saf_error(immUtil->getAdminAsyncResult()),
+   immUtil->getAdminAsyncObject().c_str());
+rc = false;
+adminOngoing = false;
+break;
+  }
+} else {
+  adminOngoing = true;
+}
   }
+  if (rc == false) break;  // one admin operation fail
+  if (!adminOngoing) break;  // all admin operations done OK
+  sleep(1);
 }
+
+for (auto &immUtil : adminAsyncList_) {
+  if (adminOngoing && !immUtil->isAdminAsyncDone()) {
+// At least one admin timeout
+LOG_ER("Admin op TIMEOUT for %s",
+  immUtil->getAdminAsyncObject().c_str());
+rc = false;
+  }
+  delete immUtil;
+}
+adminAsyncList_.clear();
   }
 
   TRACE_LEAVE();
@@ -667,6 +697,8 @@ SaAmfAdminStateT SmfAdminStateHandler::getAdminState(
   SaAmfAdminStateT adminState = SA_AMF_ADMIN_UNLOCKED;
   if (p_adminState != nullptr) adminState = *p_adminState;
 
+  immutil_saImmOmAccessorFinalize(accessorHandle_);
+
   TRACE_LEAVE();
   return adminState;
 }
@@ -972,3 +1004,17 @@ bool SmfAdminStateHandler::adminOperation(SaAmfAdminOperationIdT adminOperation,
   TRACE_LEAVE2("%s", rc ? "OK" : "FAIL");
   return rc;
 }
+
+///
+/// Set given admin state (Async) to one unit
+///
+void SmfAdminStateHandler::adminOperationAsync(
+  SaAmfAdminOperationIdT adminOperation, const std::string &unitName) {
+  TRACE_ENTER();
+  SmfImmUtils *immUtil = new SmfImmUtils();
+  TRACE("\t Unit name '%s', adminOperation=%d", unitName.c_str(),
+adminOperation);
+  immUtil->callAdminOperationAsync(unitName, adminOperation);
+  adminAsyncList_.push_back(immUtil);
+  TRACE_LEAVE();
+}
diff --git a/src/smf/smfd/SmfAdminState.h b/src/smf/smfd/SmfAdminState.h
index 6d6836df7..da518ce57 100644
--- a/src/smf/smfd/SmfAdminState.h
+++ b/src/smf/smfd/SmfAdminState.h
@@ -84,11 +84,13 @@ class SmfAdminStateHandler {
   bool adminOperationNodeGroup(SaAmfAdminStateT fromState,
  

[devel] [PATCH 1/1] smf: improve admin operation from serial to parallel [#3221]

2020-09-22 Thread thuan.tran
- In a one-step upgrade that installs new applications, SMF invokes
admin operations (sync mode) serially. This can get stuck on SU
dependencies, so the upgrade eventually fails when an admin operation
times out.
- Issue the admin operations in parallel instead of serially to avoid
the SU dependency issue; this also shortens the upgrade time.
---
 src/smf/smfd/SmfAdminState.cc |  68 ---
 src/smf/smfd/SmfAdminState.h  |   7 +-
 src/smf/smfd/SmfUtils.cc  | 156 +++---
 src/smf/smfd/SmfUtils.h   |  37 
 4 files changed, 225 insertions(+), 43 deletions(-)

diff --git a/src/smf/smfd/SmfAdminState.cc b/src/smf/smfd/SmfAdminState.cc
index 473021521..3a9cbd490 100755
--- a/src/smf/smfd/SmfAdminState.cc
+++ b/src/smf/smfd/SmfAdminState.cc
@@ -511,8 +511,8 @@ bool SmfAdminStateHandler::changeAdminState(SaAmfAdminStateT fromState,
  saf_error(errno_));
 }
   } else if (nodeList_.size() == 1) {
-TRACE("%s: Use serialized for one node", __FUNCTION__);
-rc = adminOperationSerialized(toState, nodeList_);
+TRACE("%s: admin op for one node", __FUNCTION__);
+rc = adminOperationParallel(toState, nodeList_);
 if (rc == false) {
   LOG_NO("%s: setAdminStateNode() Fail %s", __FUNCTION__,
  saf_error(errno_));
@@ -521,9 +521,9 @@ bool SmfAdminStateHandler::changeAdminState(SaAmfAdminStateT fromState,
 
   // Set admin state for SUs
   if ((rc == true) && (!suList_.empty())) {
-TRACE("%s: Use serialized for SUs", __FUNCTION__);
+TRACE("%s: admin op for SUs", __FUNCTION__);
 // Do only if setting admin state for nodes did not fail
-rc = adminOperationSerialized(toState, suList_);
+rc = adminOperationParallel(toState, suList_);
 if (rc == false) {
   LOG_NO("%s: setAdminStateSUs() Fail %s", __FUNCTION__,
  saf_error(errno_));
@@ -576,22 +576,52 @@ bool SmfAdminStateHandler::adminOperationNodeGroup(
 // Set given admin state to all units in the given unitList
 // Return false if Fail. errno_ is set
 //
-bool SmfAdminStateHandler::adminOperationSerialized(
+bool SmfAdminStateHandler::adminOperationParallel(
 SaAmfAdminOperationIdT adminState,
 const std::list<unitNameAndState> &i_unitList) {
   bool rc = true;
-  errno_ = SA_AIS_OK;
 
   TRACE_ENTER();
 
+  timespec now = base::ReadMonotonicClock();
+  timespec timeout = now + base::NanosToTimespec(smfd_cb->adminOpTimeout);
+
   if (!i_unitList.empty()) {
-for (auto &unit : i_unitList) {
-  rc = adminOperation(adminState, unit.name);
-  if (rc == false) {
-// Failed to set admin state
-break;
+for (auto &unit : i_unitList)
+  adminOperationAsync(adminState, unit.name);
+
+bool adminOngoing = true;
+while (base::ReadMonotonicClock() < timeout) {
+  adminOngoing = false;
+  for (auto &immUtil : adminAsyncList_) {
+if (immUtil->isAdminAsyncDone()) {
+  if (immUtil->getAdminAsyncResult() != SA_AIS_OK) {
+LOG_ER("Admin op FAIL %s for %s",
+   saf_error(immUtil->getAdminAsyncResult()),
+   immUtil->getAdminAsyncObject().c_str());
+rc = false;
+adminOngoing = false;
+break;
+  }
+} else {
+  adminOngoing = true;
+}
   }
+  if (rc == false) break;  // one admin operation fail
+  if (!adminOngoing) break;  // all admin operations done OK
+  sleep(1);
 }
+
+for (auto &immUtil : adminAsyncList_) {
+  if (adminOngoing && !immUtil->isAdminAsyncDone()) {
+// At least one admin timeout
+LOG_ER("Admin op TIMEOUT for %s",
+  immUtil->getAdminAsyncObject().c_str());
+rc = false;
+  }
+  delete immUtil;
+}
+adminAsyncList_.clear();
   }
 
   TRACE_LEAVE();
@@ -667,6 +697,8 @@ SaAmfAdminStateT SmfAdminStateHandler::getAdminState(
   SaAmfAdminStateT adminState = SA_AMF_ADMIN_UNLOCKED;
   if (p_adminState != nullptr) adminState = *p_adminState;
 
+  immutil_saImmOmAccessorFinalize(accessorHandle_);
+
   TRACE_LEAVE();
   return adminState;
 }
@@ -972,3 +1004,17 @@ bool SmfAdminStateHandler::adminOperation(SaAmfAdminOperationIdT adminOperation,
   TRACE_LEAVE2("%s", rc ? "OK" : "FAIL");
   return rc;
 }
+
+///
+/// Set given admin state (Async) to one unit
+///
+void SmfAdminStateHandler::adminOperationAsync(
+  SaAmfAdminOperationIdT adminOperation, const std::string &unitName) {
+  TRACE_ENTER();
+  SmfImmUtils *immUtil = new SmfImmUtils();
+  TRACE("\t Unit name '%s', adminOperation=%d", unitName.c_str(),
+adminOperation);
+  immUtil->callAdminOperationAsync(unitName, adminOperation);
+  adminAsyncList_.push_back(immUtil);
+  TRACE_LEAVE();
+}
diff --git a/src/smf/smfd/SmfAdminState.h b/src/smf/smfd/SmfAdminState.h
index 6d6836df7..da518ce57 100644
--- a/src/smf/smfd/SmfAdminState.h
+++ b/src/smf/smfd/SmfAdminState.h
@@ -84,11 +84,13 @@ class SmfAdminStateHandler {
   bool adminOperationNodeGroup(SaAmfAdminStateT fromState,
  

[devel] [PATCH 0/1] Review Request for smf: improve admin operation from serial to parallel [#3221]

2020-09-22 Thread thuan.tran
Summary: smf: improve admin operation from serial to parallel [#3221]
Review request for Ticket(s): 3221
Peer Reviewer(s): *** LIST THE TECH REVIEWER(S) / MAINTAINER(S) HERE ***
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3221
Base revision: ad6888c92012357751bb35ddcf41a255c152e03c
Personal repository: git://git.code.sf.net/u/thuantr/review


Impacted area           Impact y/n

 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n

NOTE: Patch(es) contain lines longer than 80 characters

Comments (indicate scope for each "y" above):
-
N/A

revision b324c001135901ee0d9443f481789d4b7a3d0826
Author: thuan.tran 
Date:   Tue, 22 Sep 2020 15:06:50 +0700

smf: improve admin operation from serial to parallel [#3221]

- In a one-step upgrade that installs new applications, SMF invokes
admin operations (sync mode) serially. This can get stuck on SU
dependencies, so the upgrade eventually fails when an admin operation
times out.
- Issue the admin operations in parallel instead of serially to avoid
the SU dependency issue; this also shortens the upgrade time.
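
The dispatch-then-poll pattern described in this commit message can be
sketched roughly as follows. This is a minimal illustration, not the code in
the patch: AsyncOp, Outcome and WaitForAll are hypothetical names, and a
result of 0 stands in for SA_AIS_OK. All operations are assumed to have been
started already; the function only waits on them against one shared deadline.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for one asynchronous admin operation in flight.
struct AsyncOp {
  std::string object;             // name of the unit being operated on
  std::function<bool()> is_done;  // true once a result has arrived
  std::function<int()> result;    // 0 means success (assumed convention)
};

enum class Outcome { kAllOk, kFailed, kTimedOut };

// Poll every operation until all have succeeded, one has failed,
// or the shared deadline has passed.
Outcome WaitForAll(const std::vector<AsyncOp>& ops,
                   std::chrono::milliseconds timeout,
                   std::chrono::milliseconds poll_interval) {
  const auto deadline = std::chrono::steady_clock::now() + timeout;
  while (true) {
    bool ongoing = false;
    for (const auto& op : ops) {
      if (!op.is_done()) {
        ongoing = true;            // still waiting on this one
      } else if (op.result() != 0) {
        return Outcome::kFailed;   // first failure ends the wait
      }
    }
    if (!ongoing) return Outcome::kAllOk;
    if (std::chrono::steady_clock::now() >= deadline) {
      return Outcome::kTimedOut;
    }
    std::this_thread::sleep_for(poll_interval);
  }
}
```

Because every operation shares one deadline, a unit that cannot complete
until another unit's operation has been issued no longer blocks the whole
sequence, which is the dependency problem the serial loop ran into.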



Complete diffstat:
--
 src/smf/smfd/SmfAdminState.cc |  68 +++---
 src/smf/smfd/SmfAdminState.h  |   7 +-
 src/smf/smfd/SmfUtils.cc  | 156 ++
 src/smf/smfd/SmfUtils.h   |  37 ++
 4 files changed, 225 insertions(+), 43 deletions(-)


Testing Commands:
-
N/A

Testing, Expected Results:
--
N/A

Conditions of Submission:
-
ACK by reviewers

Arch        Built    Started    Linux distro
--------------------------------------------
mips        n        n
mips64      n        n
x86         n        n
x86_64      y        y
powerpc     n        n
powerpc64   n        n


Reviewer Checklist:
---
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
(i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
too much content into a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
comments and change requests that were proposed in the initial review.

___ You have a misconfigured ~/.gitconfig file (i.e. user.name, user.email etc)

___ Your computer has a badly configured date and time, confusing the
threaded patch review.

___ Your changes affect IPC mechanism, and you don't present any results
for in-service upgradability test.

___ Your changes affect user manual and documentation, but your patch series
does not contain the patch that updates the Doxygen manual.





[devel] [PATCH 0/1] Review Request for ntf: fix memleak detected by valgrind [#3220]

2020-09-16 Thread thuan.tran
Summary: ntf: fix memleak detected by valgrind [#3220]
Review request for Ticket(s): 3220
Peer Reviewer(s): *** LIST THE TECH REVIEWER(S) / MAINTAINER(S) HERE ***
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3220
Base revision: 0e1a6847c264ad5e34ca8413307b118066ae03eb
Personal repository: git://git.code.sf.net/u/thuantr/review


Impacted area           Impact y/n

 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
-
N/A

revision 9fa8d856a26a77ed3df5a1e60e268071ab1b9418
Author: thuan.tran 
Date:   Thu, 17 Sep 2020 11:51:18 +0700

ntf: fix memleak detected by valgrind [#3220]

The fix for #3215 mistakenly removed a free() call, which causes this memleak.



Complete diffstat:
--
 src/ntf/ntfimcnd/ntfimcn_imm.c | 1 +
 1 file changed, 1 insertion(+)


Testing Commands:
-
N/A

Testing, Expected Results:
--
N/A

Conditions of Submission:
-
ACK by reviewers

Arch        Built    Started    Linux distro
--------------------------------------------
mips        n        n
mips64      n        n
x86         n        n
x86_64      y        y
powerpc     n        n
powerpc64   n        n




[devel] [PATCH 1/1] ntf: fix memleak detected by valgrind [#3220]

2020-09-16 Thread thuan.tran
The fix for #3215 mistakenly removed a free() call, which causes this memleak.
---
 src/ntf/ntfimcnd/ntfimcn_imm.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/ntf/ntfimcnd/ntfimcn_imm.c b/src/ntf/ntfimcnd/ntfimcn_imm.c
index deb75a072..8d73fb35a 100644
--- a/src/ntf/ntfimcnd/ntfimcn_imm.c
+++ b/src/ntf/ntfimcnd/ntfimcn_imm.c
@@ -287,6 +287,7 @@ static void free_ccb_data(CcbUtilCcbData_t *ccb_data) {
if (ccb_data != NULL) {
if (ccb_data->userData != NULL) {
osaf_extended_name_free(ccb_data->userData);
+   free(ccb_data->userData);
}
ccbutil_deleteCcbData(ccb_data);
}
-- 
2.17.1
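
The one-line fix above restores a second release step. A rough sketch of why
two steps are needed, assuming (as the patch implies) that the extended-name
free releases only the storage the name points to and not the name object
itself. Name, NameCreate, NameFreeContents and NameDestroy are hypothetical
stand-ins, not OpenSAF APIs, and g_live_allocs exists only to make the leak
visible in the illustration.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical heap-allocated name object that owns a second buffer.
struct Name {
  char* long_dn;
};

int g_live_allocs = 0;  // illustration only: outstanding allocations

Name* NameCreate(const char* dn) {
  Name* name = static_cast<Name*>(std::malloc(sizeof(Name)));
  if (name == nullptr) return nullptr;
  name->long_dn = static_cast<char*>(std::malloc(std::strlen(dn) + 1));
  if (name->long_dn == nullptr) {
    std::free(name);
    return nullptr;
  }
  std::strcpy(name->long_dn, dn);
  g_live_allocs += 2;  // the struct and its buffer
  return name;
}

// Releases only what the struct points to (a "contents" free).
void NameFreeContents(Name* name) {
  if (name->long_dn != nullptr) {
    std::free(name->long_dn);
    name->long_dn = nullptr;
    --g_live_allocs;
  }
}

// Releases the contents and then the struct itself.
void NameDestroy(Name* name) {
  if (name == nullptr) return;
  NameFreeContents(name);
  std::free(name);
  --g_live_allocs;
}
```

Calling only NameFreeContents leaves one allocation outstanding, which is the
kind of leak valgrind reports; NameDestroy brings the count back to zero.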





[devel] [PATCH 1/1] imm: fix immnd crash in multi partitioned clusters rejoin [#3219]

2020-09-16 Thread thuan.tran
- immnd prioritizes the re-introduce response from immd.
- immnd ignores broadcast events from IMMD while re-introduce is ongoing.
---
 src/imm/immnd/immnd_evt.c | 21 +++--
 src/imm/immnd/immnd_mds.c |  5 -
 2 files changed, 19 insertions(+), 7 deletions(-)

diff --git a/src/imm/immnd/immnd_evt.c b/src/imm/immnd/immnd_evt.c
index afc2106a0..714a75ca2 100644
--- a/src/imm/immnd/immnd_evt.c
+++ b/src/imm/immnd/immnd_evt.c
@@ -625,6 +625,21 @@ void immnd_process_evt(void)
return;
}
 
+   if ((cb->mIntroduced == 2) &&
+   ((evt->info.immnd.type == IMMND_EVT_D2ND_SYNC_START) ||
+(evt->info.immnd.type == IMMND_EVT_D2ND_SYNC_ABORT) ||
+(evt->info.immnd.type == IMMND_EVT_D2ND_PBE_PRTO_PURGE_MUTATIONS) ||
+(evt->info.immnd.type == IMMND_EVT_D2ND_DUMP_OK) ||
+(evt->info.immnd.type == IMMND_EVT_D2ND_LOADING_OK) ||
+(evt->info.immnd.type == IMMND_EVT_D2ND_GLOB_FEVS_REQ) ||
+(evt->info.immnd.type == IMMND_EVT_D2ND_GLOB_FEVS_REQ_2))) {
+   LOG_WA("DISCARD message %s from IMMD %x as re-intro on-going",
+   immsv_get_immnd_evt_name(evt->info.immnd.type),
+   evt->sinfo.node_id);
+   immnd_evt_destroy(evt, true, __LINE__);
+   return;
+   }
+
if ((evt->info.immnd.type != IMMND_EVT_D2ND_GLOB_FEVS_REQ) &&
(evt->info.immnd.type != IMMND_EVT_D2ND_GLOB_FEVS_REQ_2))
immsv_msg_trace_rec(evt->sinfo.dest, evt);
@@ -10779,12 +10794,6 @@ static uint32_t immnd_evt_proc_fevs_rcv(IMMND_CB *cb, IMMND_EVT *evt,
 : false;
TRACE_ENTER();
 
-   if (cb->mIntroduced == 2) {
-   LOG_WA("DISCARD FEVS message:%llu from %x", msgNo, sinfo->node_id);
-   dequeue_outgoing(cb);
-   return NCSCC_RC_FAILURE;
-   }
-
if (cb->highestProcessed >= msgNo) {
/*We have already received this message, discard it. */
LOG_WA(
diff --git a/src/imm/immnd/immnd_mds.c b/src/imm/immnd/immnd_mds.c
index 02cb4b552..d9cccd5d9 100644
--- a/src/imm/immnd/immnd_mds.c
+++ b/src/imm/immnd/immnd_mds.c
@@ -552,7 +552,10 @@ static uint32_t immnd_mds_rcv(IMMND_CB *cb, MDS_CALLBACK_RECEIVE_INFO *rcv_info)
}
 
/* Put it in IMMND's Event Queue */
-   if (pEvt->info.immnd.type == IMMND_EVT_A2ND_IMM_INIT)
+   if (pEvt->info.immnd.type == IMMND_EVT_D2ND_INTRO_RSP)
+   rc = m_NCS_IPC_SEND(&cb->immnd_mbx, (NCSCONTEXT)pEvt,
+   NCS_IPC_PRIORITY_VERY_HIGH);
+   else if (pEvt->info.immnd.type == IMMND_EVT_A2ND_IMM_INIT)
rc = m_NCS_IPC_SEND(&cb->immnd_mbx, (NCSCONTEXT)pEvt,
NCS_IPC_PRIORITY_HIGH);
else
-- 
2.17.1
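
The two ideas in this patch, serving the intro response ahead of everything
else and dropping director broadcasts while a re-introduce is pending, can
be sketched as below. This is a simplified model under stated assumptions,
not the IMMND mailbox: EvtType, Priority, Event and Mailbox are hypothetical
names, and only two broadcast types are modelled.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <map>

enum class EvtType { kIntroRsp, kImmInit, kSyncStart, kFevsReq, kOther };

// Lower value is served first.
enum class Priority { kVeryHigh = 0, kHigh = 1, kNormal = 2 };

struct Event {
  EvtType type;
  std::uint32_t node_id;
};

Priority PriorityFor(EvtType type) {
  if (type == EvtType::kIntroRsp) return Priority::kVeryHigh;
  if (type == EvtType::kImmInit) return Priority::kHigh;
  return Priority::kNormal;
}

bool IsDirectorBroadcast(EvtType type) {
  return type == EvtType::kSyncStart || type == EvtType::kFevsReq;
}

class Mailbox {
 public:
  void Send(const Event& evt) {
    queues_[PriorityFor(evt.type)].push_back(evt);
  }

  // Fetch the next event in priority order. While a re-introduce is
  // ongoing, queued broadcasts are discarded instead of being returned.
  bool Receive(bool reintro_ongoing, Event* out) {
    for (auto& entry : queues_) {
      std::deque<Event>& queue = entry.second;
      while (!queue.empty()) {
        Event evt = queue.front();
        queue.pop_front();
        if (reintro_ongoing && IsDirectorBroadcast(evt.type)) continue;
        *out = evt;
        return true;
      }
    }
    return false;
  }

 private:
  std::map<Priority, std::deque<Event>> queues_;  // iterated in key order
};
```

With this ordering, an intro response that arrives after a burst of
broadcasts is still handled first, so the node leaves the re-introduce state
before it acts on any of them.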





[devel] [PATCH 0/1] Review Request for imm: fix immnd crash in multi partitioned clusters rejoin [#3219] V2

2020-09-16 Thread thuan.tran
Summary: imm: fix immnd crash in multi partitioned clusters rejoin [#3219]
Review request for Ticket(s): 3219
Peer Reviewer(s): *** LIST THE TECH REVIEWER(S) / MAINTAINER(S) HERE ***
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3219
Base revision: 0e1a6847c264ad5e34ca8413307b118066ae03eb
Personal repository: git://git.code.sf.net/u/thuantr/review


Impacted area           Impact y/n

 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n

NOTE: Patch(es) contain lines longer than 80 characters

Comments (indicate scope for each "y" above):
-
N/A

revision 6307668490ad65a1a8c722d72a980584eaeddc4e
Author: thuan.tran 
Date:   Thu, 17 Sep 2020 10:19:54 +0700

imm: fix immnd crash in multi partitioned clusters rejoin [#3219]

- immnd prioritizes the re-introduce response from immd.
- immnd ignores broadcast events from IMMD while re-introduce is ongoing.



Complete diffstat:
--
 src/imm/immnd/immnd_evt.c | 21 +++--
 src/imm/immnd/immnd_mds.c |  5 -
 2 files changed, 19 insertions(+), 7 deletions(-)


Testing Commands:
-
N/A

Testing, Expected Results:
--
N/A

Conditions of Submission:
-
ACK by reviewers

Arch        Built    Started    Linux distro
--------------------------------------------
mips        n        n
mips64      n        n
x86         n        n
x86_64      y        y
powerpc     n        n
powerpc64   n        n




[devel] [PATCH 0/1] Review Request for imm: fix immnd crash in multi partitioned clusters rejoin [#3219]

2020-09-16 Thread thuan.tran
Summary: imm: fix immnd crash in multi partitioned clusters rejoin [#3219]
Review request for Ticket(s): 3219
Peer Reviewer(s): *** LIST THE TECH REVIEWER(S) / MAINTAINER(S) HERE ***
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3219
Base revision: 0e1a6847c264ad5e34ca8413307b118066ae03eb
Personal repository: git://git.code.sf.net/u/thuantr/review


Impacted area           Impact y/n

 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n

NOTE: Patch(es) contain lines longer than 80 characters

Comments (indicate scope for each "y" above):
-
N/A

revision 6cbbf856230c719ae3d4d4f8d845c673639fedfb
Author: thuan.tran 
Date:   Wed, 16 Sep 2020 20:20:59 +0700

imm: fix immnd crash in multi partitioned clusters rejoin [#3219]

- immnd prioritizes the re-introduce response from immd.
- immnd ignores broadcast events from IMMD while re-introduce is ongoing.



Complete diffstat:
--
 src/imm/immnd/immnd_evt.c | 12 
 src/imm/immnd/immnd_mds.c |  5 -
 2 files changed, 16 insertions(+), 1 deletion(-)


Testing Commands:
-
N/A

Testing, Expected Results:
--
N/A

Conditions of Submission:
-
ACK by reviewers

Arch        Built    Started    Linux distro
--------------------------------------------
mips        n        n
mips64      n        n
x86         n        n
x86_64      y        y
powerpc     n        n
powerpc64   n        n




[devel] [PATCH 1/1] imm: fix immnd crash in multi partitioned clusters rejoin [#3219]

2020-09-16 Thread thuan.tran
- immnd prioritizes the re-introduce response from immd.
- immnd ignores broadcast events from IMMD while re-introduce is ongoing.
---
 src/imm/immnd/immnd_evt.c | 12 
 src/imm/immnd/immnd_mds.c |  5 -
 2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/src/imm/immnd/immnd_evt.c b/src/imm/immnd/immnd_evt.c
index afc2106a0..087d512af 100644
--- a/src/imm/immnd/immnd_evt.c
+++ b/src/imm/immnd/immnd_evt.c
@@ -625,6 +625,18 @@ void immnd_process_evt(void)
return;
}
 
+   if ((cb->mIntroduced == 2) &&
+   ((evt->info.immnd.type == IMMND_EVT_D2ND_SYNC_START) ||
+(evt->info.immnd.type == IMMND_EVT_D2ND_SYNC_ABORT) ||
+(evt->info.immnd.type == IMMND_EVT_D2ND_PBE_PRTO_PURGE_MUTATIONS) ||
+(evt->info.immnd.type == IMMND_EVT_D2ND_DUMP_OK) ||
+(evt->info.immnd.type == IMMND_EVT_D2ND_LOADING_OK))) {
+   LOG_WA("DISCARD message from IMMD %x as re-intro on-going",
+   evt->sinfo.node_id);
+   immnd_evt_destroy(evt, true, __LINE__);
+   return;
+   }
+
if ((evt->info.immnd.type != IMMND_EVT_D2ND_GLOB_FEVS_REQ) &&
(evt->info.immnd.type != IMMND_EVT_D2ND_GLOB_FEVS_REQ_2))
immsv_msg_trace_rec(evt->sinfo.dest, evt);
diff --git a/src/imm/immnd/immnd_mds.c b/src/imm/immnd/immnd_mds.c
index 02cb4b552..d9cccd5d9 100644
--- a/src/imm/immnd/immnd_mds.c
+++ b/src/imm/immnd/immnd_mds.c
@@ -552,7 +552,10 @@ static uint32_t immnd_mds_rcv(IMMND_CB *cb, MDS_CALLBACK_RECEIVE_INFO *rcv_info)
}
 
/* Put it in IMMND's Event Queue */
-   if (pEvt->info.immnd.type == IMMND_EVT_A2ND_IMM_INIT)
+   if (pEvt->info.immnd.type == IMMND_EVT_D2ND_INTRO_RSP)
+   rc = m_NCS_IPC_SEND(&cb->immnd_mbx, (NCSCONTEXT)pEvt,
+   NCS_IPC_PRIORITY_VERY_HIGH);
+   else if (pEvt->info.immnd.type == IMMND_EVT_A2ND_IMM_INIT)
rc = m_NCS_IPC_SEND(&cb->immnd_mbx, (NCSCONTEXT)pEvt,
NCS_IPC_PRIORITY_HIGH);
else
-- 
2.17.1





[devel] [PATCH 1/1] amf: fix amfd crash in multi partitioned clusters rejoin [#3218]

2020-09-16 Thread thuan.tran
When amfd reads the headless cached RT attributes, it should delete the
SU from the osafAmfSGSuOperationList attribute of the SG if no
assignment in progress can be found.
---
 src/amf/amfd/sg.cc | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/src/amf/amfd/sg.cc b/src/amf/amfd/sg.cc
index 47ffb9866..e2c2528c8 100644
--- a/src/amf/amfd/sg.cc
+++ b/src/amf/amfd/sg.cc
@@ -2243,6 +2243,16 @@ void avd_sg_read_headless_cached_rta(AVD_CL_CB *cb) {
   if (op_su) {
 if (op_su->sg_of_su->any_assignment_in_progress()) {
   avd_sg_su_oper_list_add(avd_cb, op_su, false, false);
+} else {
+  TRACE("No any assignment in progress, clean %s",
+op_su->name.c_str());
+  const SaNameTWrapper su_name(op_su->name);
+  avd_saImmOiRtObjectUpdate_sync(
+  sg->name,
+  const_cast<SaImmAttrNameT>("osafAmfSGSuOperationList"),
+  SA_IMM_ATTR_SANAMET,
+  (void *)static_cast<const SaNameT *>(su_name),
+  SA_IMM_ATTR_VALUES_DELETE);
 }
   }
 }
-- 
2.17.1
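
The rule this patch adds can be modelled as a small reconciliation step:
cached operation-list entries are restored only when the SG still has an
assignment in progress, and are otherwise marked for deletion from the
persisted attribute. The sketch below is a simplified model, not amfd code;
CachedSg, Reconciled and Reconcile are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical view of one SG as read back after a headless period.
struct CachedSg {
  bool assignment_in_progress;
  std::vector<std::string> cached_oper_list;  // SU names from the cache
};

struct Reconciled {
  std::vector<std::string> keep;   // restore into the in-memory list
  std::vector<std::string> purge;  // stale: delete from the cached attribute
};

// Split the cached entries according to whether any assignment is pending.
Reconciled Reconcile(const CachedSg& sg) {
  Reconciled out;
  for (const auto& su : sg.cached_oper_list) {
    if (sg.assignment_in_progress) {
      out.keep.push_back(su);
    } else {
      out.purge.push_back(su);
    }
  }
  return out;
}
```

Purging the stale entries keeps the persisted attribute in step with the
in-memory state, so a later rejoin does not read back SUs that no operation
is actually waiting on.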





[devel] [PATCH 0/1] Review Request for amf: fix amfd crash in multi partitioned clusters rejoin [#3218]

2020-09-16 Thread thuan.tran
Summary: amf: fix amfd crash in multi partitioned clusters rejoin [#3218]
Review request for Ticket(s): 3218
Peer Reviewer(s): *** LIST THE TECH REVIEWER(S) / MAINTAINER(S) HERE ***
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3218
Base revision: 0e1a6847c264ad5e34ca8413307b118066ae03eb
Personal repository: git://git.code.sf.net/u/thuantr/review


Impacted area       Impact y/n
--------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
-
N/A

revision 7e13015b6c7645cbfc48219ac3dd37cb09d74508
Author: thuan.tran 
Date:   Wed, 16 Sep 2020 20:35:21 +0700

amf: fix amfd crash in multi partitioned clusters rejoin [#3218]

When amfd reads the headless cached RT attributes, it should delete
the SU from the osafAmfSGSuOperationList attribute of the SG if no
assignment in progress can be found.



Complete diffstat:
--
 src/amf/amfd/sg.cc | 10 ++
 1 file changed, 10 insertions(+)


Testing Commands:
-
N/A

Testing, Expected Results:
--
N/A

Conditions of Submission:
-
ACK by reviewers

Arch      Built     Started    Linux distro
-------------------------------------------
mips        n          n
mips64      n          n
x86         n          n
x86_64      y          y
powerpc     n          n
powerpc64   n          n


Reviewer Checklist:
---
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
(i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
too much content into a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
comments and change requests that were proposed in the initial review.

___ You have a misconfigured ~/.gitconfig file (i.e. user.name, user.email etc)

___ Your computer has a badly configured date and time, confusing
the threaded patch review.

___ Your changes affect IPC mechanism, and you don't present any results
for in-service upgradability test.

___ Your changes affect user manual and documentation, your patch series
do not contain the patch that updates the Doxygen manual.





[devel] [PATCH 1/1] mbc: fix agent crash if mds sendto() error [#3217]

2020-09-09 Thread thuan.tran
- The fix for #3208, which solved an MBC memory leak, causes an agent
crash if MDS sendto() returns an error.
- Update part of the #3208 fix to check whether the MDS encode
callback has run; if it has, the memory must not be freed again
because MDS has already freed it.
---
 src/mbc/mbcsv_mds.c  | 2 ++
 src/mbc/mbcsv_util.c | 9 ++---
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/src/mbc/mbcsv_mds.c b/src/mbc/mbcsv_mds.c
index afaf1fd1b..2e5121cfc 100644
--- a/src/mbc/mbcsv_mds.c
+++ b/src/mbc/mbcsv_mds.c
@@ -37,6 +37,7 @@
 
 #include "mbcsv.h"
 
+extern bool mds_enc_cb_done;
 MDS_CLIENT_MSG_FORMAT_VER
 MBCSV_wrt_PEER_msg_fmt_array[MBCSV_WRT_PEER_SUBPART_VER_RANGE] = {
 1 /* msg format version for subpart version */
@@ -706,6 +707,7 @@ uint32_t mbcsv_mds_enc(MDS_CLIENT_HDL yr_svc_hdl, 
NCSCONTEXT msg,
NCS_MBCSV_MSG_SYNC_SEND_RSP)
ncs_enc_append_usrbuf(
uba, mm->info.peer_msg.info.client_msg.uba.start);
+   mds_enc_cb_done = true;
 
break;
}
diff --git a/src/mbc/mbcsv_util.c b/src/mbc/mbcsv_util.c
index 9ce79243f..8373b6155 100644
--- a/src/mbc/mbcsv_util.c
+++ b/src/mbc/mbcsv_util.c
@@ -40,6 +40,7 @@
 #include "mbcsv.h"
 #include "base/ncssysf_mem.h"
 
+bool mds_enc_cb_done;
 /**\
 * PROCEDURE: mbcsv_rmv_reg_inst
 *
@@ -492,6 +493,7 @@ uint32_t 
mbcsv_send_ckpt_data_to_all_peers(NCS_MBCSV_SEND_CKPT *msg_to_send,
*uba;
evt_msg.info.peer_msg.info.client_msg.uba
.start = dup_ub;
+   mds_enc_cb_done = false;
 
switch (msg_to_send->i_send_type) {
case NCS_MBCSV_SND_SYNC: {
@@ -513,8 +515,7 @@ uint32_t 
mbcsv_send_ckpt_data_to_all_peers(NCS_MBCSV_SEND_CKPT *msg_to_send,
}
tmp_ptr->ckpt_msg_sent = true;
}
-   if ((rc != NCSCC_RC_SUCCESS) &&
-   (rc != NCSCC_RC_REQ_TIMOUT)) {
+   if (rc != NCSCC_RC_SUCCESS && !mds_enc_cb_done) {
m_MMGR_FREE_BUFR_LIST(dup_ub);
}
tmp_ptr = tmp_ptr->next;
@@ -741,10 +742,12 @@ uint32_t mbcsv_send_notify_msg(uint32_t msg_dest, 
CKPT_INST *ckpt_inst,
.uba = *uba;
evt_msg.info.peer_msg.info.client_msg
.uba.start = dup_ub;
+   mds_enc_cb_done = false;
 
if (m_NCS_MBCSV_MDS_ASYNC_SEND(
&evt_msg, ckpt_inst,
-   peer->peer_anchor) != NCSCC_RC_SUCCESS)
+   peer->peer_anchor) != NCSCC_RC_SUCCESS &&
+   !mds_enc_cb_done)
m_MMGR_FREE_BUFR_LIST(dup_ub);
}
}
-- 
2.17.1
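
The ownership hand-off that this patch relies on can be sketched in isolation. Everything below is a hypothetical model, not MBCSV or MDS code: `Buffer`, `encode_callback`, `transport_send` and `send_checkpoint` are names assumed for illustration, and `enc_cb_done` only plays the role of mds_enc_cb_done. The sketch shows why the caller frees the duplicated buffer only when the send failed and the encode callback never consumed it.

```cpp
#include <cassert>

// Hypothetical buffer that counts how often it was released,
// so a double free shows up as free_count == 2.
struct Buffer {
  int free_count = 0;
};

bool enc_cb_done = false;  // plays the role of mds_enc_cb_done

// Stand-in for the transport's encode callback: it consumes the buffer.
void encode_callback(Buffer& buf) {
  ++buf.free_count;  // the transport releases the buffer after encoding
  enc_cb_done = true;
}

// Stand-in for a send that may fail before or after the encode step.
bool transport_send(Buffer& buf, bool reach_encode, bool succeed) {
  if (reach_encode) encode_callback(buf);
  return succeed;
}

// Caller side: release the buffer only when the send failed AND the
// encode callback never took ownership of it.
void send_checkpoint(Buffer& buf, bool reach_encode, bool succeed) {
  enc_cb_done = false;
  bool ok = transport_send(buf, reach_encode, succeed);
  if (!ok && !enc_cb_done) {
    ++buf.free_count;
  }
}
```

With this rule the buffer is released exactly once in all three cases: failure before encoding, failure after encoding, and success.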





[devel] [PATCH 0/1] Review Request for mbc: fix agent crash if mds sendto() error [#3217]

2020-09-09 Thread thuan.tran
Summary: mbc: fix agent crash if mds sendto() error [#3217]
Review request for Ticket(s): 3217
Peer Reviewer(s): *** LIST THE TECH REVIEWER(S) / MAINTAINER(S) HERE ***
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3217
Base revision: cbdf697412a0c24895ffe1ad5a57f832022f96cb
Personal repository: git://git.code.sf.net/u/thuantr/review


Impacted area       Impact y/n
--------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            n
 OpenSAF services        y
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n

NOTE: Patch(es) contain lines longer than 80 characters

Comments (indicate scope for each "y" above):
-
N/A

revision c0d22d2f2ea1fffc26ee4261e5933b0039424936
Author: thuan.tran 
Date:   Wed, 9 Sep 2020 13:19:47 +0700

mbc: fix agent crash if mds sendto() error [#3217]

- The fix for #3208, which solved an MBC memory leak, causes an agent
crash if MDS sendto() returns an error.
- Update part of the #3208 fix to check whether the MDS encode
callback has run; if it has, the memory must not be freed again
because MDS has already freed it.



Complete diffstat:
--
 src/mbc/mbcsv_mds.c  | 2 ++
 src/mbc/mbcsv_util.c | 9 ++---
 2 files changed, 8 insertions(+), 3 deletions(-)


Testing Commands:
-
N/A

Testing, Expected Results:
--
N/A

Conditions of Submission:
-
ACK by reviewers

Arch      Built     Started    Linux distro
-------------------------------------------
mips        n          n
mips64      n          n
x86         n          n
x86_64      y          y
powerpc     n          n
powerpc64   n          n




[devel] [PATCH 2/2] mds: improve mdstest suite 27 [#3216]

2020-08-27 Thread thuan.tran
- Update tet_receiver() to poll without timeout, as the sender may
take a long time to return from sendto() when it runs out of memory.
- Update tet_sender() to slow down sending when both the number of
messages and the message size are large, so that the kernel does not
kill it for using too much memory.
---
 src/mds/apitest/mdstipc_api.c | 36 ++-
 1 file changed, 19 insertions(+), 17 deletions(-)

diff --git a/src/mds/apitest/mdstipc_api.c b/src/mds/apitest/mdstipc_api.c
index 3dd1a3dc0..6ce3b7103 100644
--- a/src/mds/apitest/mdstipc_api.c
+++ b/src/mds/apitest/mdstipc_api.c
@@ -13340,6 +13340,10 @@ void tet_sender2(MDS_SVC_ID svc_id, uint32_t msg_num, 
uint32_t msg_size,
}
}
free(mesg);
+   while (1) {
+   // Receiver will kill sender
+   sleep(1);
+   }
 }
 
 void tet_sender(MDS_SVC_ID svc_id, uint32_t msg_num, uint32_t msg_size,
@@ -13413,6 +13417,8 @@ void tet_sender(MDS_SVC_ID svc_id, uint32_t msg_num, 
uint32_t msg_size,
" successfully\n", i);
}
}
+   if (msg_num > 65535 && msg_size > 1)
+   usleep(1000); // Slow down to avoid reaped by OOM killer
}
free(mesg);
while (1) {
@@ -13426,15 +13432,17 @@ int tet_receiver(MDS_SVC_ID svc_id, uint32_t msg_num,
uint32_t msg_size, int svc_num,
MDS_SVC_ID fr_svcids[])
 {
+   int rc = 1;
+
if (msg_size > TET_MSG_SIZE_MIN) {
printf("\nReceiver: msg_size > TET_MSG_SIZE_MIN\n");
-   return 1;
+   return rc;
}
printf("\nStarted Receiver (pid:%d) svc_id=%d\n",
(int)getpid(), svc_id);
if (adest_get_handle() != NCSCC_RC_SUCCESS) {
printf("\nReceiver FAIL to get adest handle\n");
-   return 1;
+   return rc;
}
 
sleep(1); //Let sender subscribe before receiver install
@@ -13449,7 +13457,7 @@ int tet_receiver(MDS_SVC_ID svc_id, uint32_t msg_num,
NCSMDS_SCOPE_INTRANODE,
svc_num, fr_svcids) != NCSCC_RC_SUCCESS) {
printf("\nReceiver FAIL to subscribe sender\n");
-   exit(1);
+   return rc;
}
 
struct pollfd sel;
@@ -13459,7 +13467,7 @@ int tet_receiver(MDS_SVC_ID svc_id, uint32_t msg_num,
sel.fd = m_GET_FD_FROM_SEL_OBJ(gl_tet_adest.svc[0].sel_obj);
sel.events = POLLIN;
while (1) {
-   int ret = osaf_poll(&sel, 1, 1);
+   int ret = osaf_poll(&sel, 1, 1000);
if (ret > 0) {
gl_rcvdmsginfo.msg = NULL;
if (mds_service_retrieve(gl_tet_adest.mds_pwe1_hdl,
@@ -13479,31 +13487,25 @@ int tet_receiver(MDS_SVC_ID svc_id, uint32_t msg_num,
" from %x\n", expected_buff,
msg->recvd_data,
gl_rcvdmsginfo.fr_dest);
-   free(expected_buff);
free(msg);
-   reset_counters();
-   return 1;
+   break;
}
free(msg);
+   } else if (gl_event_data.event == NCSMDS_DOWN) {
+   break;
}
-   } else {
+   } else if (verify_counters(msg_num)) {
+   printf("\nReceiver: get enough %d messages\n", msg_num);
+   rc = 0;
break;
}
}
 
-   printf("\nReceiver verify number of received messages\n");
-   if (!verify_counters(msg_num)) {
-   printf("\nReceiver: Not get enough %d messages\n", msg_num);
-   free(expected_buff);
-   reset_counters();
-   return 1;
-   }
-
printf("\nEnd Receiver (pid:%d) svc_id=%d\n",
(int)getpid(), svc_id);
free(expected_buff);
reset_counters();
-   return 0;
+   return rc;
 }
 
 void tet_overload_tp_1(void)
-- 
2.17.1
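
The reworked receiver loop can be sketched with plain POSIX poll(). This is a simplified model under stated assumptions, not the mdstest code: `receive_all` is a hypothetical function, one byte on a file descriptor stands in for one MDS message, and `max_idle_polls` is an extra bound added here only so that the sketch always terminates. As in the patch, a poll timeout ends the loop only once the expected number of messages has been counted.

```cpp
#include <cassert>
#include <poll.h>
#include <unistd.h>

// Returns 0 once `expected` bytes were counted, 1 otherwise.
// `max_idle_polls` is an addition for this sketch; the test in the patch
// keeps waiting instead, because a slow sender is expected.
int receive_all(int fd, unsigned expected, int timeout_ms, int max_idle_polls) {
  unsigned received = 0;
  int idle = 0;
  struct pollfd sel;
  sel.fd = fd;
  sel.events = POLLIN;
  sel.revents = 0;
  while (idle < max_idle_polls) {
    int ret = poll(&sel, 1, timeout_ms);
    if (ret > 0 && (sel.revents & POLLIN)) {
      char byte;
      if (read(fd, &byte, 1) == 1) {
        ++received;  // one byte models one received message
      } else {
        break;       // end of stream or read error
      }
    } else if (ret > 0) {
      break;         // hang-up or error reported on the descriptor
    } else if (received >= expected) {
      return 0;      // quiet period and everything has arrived
    } else {
      ++idle;        // quiet period, but the sender may just be slow
    }
  }
  return received >= expected ? 0 : 1;
}
```

A caller could create a pipe, write a few bytes to the write end, and pass the read end to receive_all() to try both outcomes.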





[devel] [PATCH 1/2] mds: fix receiving old msg under flow control enabled [#3216]

2020-08-27 Thread thuan.tran
- Revert a part of the #3151 solution: do not decide that a PortId
was reset based on fseq=1, but reset rcvwnd when an Intro msg is
received from a known PortId.
- Check for and skip an invalid Nack, so that the sender does not
mistakenly move to the overflow state and queue all later messages
while the receiver gets no further message for which to send a
ChunkAck.
- Do not return an error if the PortId is not found when checking
whether the send queue is capable, to avoid the agent crash that
appeared after the fix for #3208 when the agent enables mds flow
control.
---
 src/mds/mds_tipc_fctrl_intf.cc   | 23 +--
 src/mds/mds_tipc_fctrl_portid.cc | 38 +++-
 2 files changed, 29 insertions(+), 32 deletions(-)

diff --git a/src/mds/mds_tipc_fctrl_intf.cc b/src/mds/mds_tipc_fctrl_intf.cc
index 93bfce51c..348605c67 100644
--- a/src/mds/mds_tipc_fctrl_intf.cc
+++ b/src/mds/mds_tipc_fctrl_intf.cc
@@ -351,26 +351,17 @@ uint32_t mds_tipc_fctrl_sndqueue_capable(struct 
tipc_portid id,
   uint16_t* next_seq) {
   if (is_fctrl_enabled == false) return NCSCC_RC_SUCCESS;
 
-  uint32_t rc = NCSCC_RC_SUCCESS;
-
   portid_map_mutex.lock();
 
   TipcPortId *portid = portid_lookup(id);
-  if (portid == nullptr) {
-m_MDS_LOG_ERR("FCTRL: [me] --> [node:%x, ref:%u], "
-"[line:%u], Error[PortId not found]",
-id.node, id.ref, __LINE__);
-rc = NCSCC_RC_FAILURE;
-  } else {
-if (portid->state_ != TipcPortId::State::kDisabled) {
+  if (portid && portid->state_ != TipcPortId::State::kDisabled) {
   // assign the sequence number of the outgoing message
   *next_seq = portid->GetCurrentSeq();
-}
   }
 
   portid_map_mutex.unlock();
 
-  return rc;
+  return NCSCC_RC_SUCCESS;
 }
 
 uint32_t mds_tipc_fctrl_trysend(struct tipc_portid id, const uint8_t *buffer,
@@ -564,12 +555,10 @@ uint32_t mds_tipc_fctrl_rcv_data(uint8_t *buffer, 
uint16_t len,
   // no need to decode intro message
   // the decoding intro message type is done in header decoding
   // send to the event thread
-  pevt = new Event(Event::Type::kEvtRcvIntro, id, 0, 0, 0, 0);
-  if (m_NCS_IPC_SEND(&mbx_events, pevt,
-  NCS_IPC_PRIORITY_HIGH) != NCSCC_RC_SUCCESS) {
-m_MDS_LOG_ERR("FCTRL: Failed to send msg to mbx_events, Error[%s]",
-strerror(errno));
-  }
+  portid_map_mutex.lock();
+  Event evt(Event::Type::kEvtRcvIntro, id, 0, 0, 0, 0);
+  process_flow_event(evt);
+  portid_map_mutex.unlock();
 } else {
   m_MDS_LOG_ERR("FCTRL: [me] <-- [node:%x, ref:%u], "
   "[msg_type:%u], Error[not supported message type]",
diff --git a/src/mds/mds_tipc_fctrl_portid.cc b/src/mds/mds_tipc_fctrl_portid.cc
index 41fce3df8..f569e1f99 100644
--- a/src/mds/mds_tipc_fctrl_portid.cc
+++ b/src/mds/mds_tipc_fctrl_portid.cc
@@ -373,10 +373,10 @@ uint32_t TipcPortId::ReceiveData(uint32_t mseq, uint16_t 
mfrag,
 if (rcvwnd_.rcv_ + 1 < Seq16(fseq)) {
   if (rcvwnd_.rcv_ == 0 && rcvwnd_.acked_ == 0) {
 // peer does not realize that this portid reset
-m_MDS_LOG_NOTIFY("FCTRL: [me] <-- [node:%x, ref:%u], "
+m_MDS_LOG_DBG("FCTRL: [me] <-- [node:%x, ref:%u], "
 "RcvData[mseq:%u, mfrag:%u, fseq:%u], "
 "rcvwnd[acked:%u, rcv:%u, nacked:%" PRIu64 "], "
-"Warning[portid reset]",
+"[portid reset]",
 id_.node, id_.ref,
 mseq, mfrag, fseq,
 rcvwnd_.acked_.v(), rcvwnd_.rcv_.v(), rcvwnd_.nacked_space_);
@@ -397,19 +397,6 @@ uint32_t TipcPortId::ReceiveData(uint32_t mseq, uint16_t 
mfrag,
 // send nack
 SendNack((rcvwnd_.rcv_ + 1).v(), svc_id);
   }
-} else if (fseq == 1) {
-  // sender realize me as portid reset
-  m_MDS_LOG_NOTIFY("FCTRL: [me] <-- [node:%x, ref:%u], "
-  "RcvData[mseq:%u, mfrag:%u, fseq:%u], "
-  "rcvwnd[acked:%u, rcv:%u, nacked:%" PRIu64 "], "
-  "Warning[portid reset on sender]",
-  id_.node, id_.ref,
-  mseq, mfrag, fseq,
-  rcvwnd_.acked_.v(), rcvwnd_.rcv_.v(), rcvwnd_.nacked_space_);
-
-  SendChunkAck(fseq, svc_id, 1);
-  rcvwnd_.rcv_ = Seq16(fseq);
-  rcvwnd_.acked_ = rcvwnd_.rcv_;
 } else if (Seq16(fseq) <= rcvwnd_.rcv_) {
   rc = NCSCC_RC_FAILURE;
   // unexpected retransmission
@@ -509,6 +496,17 @@ void TipcPortId::ReceiveNack(uint32_t mseq, uint16_t mfrag,
 fseq);
 return;
   }
+
+  if (Seq16(fseq) <= sndwnd_.acked_) {
+m_MDS_LOG_NOTIFY("FCTRL: [me] <-- [node:%x, ref:%u], "
+"RcvNack[fseq:%u], "
+"sndwnd[acked:%u, send:%u, nacked:%" PRIu64 "], "
+"Warning[Invalid Nack]",
+id_.node, id_.ref, fseq,
+sndwnd_.acked_.v(), sndwnd_.send_.v(), sndwnd_.nacked_space_);
+return;
+  }
+
   if (state_ == State::kRcvBuffOverflow) {
 sndqueue_.MarkUnsentFrom(Seq16(fseq));
 if (Seq16(fseq) - sndwnd_.acked_ > 1) {
@@ -606,6 +604,16 @@ void TipcPortId::ReceiveIntro() {
   if (state_ == State::kStartup || state_ == State::kTxProb) {
 ChangeState(State::kEnabled);
   }
+  if 

[devel] [PATCH 0/2] Review Request for mds: fix receiving old msg under flow control enabled [#3216] V3

2020-08-27 Thread thuan.tran
Summary: mds: fix receiving old msg under flow control enabled [#3216]
Review request for Ticket(s): 3216
Peer Reviewer(s): Minh, Thang
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3216
Base revision: 7a5a4888e753a26765db43ec321235c486072304
Personal repository: git://git.code.sf.net/u/thuantr/review


Impacted area       Impact y/n
--------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            n
 OpenSAF services        y
 Core libraries          n
 Samples                 n
 Tests                   y
 Other                   n


Comments (indicate scope for each "y" above):
-
N/A

revision dfb068eb32ab533a71922a3981ffb7e8fe97a983
Author: thuan.tran 
Date:   Thu, 27 Aug 2020 13:56:16 +0700

mds: improve mdstest suite 27 [#3216]

- Update tet_receiver() to poll without timeout, as the sender may
take a long time to return from sendto() when it runs out of memory.
- Update tet_sender() to slow down sending when both the number of
messages and the message size are large, so that the kernel does not
kill it for using too much memory.



revision 2563b0c04caf88ddd06256e59252b989c3a6157f
Author: thuan.tran 
Date:   Thu, 27 Aug 2020 13:12:31 +0700

mds: fix receiving old msg under flow control enabled [#3216]

- Revert a part of the #3151 solution: do not decide that a PortId
was reset based on fseq=1, but reset rcvwnd when an Intro msg is
received from a known PortId.
- Check for and skip an invalid Nack, so that the sender does not
mistakenly move to the overflow state and queue all later messages
while the receiver gets no further message for which to send a
ChunkAck.
- Do not return an error if the PortId is not found when checking
whether the send queue is capable, to avoid the agent crash that
appeared after the fix for #3208 when the agent enables mds flow
control.
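
The invalid-Nack check described in the commit message can be sketched on its own. The code below is an assumption-laden model: `Seq`, `at_or_before` and `is_valid_nack` are hypothetical names, and the wrap-around comparison uses ordinary 16-bit serial-number arithmetic, which may differ from how the real Seq16 class compares values. It shows the rule that a Nack for a sequence number at or before the last acknowledged one is ignored.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 16-bit sequence number; the real Seq16 class may differ.
struct Seq {
  uint16_t v;
};

// True when a is at or before b, treating the 16-bit space as circular.
bool at_or_before(Seq a, Seq b) {
  uint16_t diff = static_cast<uint16_t>(a.v - b.v);
  return static_cast<int16_t>(diff) <= 0;
}

// A Nack is only meaningful for a fragment that is not acknowledged yet.
bool is_valid_nack(Seq nacked, Seq last_acked) {
  return !at_or_before(nacked, last_acked);
}
```

For example, with the last acknowledged value at 65534, a Nack for 2 is treated as valid because 2 lies ahead of 65534 after wrap-around, while a Nack for 65530 is skipped.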



Complete diffstat:
--
 src/mds/apitest/mdstipc_api.c| 36 +++-
 src/mds/mds_tipc_fctrl_intf.cc   | 23 ++-
 src/mds/mds_tipc_fctrl_portid.cc | 38 +++---
 3 files changed, 48 insertions(+), 49 deletions(-)


Testing Commands:
-
N/A

Testing, Expected Results:
--
N/A

Conditions of Submission:
-
ACK by reviewers

Arch      Built     Started    Linux distro
-------------------------------------------
mips        n          n
mips64      n          n
x86         n          n
x86_64      y          y
powerpc     n          n
powerpc64   n          n



[devel] [PATCH 0/1] Review Request for mds: fix receiving old msg under flow control enabled [#3216] V2

2020-08-26 Thread thuan.tran
Summary: mds: fix receiving old msg under flow control enabled [#3216]
Review request for Ticket(s): 3216
Peer Reviewer(s): Minh, Thang
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3216
Base revision: 7a5a4888e753a26765db43ec321235c486072304
Personal repository: git://git.code.sf.net/u/thuantr/review


Impacted area       Impact y/n
--------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            n
 OpenSAF services        y
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
-
N/A

revision 8ed7ac9a8fa95539d0e7fc785888bca4cc40ed2f
Author: thuan.tran 
Date:   Wed, 26 Aug 2020 17:23:32 +0700

mds: fix receiving old msg under flow control enabled [#3216]

- Revert a part of the #3151 solution: do not decide that a PortId
was reset based on fseq=1, but reset rcvwnd when an Intro msg is
received from a known PortId.
- Check for and skip an invalid Nack, so that the sender does not
mistakenly move to the overflow state and queue all later messages
while the receiver gets no further message for which to send a
ChunkAck.
- Update tet_receiver() to poll without timeout, as the sender may
take a long time to return from sendto() when it runs out of memory.
- Update tet_sender() to slow down sending when both the number of
messages and the message size are large, so that the kernel does not
kill it for using too much memory.
- Do not return an error if the PortId is not found when checking
whether the send queue is capable, to avoid the agent crash that
appeared after the fix for #3208 when the agent enables mds flow
control.
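
The last point, treating an unknown PortId as a non-error while checking the send queue, can be sketched as a lookup under a lock. This is a hypothetical model rather than the flow-control code: `Peer`, `peers`, `peers_mutex` and `assign_next_seq` are names assumed for illustration. Unlike the patched function, which always reports success, the sketch returns whether a sequence number was assigned, purely to make the behaviour observable.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <mutex>

// Hypothetical per-peer state; not the real TipcPortId class.
struct Peer {
  bool enabled;
  uint16_t current_seq;
};

std::mutex peers_mutex;
std::map<uint32_t, Peer> peers;  // keyed by an illustrative peer reference

// Fills *next_seq only for a known, enabled peer. An unknown peer is not
// treated as an error: *next_seq is simply left untouched.
bool assign_next_seq(uint32_t ref, uint16_t* next_seq) {
  std::lock_guard<std::mutex> guard(peers_mutex);  // released on every path
  auto it = peers.find(ref);
  if (it != peers.end() && it->second.enabled) {
    *next_seq = it->second.current_seq;
    return true;
  }
  return false;
}
```

Using std::lock_guard instead of explicit lock() and unlock() calls is a design choice of the sketch; it keeps the mutex balanced even if an early return is added later.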



Complete diffstat:
--
 src/mds/apitest/mdstipc_api.c| 15 +--
 src/mds/mds_tipc_fctrl_intf.cc   | 23 ++-
 src/mds/mds_tipc_fctrl_portid.cc | 38 +++---
 3 files changed, 34 insertions(+), 42 deletions(-)


Testing Commands:
-
N/A

Testing, Expected Results:
--
N/A

Conditions of Submission:
-
ACK by reviewers

Arch      Built     Started    Linux distro
-------------------------------------------
mips        n          n
mips64      n          n
x86         n          n
x86_64      y          y
powerpc     n          n
powerpc64   n          n



[devel] [PATCH 1/1] mds: fix receiving old msg under flow control enabled [#3216]

2020-08-26 Thread thuan.tran
- Revert a part of the #3151 solution: do not decide that a PortId
was reset based on fseq=1, but reset rcvwnd when an Intro msg is
received from a known PortId.
- Check for and skip an invalid Nack, so that the sender does not
mistakenly move to the overflow state and queue all later messages
while the receiver gets no further message for which to send a
ChunkAck.
- Update tet_receiver() to poll without timeout, as the sender may
take a long time to return from sendto() when it runs out of memory.
- Update tet_sender() to slow down sending when both the number of
messages and the message size are large, so that the kernel does not
kill it for using too much memory.
- Do not return an error if the PortId is not found when checking
whether the send queue is capable, to avoid the agent crash that
appeared after the fix for #3208 when the agent enables mds flow
control.
---
 src/mds/apitest/mdstipc_api.c| 15 +
 src/mds/mds_tipc_fctrl_intf.cc   | 23 +--
 src/mds/mds_tipc_fctrl_portid.cc | 38 +++-
 3 files changed, 34 insertions(+), 42 deletions(-)

diff --git a/src/mds/apitest/mdstipc_api.c b/src/mds/apitest/mdstipc_api.c
index 3dd1a3dc0..641753d7a 100644
--- a/src/mds/apitest/mdstipc_api.c
+++ b/src/mds/apitest/mdstipc_api.c
@@ -13413,6 +13413,8 @@ void tet_sender(MDS_SVC_ID svc_id, uint32_t msg_num, 
uint32_t msg_size,
" successfully\n", i);
}
}
+   if (msg_num > 65535 && msg_size > 1)
+   usleep(1000); // Slow down to avoid reaped by OOM killer
}
free(mesg);
while (1) {
@@ -13459,7 +13461,7 @@ int tet_receiver(MDS_SVC_ID svc_id, uint32_t msg_num,
sel.fd = m_GET_FD_FROM_SEL_OBJ(gl_tet_adest.svc[0].sel_obj);
sel.events = POLLIN;
while (1) {
-   int ret = osaf_poll(&sel, 1, 1);
+   int ret = osaf_poll(&sel, 1, 1000);
if (ret > 0) {
gl_rcvdmsginfo.msg = NULL;
if (mds_service_retrieve(gl_tet_adest.mds_pwe1_hdl,
@@ -13486,19 +13488,12 @@ int tet_receiver(MDS_SVC_ID svc_id, uint32_t msg_num,
}
free(msg);
}
-   } else {
+   } else if (verify_counters(msg_num)) {
+   printf("\nReceiver: get enough %d messages\n", msg_num);
break;
}
}
 
-   printf("\nReceiver verify number of received messages\n");
-   if (!verify_counters(msg_num)) {
-   printf("\nReceiver: Not get enough %d messages\n", msg_num);
-   free(expected_buff);
-   reset_counters();
-   return 1;
-   }
-
printf("\nEnd Receiver (pid:%d) svc_id=%d\n",
(int)getpid(), svc_id);
free(expected_buff);
diff --git a/src/mds/mds_tipc_fctrl_intf.cc b/src/mds/mds_tipc_fctrl_intf.cc
index 93bfce51c..348605c67 100644
--- a/src/mds/mds_tipc_fctrl_intf.cc
+++ b/src/mds/mds_tipc_fctrl_intf.cc
@@ -351,26 +351,17 @@ uint32_t mds_tipc_fctrl_sndqueue_capable(struct 
tipc_portid id,
   uint16_t* next_seq) {
   if (is_fctrl_enabled == false) return NCSCC_RC_SUCCESS;
 
-  uint32_t rc = NCSCC_RC_SUCCESS;
-
   portid_map_mutex.lock();
 
   TipcPortId *portid = portid_lookup(id);
-  if (portid == nullptr) {
-m_MDS_LOG_ERR("FCTRL: [me] --> [node:%x, ref:%u], "
-"[line:%u], Error[PortId not found]",
-id.node, id.ref, __LINE__);
-rc = NCSCC_RC_FAILURE;
-  } else {
-if (portid->state_ != TipcPortId::State::kDisabled) {
+  if (portid && portid->state_ != TipcPortId::State::kDisabled) {
   // assign the sequence number of the outgoing message
   *next_seq = portid->GetCurrentSeq();
-}
   }
 
   portid_map_mutex.unlock();
 
-  return rc;
+  return NCSCC_RC_SUCCESS;
 }
 
 uint32_t mds_tipc_fctrl_trysend(struct tipc_portid id, const uint8_t *buffer,
@@ -564,12 +555,10 @@ uint32_t mds_tipc_fctrl_rcv_data(uint8_t *buffer, 
uint16_t len,
   // no need to decode intro message
   // the decoding intro message type is done in header decoding
   // send to the event thread
-  pevt = new Event(Event::Type::kEvtRcvIntro, id, 0, 0, 0, 0);
-  if (m_NCS_IPC_SEND(&mbx_events, pevt,
-  NCS_IPC_PRIORITY_HIGH) != NCSCC_RC_SUCCESS) {
-m_MDS_LOG_ERR("FCTRL: Failed to send msg to mbx_events, Error[%s]",
-strerror(errno));
-  }
+  portid_map_mutex.lock();
+  Event evt(Event::Type::kEvtRcvIntro, id, 0, 0, 0, 0);
+  process_flow_event(evt);
+  portid_map_mutex.unlock();
 } else {
   m_MDS_LOG_ERR("FCTRL: [me] <-- [node:%x, ref:%u], "
   "[msg_type:%u], Error[not supported message type]",
diff --git a/src/mds/mds_tipc_fctrl_portid.cc b/src/mds/mds_tipc_fctrl_portid.cc
index 41fce3df8..f569e1f99 100644
--- a/src/mds/mds_tipc_fctrl_portid.cc
+++ b/src/mds/mds_tipc_fctrl_portid.cc
@@ -373,10 +373,10 @@ uint32_t TipcPortId::ReceiveData(uint32_t mseq, uint16_t 
mfrag,
 if (rcvwnd_.rcv_ + 

[devel] [PATCH 0/1] Review Request for mds: fix receiving old msg under flow control enabled [#3216]

2020-08-24 Thread thuan.tran
Summary: mds: fix receiving old msg under flow control enabled [#3216]
Review request for Ticket(s): 3216
Peer Reviewer(s): Minh, Thang
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3216
Base revision: 7a5a4888e753a26765db43ec321235c486072304
Personal repository: git://git.code.sf.net/u/thuantr/review


Impacted area       Impact y/n
--------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            n
 OpenSAF services        y
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
-
N/A

revision b71fd8d00fe3c6306461f3c0887e19456c76a6b5
Author: thuan.tran 
Date:   Mon, 24 Aug 2020 12:50:25 +0700

mds: fix receiving old msg under flow control enabled [#3216]

- Store and check the last received fseq so that a PortId reset
scenario is not detected by mistake.
- Skip invalid Nacks. Otherwise the sender mistakenly moves to the
overflow state and queues all later messages, while the receiver
gets no further message that would make it send a ChunkAck.
- Update tet_receiver() to keep polling instead of stopping on a
timeout, as sendto() on the sender side may take a long time to
return when the system runs out of memory.
- Update tet_sender() to slow down sending when both the number of
messages and the message size are large, so that the kernel does
not kill it for using too much memory.
- Do not return an error from the send-queue-capable check when the
PortId is not found. This avoids an agent crash after the fix for
#3208 when the agent enables mds flow control.
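
The first item can be pictured with a minimal sketch, assuming a much
simplified receiver. The names Receiver, OnData and Action are
hypothetical and are not taken from the patch; the real
TipcPortId::ReceiveData() keeps more state and also handles 16-bit
wrap-around, which this sketch ignores. The point shown is only that
fseq 1 counts as a peer reset when something was received before, and
as the ordinary first message otherwise.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical outcome of handling one incoming data message.
enum class Action { kAccept, kPeerReset, kNack, kDropOld };

// Simplified receiver state; all names are illustrative only.
struct Receiver {
  uint16_t rcv = 0;            // last in-order sequence number accepted
  uint16_t last_rcv_fseq = 0;  // last accepted fseq; 0 = nothing received yet

  Action OnData(uint16_t fseq) {
    assert(fseq != 0);  // this sketch numbers messages from 1
    if (fseq == 1 && last_rcv_fseq != 0) {
      // Numbering restarted after earlier traffic: treat as a peer reset.
      rcv = 1;
      last_rcv_fseq = 1;
      return Action::kPeerReset;
    }
    if (fseq == static_cast<uint16_t>(rcv + 1)) {
      // Next expected message (this also covers the very first one).
      rcv = fseq;
      last_rcv_fseq = fseq;
      return Action::kAccept;
    }
    if (fseq > rcv + 1) return Action::kNack;  // gap: ask for retransmission
    return Action::kDropOld;                   // duplicate or old message
  }
};
```

Without the last_rcv_fseq check, the first message on a fresh
connection would be indistinguishable from a restart of the peer.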



Complete diffstat:
--
 src/mds/apitest/mdstipc_api.c    | 15 +--
 src/mds/mds_tipc_fctrl_intf.cc   | 13 ++---
 src/mds/mds_tipc_fctrl_portid.cc | 17 +++--
 src/mds/mds_tipc_fctrl_portid.h  |  1 +
 4 files changed, 23 insertions(+), 23 deletions(-)


Testing Commands:
-
N/A

Testing, Expected Results:
--
N/A

Conditions of Submission:
-
ACK by reviewers

Arch      Built     Started    Linux distro
-------------------------------------------
mips        n          n
mips64      n          n
x86         n          n
x86_64      y          y
powerpc     n          n
powerpc64   n          n


Reviewer Checklist:
---
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
(i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
too much content in a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
comments and change requests that were proposed in the initial review.

___ You have a misconfigured ~/.gitconfig file (i.e. user.name, user.email etc)

___ Your computer has a badly configured date and time, confusing the
threaded patch review.

___ Your changes affect the IPC mechanism, and you don't present any results
for the in-service upgradability test.

___ Your changes affect the user manual and documentation, but your patch
series does not contain the patch that updates the Doxygen manual.



___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


[devel] [PATCH 1/1] mds: fix receiving old msg under flow control enabled [#3216]

2020-08-24 Thread thuan.tran
- Store and check the last received fseq so that a PortId reset
scenario is not detected by mistake.
- Skip invalid Nacks. Otherwise the sender mistakenly moves to the
overflow state and queues all later messages, while the receiver
gets no further message that would make it send a ChunkAck.
- Update tet_receiver() to keep polling instead of stopping on a
timeout, as sendto() on the sender side may take a long time to
return when the system runs out of memory.
- Update tet_sender() to slow down sending when both the number of
messages and the message size are large, so that the kernel does
not kill it for using too much memory.
- Do not return an error from the send-queue-capable check when the
PortId is not found. This avoids an agent crash after the fix for
#3208 when the agent enables mds flow control.
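
The last item follows a common lookup pattern, sketched here under
stated assumptions: port_map, PortState, SndQueueCapable and kRcSuccess
are hypothetical stand-ins, not the identifiers used in
mds_tipc_fctrl_intf.cc. A missing or disabled entry simply leaves the
caller's sequence number untouched, and the call still reports
success.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <mutex>

// Stand-in for the success return code; the value is an assumption.
constexpr uint32_t kRcSuccess = 1;

// Hypothetical per-port state kept by the flow control layer.
struct PortState {
  bool enabled;
  uint16_t current_seq;
};

std::mutex port_map_mutex;
std::map<uint32_t, PortState> port_map;

// Fill *next_seq only when the port is known and enabled.
// An unknown port is not an error, so the caller is never made to fail.
uint32_t SndQueueCapable(uint32_t ref, uint16_t* next_seq) {
  assert(next_seq != nullptr);
  std::lock_guard<std::mutex> guard(port_map_mutex);
  auto it = port_map.find(ref);
  if (it != port_map.end() && it->second.enabled) {
    *next_seq = it->second.current_seq;
  }
  return kRcSuccess;
}
```

The lock_guard releases the mutex on every return path, which is why
the sketch needs no explicit unlock.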
---
 src/mds/apitest/mdstipc_api.c    | 15 +--
 src/mds/mds_tipc_fctrl_intf.cc   | 13 ++---
 src/mds/mds_tipc_fctrl_portid.cc | 17 +++--
 src/mds/mds_tipc_fctrl_portid.h  |  1 +
 4 files changed, 23 insertions(+), 23 deletions(-)

diff --git a/src/mds/apitest/mdstipc_api.c b/src/mds/apitest/mdstipc_api.c
index 3dd1a3dc0..ca5674b1b 100644
--- a/src/mds/apitest/mdstipc_api.c
+++ b/src/mds/apitest/mdstipc_api.c
@@ -13413,6 +13413,8 @@ void tet_sender(MDS_SVC_ID svc_id, uint32_t msg_num, uint32_t msg_size,
" successfully\n", i);
}
}
+   if (msg_num > 65535 && msg_size > 1)
+   usleep(100); // Slow down to avoid reaped by OOM killer
}
free(mesg);
while (1) {
@@ -13459,7 +13461,7 @@ int tet_receiver(MDS_SVC_ID svc_id, uint32_t msg_num,
sel.fd = m_GET_FD_FROM_SEL_OBJ(gl_tet_adest.svc[0].sel_obj);
sel.events = POLLIN;
while (1) {
-   int ret = osaf_poll(&sel, 1, 1);
+   int ret = osaf_poll(&sel, 1, 1000);
if (ret > 0) {
gl_rcvdmsginfo.msg = NULL;
if (mds_service_retrieve(gl_tet_adest.mds_pwe1_hdl,
@@ -13486,19 +13488,12 @@ int tet_receiver(MDS_SVC_ID svc_id, uint32_t msg_num,
}
free(msg);
}
-   } else {
+   } else if (verify_counters(msg_num)) {
+   printf("\nReceiver: get enough %d messages\n", msg_num);
break;
}
}
 
-   printf("\nReceiver verify number of received messages\n");
-   if (!verify_counters(msg_num)) {
-   printf("\nReceiver: Not get enough %d messages\n", msg_num);
-   free(expected_buff);
-   reset_counters();
-   return 1;
-   }
-
printf("\nEnd Receiver (pid:%d) svc_id=%d\n",
(int)getpid(), svc_id);
free(expected_buff);
diff --git a/src/mds/mds_tipc_fctrl_intf.cc b/src/mds/mds_tipc_fctrl_intf.cc
index 93bfce51c..34069a262 100644
--- a/src/mds/mds_tipc_fctrl_intf.cc
+++ b/src/mds/mds_tipc_fctrl_intf.cc
@@ -351,26 +351,17 @@ uint32_t mds_tipc_fctrl_sndqueue_capable(struct tipc_portid id,
   uint16_t* next_seq) {
   if (is_fctrl_enabled == false) return NCSCC_RC_SUCCESS;
 
-  uint32_t rc = NCSCC_RC_SUCCESS;
-
   portid_map_mutex.lock();
 
   TipcPortId *portid = portid_lookup(id);
-  if (portid == nullptr) {
-m_MDS_LOG_ERR("FCTRL: [me] --> [node:%x, ref:%u], "
-"[line:%u], Error[PortId not found]",
-id.node, id.ref, __LINE__);
-rc = NCSCC_RC_FAILURE;
-  } else {
-if (portid->state_ != TipcPortId::State::kDisabled) {
+  if (portid && portid->state_ != TipcPortId::State::kDisabled) {
   // assign the sequence number of the outgoing message
   *next_seq = portid->GetCurrentSeq();
-}
   }
 
   portid_map_mutex.unlock();
 
-  return rc;
+  return NCSCC_RC_SUCCESS;
 }
 
 uint32_t mds_tipc_fctrl_trysend(struct tipc_portid id, const uint8_t *buffer,
diff --git a/src/mds/mds_tipc_fctrl_portid.cc b/src/mds/mds_tipc_fctrl_portid.cc
index 41fce3df8..06c8d18d1 100644
--- a/src/mds/mds_tipc_fctrl_portid.cc
+++ b/src/mds/mds_tipc_fctrl_portid.cc
@@ -371,7 +371,8 @@ uint32_t TipcPortId::ReceiveData(uint32_t mseq, uint16_t mfrag,
 
 // check for transmission error
 if (rcvwnd_.rcv_ + 1 < Seq16(fseq)) {
-  if (rcvwnd_.rcv_ == 0 && rcvwnd_.acked_ == 0) {
+  if (rcvwnd_.rcv_ == 0 && rcvwnd_.acked_ == 0 &&
+  last_rcv_fseq_ != Seq16(0)) {
 // peer does not realize that this portid reset
 m_MDS_LOG_NOTIFY("FCTRL: [me] <-- [node:%x, ref:%u], "
 "RcvData[mseq:%u, mfrag:%u, fseq:%u], "
@@ -397,7 +398,7 @@ uint32_t TipcPortId::ReceiveData(uint32_t mseq, uint16_t mfrag,
 // send nack
 SendNack((rcvwnd_.rcv_ + 1).v(), svc_id);
   }
-} else if (fseq == 1) {
+} else if (fseq == 1 && last_rcv_fseq_ != Seq16(0)) {
   // sender realize me as portid reset
   m_MDS_LOG_NOTIFY("FCTRL: [me] <-- [node:%x, ref:%u], "
   "RcvData[mseq:%u, mfrag:%u, fseq:%u], "
@@ -422,6 +423,7 @@ uint32_t 

[devel] [PATCH 0/1] Review Request for imm: fix memleak detected by valgrind [#3213] V2

2020-08-18 Thread thuan.tran
Summary: imm: fix memleak detected by valgrind [#3213]
Review request for Ticket(s): 3213
Peer Reviewer(s): Minh, Thang
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3213
Base revision: 1651782d12326286208663cd8abea4dc2b0f7a3b
Personal repository: git://git.code.sf.net/u/thuantr/review


Impacted area       Impact y/n

 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
-
N/A

revision a82d5da2e80f607a30938710229c01346234c9bc
Author: thuan.tran 
Date:   Tue, 18 Aug 2020 10:53:38 +0700

imm: fix memleak detected by valgrind [#3213]



Complete diffstat:
--
 src/imm/immd/immd_evt.c   |  3 +++
 src/imm/immnd/immnd_evt.c | 16 
 2 files changed, 19 insertions(+)


Testing Commands:
-
N/A

Testing, Expected Results:
--
N/A

Conditions of Submission:
-
ACK by reviewers

Arch      Built     Started    Linux distro
-------------------------------------------
mips        n          n
mips64      n          n
x86         n          n
x86_64      y          y
powerpc     n          n
powerpc64   n          n




[devel] [PATCH 1/1] imm: fix memleak detected by valgrind [#3213]

2020-08-18 Thread thuan.tran
---
 src/imm/immd/immd_evt.c   |  3 +++
 src/imm/immnd/immnd_evt.c | 16 
 2 files changed, 19 insertions(+)

diff --git a/src/imm/immd/immd_evt.c b/src/imm/immd/immd_evt.c
index de247c9fa..8d789249d 100644
--- a/src/imm/immd/immd_evt.c
+++ b/src/imm/immd/immd_evt.c
@@ -256,6 +256,9 @@ void immd_process_evt(void)
rc, evt->info.immd.type);
}
 
+   if (evt->info.immd.type == IMMD_EVT_ND2D_ADMINIT_REQ)
+   osaf_extended_name_free(
+   &evt->info.immd.info.admown_init.i.adminOwnerName);
/* Free the Event */
free(evt);
TRACE_LEAVE();
diff --git a/src/imm/immnd/immnd_evt.c b/src/imm/immnd/immnd_evt.c
index ff2538d15..afc2106a0 100644
--- a/src/imm/immnd/immnd_evt.c
+++ b/src/imm/immnd/immnd_evt.c
@@ -566,6 +566,22 @@ uint32_t immnd_evt_destroy(IMMSV_EVT *evt, bool onheap, uint32_t line)
free(evt->info.immnd.info.ccbUpcallRsp.errorString.buf);
evt->info.immnd.info.ccbUpcallRsp.errorString.buf = NULL;
evt->info.immnd.info.ccbUpcallRsp.errorString.size = 0;
+   if (evt->info.immnd.type ==
+   IMMND_EVT_A2ND_CCB_OBJ_DELETE_RSP_2)
+   osaf_extended_name_free(
+   &evt->info.immnd.info.ccbUpcallRsp.name);
+   } else if ((evt->info.immnd.type ==
+   IMMND_EVT_A2ND_CCB_OBJ_DELETE_RSP) ||
+   (evt->info.immnd.type ==
+   IMMND_EVT_A2ND_OI_CCB_AUG_INIT)) {
+   osaf_extended_name_free(
+   &evt->info.immnd.info.ccbUpcallRsp.name);
+   } else if (evt->info.immnd.type == IMMND_EVT_A2ND_IMM_ADMINIT) {
+   osaf_extended_name_free(
+   &evt->info.immnd.info.adminitReq.i.adminOwnerName);
+   } else if (evt->info.immnd.type == IMMND_EVT_D2ND_ADMINIT) {
+   osaf_extended_name_free(
+   &evt->info.immnd.info.adminitGlobal.i.adminOwnerName);
} else if (evt->info.immnd.type == IMMND_EVT_D2ND_IMPLDELETE) {
for(uint32_t i=0; i<evt->info.immnd.info.impl_delete.size; ++i) {
free(evt->info.immnd.info.impl_delete.implNameList[i].buf);
-- 
2.17.1





[devel] [PATCH 1/1] mbc: fix agent crash inside ncs_mbcsv_null_func() [#3214]

2020-08-14 Thread thuan.tran
---
 src/mbc/mbcsv_peer.c | 8 +++-
 src/mbc/mbcsv_util.c | 5 ++---
 2 files changed, 5 insertions(+), 8 deletions(-)

diff --git a/src/mbc/mbcsv_peer.c b/src/mbc/mbcsv_peer.c
index 1a9eeb125..e81352105 100644
--- a/src/mbc/mbcsv_peer.c
+++ b/src/mbc/mbcsv_peer.c
@@ -457,7 +457,6 @@ FSM. * Then depending on its current role it will start FSM.
 void mbcsv_clear_multiple_active_state(CKPT_INST *ckpt)
 {
PEER_INST *peer;
-   MBCSV_EVT rcvd_evt;
TRACE_ENTER();
 
/*
@@ -470,8 +469,7 @@ void mbcsv_clear_multiple_active_state(CKPT_INST *ckpt)
peer = ckpt->peer_list;
TRACE("multiple ACTIVE peers");
 
-   m_NCS_MBCSV_FSM_DISPATCH(peer, NCSMBCSV_EVENT_MULTIPLE_ACTIVE,
-&rcvd_evt);
+   m_NCS_MBCSV_FSM_DISPATCH(peer, NCSMBCSV_EVENT_MULTIPLE_ACTIVE, NULL);
 
TRACE_LEAVE();
return;
@@ -491,12 +489,12 @@ void mbcsv_clear_multiple_active_state(CKPT_INST *ckpt)
m_NCS_MBCSV_FSM_DISPATCH(
peer,
NCSMBCSV_EVENT_STATE_TO_KEEP_STBY_SYNC,
-   &rcvd_evt);
+   NULL);
else
m_NCS_MBCSV_FSM_DISPATCH(
peer,
NCSMBCSV_EVENT_STATE_TO_WAIT_FOR_CW_SYNC,
-   &rcvd_evt);
+   NULL);
}
 
peer = peer->next;
diff --git a/src/mbc/mbcsv_util.c b/src/mbc/mbcsv_util.c
index dafa268ba..9ce79243f 100644
--- a/src/mbc/mbcsv_util.c
+++ b/src/mbc/mbcsv_util.c
@@ -409,8 +409,7 @@ uint32_t mbcsv_send_ckpt_data_to_all_peers(NCS_MBCSV_SEND_CKPT *msg_to_send,
continue;
}
TRACE("dispatching FSM for NCSMBCSV_SEND_ASYNC_UPDATE");
-   m_NCS_MBCSV_FSM_DISPATCH(peer_ptr, NCSMBCSV_SEND_ASYNC_UPDATE,
-_msg);
+   m_NCS_MBCSV_FSM_DISPATCH(peer_ptr, NCSMBCSV_SEND_ASYNC_UPDATE, NULL);
 
if (false == peer_ptr->okay_to_async_updt) {
peer_ptr->ckpt_msg_sent = true;
@@ -471,7 +470,7 @@ uint32_t mbcsv_send_ckpt_data_to_all_peers(NCS_MBCSV_SEND_CKPT *msg_to_send,
while (NULL != tmp_ptr) {
TRACE("dispatching FSM for NCSMBCSV_SEND_ASYNC_UPDATE");
m_NCS_MBCSV_FSM_DISPATCH(
-   tmp_ptr, NCSMBCSV_SEND_ASYNC_UPDATE, _msg);
+   tmp_ptr, NCSMBCSV_SEND_ASYNC_UPDATE, NULL);
 
if (false == tmp_ptr->okay_to_async_updt) {
tmp_ptr->ckpt_msg_sent = true;
-- 
2.17.1





[devel] [PATCH 0/1] Review Request for mbc: fix agent crash inside ncs_mbcsv_null_func() [#3214]

2020-08-14 Thread thuan.tran
Summary: mbc: fix agent crash inside ncs_mbcsv_null_func() [#3214]
Review request for Ticket(s): 3214
Peer Reviewer(s): Thanh, Minh, Thang
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3214
Base revision: 677a139adfc37ec3b9c4692d5e40b481c81225ce
Personal repository: git://git.code.sf.net/u/thuantr/review


Impacted area       Impact y/n

 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            n
 OpenSAF services        y
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n

NOTE: Patch(es) contain lines longer than 80 characters

Comments (indicate scope for each "y" above):
-
N/A

revision 5ae8e6c8fb7189847f172cfda83002a24a2dd8d8
Author: thuan.tran 
Date:   Fri, 14 Aug 2020 13:25:30 +0700

mbc: fix agent crash inside ncs_mbcsv_null_func() [#3214]



Complete diffstat:
--
 src/mbc/mbcsv_peer.c | 8 +++-
 src/mbc/mbcsv_util.c | 5 ++---
 2 files changed, 5 insertions(+), 8 deletions(-)


Testing Commands:
-
N/A

Testing, Expected Results:
--
N/A

Conditions of Submission:
-
ACK by reviewers

Arch      Built     Started    Linux distro
-------------------------------------------
mips        n          n
mips64      n          n
x86         n          n
x86_64      y          y
powerpc     n          n
powerpc64   n          n




[devel] [PATCH 1/1] imm: fix memleak detected by valgrind [#3213]

2020-08-13 Thread thuan.tran
---
 src/imm/immnd/immnd_evt.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/src/imm/immnd/immnd_evt.c b/src/imm/immnd/immnd_evt.c
index ff2538d15..f96f2590f 100644
--- a/src/imm/immnd/immnd_evt.c
+++ b/src/imm/immnd/immnd_evt.c
@@ -572,6 +572,11 @@ uint32_t immnd_evt_destroy(IMMSV_EVT *evt, bool onheap, uint32_t line)
}
free(evt->info.immnd.info.impl_delete.implNameList);
}
+   if ((evt->info.immnd.type == IMMND_EVT_A2ND_CCB_OBJ_DELETE_RSP) ||
+   (evt->info.immnd.type == IMMND_EVT_A2ND_CCB_OBJ_DELETE_RSP_2) ||
+   (evt->info.immnd.type == IMMND_EVT_A2ND_OI_CCB_AUG_INIT)) {
+   osaf_extended_name_free(&evt->info.immnd.info.ccbUpcallRsp.name);
+   }
 
if (onheap) {
free(evt);
-- 
2.17.1





[devel] [PATCH 0/1] Review Request for imm: fix memleak detected by valgrind [#3213]

2020-08-13 Thread thuan.tran
Summary: imm: fix memleak detected by valgrind [#3213]
Review request for Ticket(s): 3213
Peer Reviewer(s): Minh, Thang
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3213
Base revision: db5e0c24ff713126d2912b6c3ec59cfbcfbda491
Personal repository: git://git.code.sf.net/u/thuantr/review


Impacted area       Impact y/n

 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n

NOTE: Patch(es) contain lines longer than 80 characters

Comments (indicate scope for each "y" above):
-
N/A

revision e640ba27c89c2560988c1c94fa5bc31d144ae4ae
Author: thuan.tran 
Date:   Thu, 13 Aug 2020 16:19:43 +0700

imm: fix memleak detected by valgrind [#3213]



Complete diffstat:
--
 src/imm/immnd/immnd_evt.c | 5 +
 1 file changed, 5 insertions(+)


Testing Commands:
-
N/A

Testing, Expected Results:
--
N/A

Conditions of Submission:
-
ACK by reviewers

Arch      Built     Started    Linux distro
-------------------------------------------
mips        n          n
mips64      n          n
x86         n          n
x86_64      y          y
powerpc     n          n
powerpc64   n          n




[devel] [PATCH 0/1] Review Request for osaf: move common functions into immutil [#3211] V2

2020-08-11 Thread thuan.tran
Summary: osaf: move common functions into immutil [#3211]
Review request for Ticket(s): 3211
Peer Reviewer(s): Minh, Thang, Thanh
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3211
Base revision: 0a47ca7c0e99c68f18115b58fa39ffb3e59aec48
Personal repository: git://git.code.sf.net/u/thuantr/review


Impacted area       Impact y/n

 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
-
N/A

revision 27acc3a789c9a0356ce5a5086c11776a47b6bdda
Author: thuan.tran 
Date:   Tue, 11 Aug 2020 13:01:21 +0700

osaf: move common functions into immutil [#3211]



Complete diffstat:
--
 src/amf/amfd/imm.cc | 187 +++
 src/amf/amfd/imm.h  |   2 +-
 src/ntf/ntfimcnd/ntfimcn_imm.c  | 181 +-
 src/ntf/ntfimcnd/ntfimcn_imm.h  |  30 +
 src/ntf/ntfimcnd/ntfimcn_notifier.c |   2 +-
 src/osaf/immutil/immutil.c  | 215 ++--
 src/osaf/immutil/immutil.h  |  36 ++
 7 files changed, 263 insertions(+), 390 deletions(-)


Testing Commands:
-
N/A

Testing, Expected Results:
--
N/A

Conditions of Submission:
-
ACK by reviewers

Arch      Built     Started    Linux distro
-------------------------------------------
mips        n          n
mips64      n          n
x86         n          n
x86_64      y          y
powerpc     n          n
powerpc64   n          n




[devel] [PATCH 1/1] osaf: move common functions into immutil [#3211]

2020-08-11 Thread thuan.tran
---
 src/amf/amfd/imm.cc | 187 ++--
 src/amf/amfd/imm.h  |   2 +-
 src/ntf/ntfimcnd/ntfimcn_imm.c  | 181 +--
 src/ntf/ntfimcnd/ntfimcn_imm.h  |  30 +---
 src/ntf/ntfimcnd/ntfimcn_notifier.c |   2 +-
 src/osaf/immutil/immutil.c  | 215 +++-
 src/osaf/immutil/immutil.h  |  36 +
 7 files changed, 263 insertions(+), 390 deletions(-)

diff --git a/src/amf/amfd/imm.cc b/src/amf/amfd/imm.cc
index d917b0d8b..826e90d41 100644
--- a/src/amf/amfd/imm.cc
+++ b/src/amf/amfd/imm.cc
@@ -178,7 +178,7 @@ AvdJobDequeueResultT ImmObjCreate::exec(AVD_CL_CB *cb) {
 goto done;
   }
   rc = saImmOiRtObjectCreate_2(immOiHandle, className_, parent_name,
-   attrValues_);
+   (const SaImmAttrValuesT_2**)attrValues_);
   cb->avd_imm_status = AVD_IMM_INIT_DONE;
 
   if ((rc == SA_AIS_OK) || (rc == SA_AIS_ERR_EXIST)) {
@@ -208,31 +208,8 @@ done:
 
 //
 ImmObjCreate::~ImmObjCreate() {
-  unsigned int i, j;
-
-  for (i = 0; attrValues_[i] != nullptr; i++) {
-SaImmAttrValuesT_2 *attrValue = (SaImmAttrValuesT_2 *)attrValues_[i];
-
-if (attrValue->attrValueType == SA_IMM_ATTR_SASTRINGT) {
-  for (j = 0; j < attrValue->attrValuesNumber; j++) {
-char *p = *((char **)attrValue->attrValues[j]);
-delete[] p;
-  }
-} else if (attrValue->attrValueType == SA_IMM_ATTR_SANAMET) {
-  for (j = 0; j < attrValue->attrValuesNumber; j++) {
-SaNameT *name = reinterpret_cast<SaNameT *>(attrValue->attrValues[i]);
-osaf_extended_name_free(name);
-  }
-}
-delete[] attrValue->attrName;
-delete[] static_cast<char *>(
-attrValue->attrValues[0]);  // free blob shared by all values
-delete[] attrValue->attrValues;
-delete attrValue;
-  }
-
+  immutil_freeSaImmAttrValuesT(attrValues_);
   delete[] className_;
-  delete[] attrValues_;
 }
 
 //
@@ -630,110 +607,18 @@ typedef struct avd_ccb_apply_ordered_list {
 
 static AvdCcbApplyOrderedListT *ccb_apply_list;
 
-/* 
- *   FUNCTION PROTOTYPES
- * 
- */
-
-static size_t value_size(SaImmValueTypeT attrValueType) {
-  size_t valueSize = 0;
-
-  switch (attrValueType) {
-case SA_IMM_ATTR_SAINT32T:
-  valueSize = sizeof(SaInt32T);
-  break;
-case SA_IMM_ATTR_SAUINT32T:
-  valueSize = sizeof(SaUint32T);
-  break;
-case SA_IMM_ATTR_SAINT64T:
-  valueSize = sizeof(SaInt64T);
-  break;
-case SA_IMM_ATTR_SAUINT64T:
-  valueSize = sizeof(SaUint64T);
-  break;
-case SA_IMM_ATTR_SATIMET:
-  valueSize = sizeof(SaTimeT);
-  break;
-case SA_IMM_ATTR_SANAMET:
-  valueSize = sizeof(SaNameT);
-  break;
-case SA_IMM_ATTR_SAFLOATT:
-  valueSize = sizeof(SaFloatT);
-  break;
-case SA_IMM_ATTR_SADOUBLET:
-  valueSize = sizeof(SaDoubleT);
-  break;
-case SA_IMM_ATTR_SASTRINGT:
-  valueSize = sizeof(SaStringT);
-  break;
-case SA_IMM_ATTR_SAANYT:
-  osafassert(0);
-  break;
-  }
-
-  return valueSize;
-}
-
-static void copySaImmAttrValuesT(SaImmAttrValuesT_2 *copy,
- const SaImmAttrValuesT_2 *original) {
-  size_t valueSize = 0;
-  unsigned int i, valueCount = original->attrValuesNumber;
-  char *databuffer;
-
-  copy->attrName = StrDup(original->attrName);
-
-  copy->attrValuesNumber = valueCount;
-  copy->attrValueType = original->attrValueType;
-  if (valueCount == 0) return; /* (just in case...) */
-
-  copy->attrValues = new SaImmAttrValueT[valueCount];
-
-  valueSize = value_size(original->attrValueType);
-
-  // alloc blob shared by all values
-  databuffer = new char[valueCount * valueSize];
-
-  for (i = 0; i < valueCount; i++) {
-copy->attrValues[i] = databuffer;
-if (original->attrValueType == SA_IMM_ATTR_SASTRINGT) {
-  char *cporig = *((char **)original->attrValues[i]);
-  char **cpp = (char **)databuffer;
-  *cpp = StrDup(cporig);
-} else if (original->attrValueType == SA_IMM_ATTR_SANAMET) {
-  SaNameT *orig = reinterpret_cast<SaNameT *>(original->attrValues[i]);
-  SaNameT *dest = reinterpret_cast<SaNameT *>(databuffer);
-  osaf_extended_name_alloc(osaf_extended_name_borrow(orig), dest);
-} else {
-  memcpy(databuffer, original->attrValues[i], valueSize);
-}
-databuffer += valueSize;
-  }
-}
-
-static const SaImmAttrValuesT_2 *dupSaImmAttrValuesT(
-const SaImmAttrValuesT_2 *original) {
-  SaImmAttrValuesT_2 *copy = new SaImmAttrValuesT_2;
-
-  copySaImmAttrValuesT(copy, original);
-  return copy;
-}
-
-static const SaImmAttrValuesT_2 **dupSaImmAttrValuesT_array(
-const SaImmAttrValuesT_2 **original) {
-  const SaImmAttrValuesT_2 **copy;
-  unsigned int i, alen = 0;
-
-  if (original == nullptr) return nullptr;
-
-  while (original[alen] != nullptr) 

[devel] [PATCH 0/1] Review Request for mbc: fix memleak detected by valgrind [#3208] V2

2020-08-10 Thread thuan.tran
Summary: mbc: fix memleak detected by valgrind [#3208]
Review request for Ticket(s): 3208
Peer Reviewer(s): Minh, Thang, Thanh, Thien
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3208
Base revision: 0a47ca7c0e99c68f18115b58fa39ffb3e59aec48
Personal repository: git://git.code.sf.net/u/thuantr/review


Impacted area        Impact y/n

 Docs                 n
 Build system         n
 RPM/packaging        n
 Configuration files  n
 Startup scripts      n
 SAF services         y
 OpenSAF services     n
 Core libraries       n
 Samples              n
 Tests                n
 Other                n

NOTE: Patch(es) contain lines longer than 80 characters

Comments (indicate scope for each "y" above):
-
N/A

revision 7bdac480b6a3a0ec4e17d04edfcc0fedef5db085
Author: thuan.tran 
Date:   Mon, 10 Aug 2020 15:06:51 +0700

mbc: fix memleak detected by valgrind [#3208]
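
The accompanying patch frees the client payload buffer on paths where a received event is dropped before it is processed. A minimal C++ sketch of that ownership pattern follows. All names in it (`Buffer`, `Event`, `discard_event`, `deliver`, `live_buffers`) are hypothetical and are not part of the MBCSv code; the counter exists only so the sketch can show that nothing is left allocated.

```cpp
#include <cassert>

// Instrumentation for this sketch only: number of payload buffers alive.
static int live_buffers = 0;

struct Buffer {
  char data[64];
};

Buffer* alloc_buffer() {
  ++live_buffers;
  return new Buffer();
}

void free_buffer(Buffer* buf) {
  if (buf == nullptr) return;
  assert(live_buffers > 0);
  --live_buffers;
  delete buf;
}

enum class MsgType { kPeerDiscover, kClient };

// Hypothetical event: only kClient events own a payload buffer.
struct Event {
  MsgType type;
  Buffer* payload;
};

// Drops an event that will never be processed. The payload is released
// before the event itself; freeing only the event would leak the buffer.
void discard_event(Event* evt) {
  if (evt == nullptr) return;
  if (evt->type == MsgType::kClient) {
    free_buffer(evt->payload);
    evt->payload = nullptr;
  }
  delete evt;
}

// Hypothetical delivery step. On failure the event is discarded here;
// on success ownership passes to the receiver, which frees it later.
bool deliver(Event* evt, bool mailbox_available) {
  if (!mailbox_available) {
    discard_event(evt);
    return false;
  }
  return true;
}
```

In this sketch a failed `deliver` leaves `live_buffers` at zero, which is the property a valgrind run would check for in the real code.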



Complete diffstat:
--
 src/mbc/mbcsv_act.c |  5 +
 src/mbc/mbcsv_evt_msg.h |  1 +
 src/mbc/mbcsv_mds.c | 15 ---
 src/mbc/mbcsv_pr_evts.c |  3 ++-
 src/mbc/mbcsv_queue.c   | 13 +
 src/mbc/mbcsv_util.c| 23 +++
 6 files changed, 44 insertions(+), 16 deletions(-)


Testing Commands:
-
N/A

Testing, Expected Results:
--
N/A

Conditions of Submission:
-
ACK by reviewers

Arch        Built  Started  Linux distro
----------------------------------------
mips        n      n
mips64      n      n
x86         n      n
x86_64      y      y
powerpc     n      n
powerpc64   n      n


Reviewer Checklist:
---
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
(i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
too much content in a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
comments and change requests that were proposed in the initial review.

___ You have a misconfigured ~/.gitconfig file (i.e. user.name, user.email etc)

___ Your computer has a badly configured date and time, confusing the
threaded patch review.

___ Your changes affect IPC mechanism, and you don't present any results
for in-service upgradability test.

___ Your changes affect user manual and documentation, your patch series
do not contain the patch that updates the Doxygen manual.



___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


[devel] [PATCH 1/1] mbc: fix memleak detected by valgrind [#3208]

2020-08-10 Thread thuan.tran
---
 src/mbc/mbcsv_act.c |  5 +
 src/mbc/mbcsv_evt_msg.h |  1 +
 src/mbc/mbcsv_mds.c | 15 ---
 src/mbc/mbcsv_pr_evts.c |  3 ++-
 src/mbc/mbcsv_queue.c   | 13 +
 src/mbc/mbcsv_util.c| 23 +++
 6 files changed, 44 insertions(+), 16 deletions(-)

diff --git a/src/mbc/mbcsv_act.c b/src/mbc/mbcsv_act.c
index d58f502d2..ae0e8e539 100644
--- a/src/mbc/mbcsv_act.c
+++ b/src/mbc/mbcsv_act.c
@@ -77,6 +77,11 @@ void ncs_mbcsv_null_func(PEER_INST *peer, MBCSV_EVT *evt)
  peer->my_ckpt_inst->my_role,
  peer->my_ckpt_inst->my_mbcsv_inst->svc_id,
  peer->my_ckpt_inst->pwe_hdl);
+   if ((evt) && (evt->msg_type == MBCSV_EVT_INTERNAL_RCV) &&
+   (evt->info.peer_msg.type == MBCSV_EVT_INTERNAL_CLIENT)) {
+   m_MMGR_FREE_BUFR_LIST(
+   evt->info.peer_msg.info.client_msg.uba.ub);
+   }
 }
 
 /*
diff --git a/src/mbc/mbcsv_evt_msg.h b/src/mbc/mbcsv_evt_msg.h
index 9eef74713..b9addb3a4 100644
--- a/src/mbc/mbcsv_evt_msg.h
+++ b/src/mbc/mbcsv_evt_msg.h
@@ -49,6 +49,7 @@ typedef enum {
   MBCSV_EVT_MDS_SUBSCR,
   MBCSV_EVT_TMR,
   MBCSV_EVT_INTERNAL,
+  MBCSV_EVT_INTERNAL_RCV,
 } MBCSV_EVT_TYPE;
 
 typedef struct mbcsv_evt_tmr_info {
diff --git a/src/mbc/mbcsv_mds.c b/src/mbc/mbcsv_mds.c
index 964e33542..afaf1fd1b 100644
--- a/src/mbc/mbcsv_mds.c
+++ b/src/mbc/mbcsv_mds.c
@@ -193,6 +193,7 @@ void mbcsv_mds_unreg(uint32_t pwe_hdl)
 uint32_t mbcsv_mds_send_msg(uint32_t send_type, MBCSV_EVT *msg, CKPT_INST *ckpt,
MBCSV_ANCHOR anchor)
 {
+   uint32_t rc;
NCSMDS_INFO mds_info;
TRACE_ENTER2("sending to vdest:%" PRIx64, ckpt->my_vdest);
 
@@ -241,7 +242,7 @@ uint32_t mbcsv_mds_send_msg(uint32_t send_type, MBCSV_EVT *msg, CKPT_INST *ckpt,
return NCSCC_RC_FAILURE;
}
 
-   if (ncsmds_api(&mds_info) == NCSCC_RC_SUCCESS) {
+   if ((rc = ncsmds_api(&mds_info)) == NCSCC_RC_SUCCESS) {
/* If message is send resp  then free the message received in
 * response  */
if ((MDS_SENDTYPE_REDRSP == send_type) &&
@@ -253,7 +254,7 @@ uint32_t mbcsv_mds_send_msg(uint32_t send_type, MBCSV_EVT *msg, CKPT_INST *ckpt,
return NCSCC_RC_SUCCESS;
} else {
TRACE_LEAVE2("failure");
-   return NCSCC_RC_FAILURE;
+   return rc;
}
 }
 
@@ -379,7 +380,7 @@ uint32_t mbcsv_mds_rcv(NCSMDS_CALLBACK_INFO *cbinfo)
 * We found out the mailbox to which we can post a message. Now
 * construct and send this message to the mailbox.
 */
-   msg->msg_type = MBCSV_EVT_INTERNAL;
+   msg->msg_type = MBCSV_EVT_INTERNAL_RCV;
 
if (msg->info.peer_msg.type == MBCSV_EVT_INTERNAL_PEER_DISC) {
send_pri = NCS_IPC_PRIORITY_HIGH;
@@ -387,11 +388,19 @@ uint32_t mbcsv_mds_rcv(NCSMDS_CALLBACK_INFO *cbinfo)
send_pri = NCS_IPC_PRIORITY_NORMAL;
 
if (NCSCC_RC_SUCCESS != m_MBCSV_SND_MSG(, msg, send_pri)) {
+   if (msg->info.peer_msg.type == MBCSV_EVT_INTERNAL_CLIENT) {
+   m_MMGR_FREE_BUFR_LIST(
+   msg->info.peer_msg.info.client_msg.uba.ub);
+   }
m_MMGR_FREE_MBCSV_EVT(msg);
TRACE_LEAVE2("ipc send failed");
return NCSCC_RC_FAILURE;
}
} else {
+   if (msg->info.peer_msg.type == MBCSV_EVT_INTERNAL_CLIENT) {
+   m_MMGR_FREE_BUFR_LIST(
+   msg->info.peer_msg.info.client_msg.uba.ub);
+   }
m_MMGR_FREE_MBCSV_EVT(msg);
}
 
diff --git a/src/mbc/mbcsv_pr_evts.c b/src/mbc/mbcsv_pr_evts.c
index 0deb5f25b..72dce61b5 100644
--- a/src/mbc/mbcsv_pr_evts.c
+++ b/src/mbc/mbcsv_pr_evts.c
@@ -147,7 +147,8 @@ uint32_t mbcsv_process_events(MBCSV_EVT *rcvd_evt, uint32_t mbcsv_hdl)
goto pr_done;
}
} break;
-   case MBCSV_EVT_INTERNAL: {
+   case MBCSV_EVT_INTERNAL:
+   case MBCSV_EVT_INTERNAL_RCV: {
/*
 * Process all the received events.
 */
diff --git a/src/mbc/mbcsv_queue.c b/src/mbc/mbcsv_queue.c
index c0a41fce0..25ae13f5f 100644
--- a/src/mbc/mbcsv_queue.c
+++ b/src/mbc/mbcsv_queue.c
@@ -77,12 +77,17 @@ uint32_t mbcsv_client_queue_init(MBCSV_REG *mbc_reg)
 **/
 bool mbcsv_client_cleanup_mbx(NCSCONTEXT arg, NCSCONTEXT msg)
 {
-   MBCSV_EVT *node = (MBCSV_EVT *)msg;
+   MBCSV_EVT *evt = (MBCSV_EVT *)msg;
TRACE_ENTER();
 
-   /* deallocate the nodes */
-   if (NULL != node) {
-   

[devel] [PATCH 0/1] Review Request for osaf: move common functions into immutil [#3211]

2020-08-09 Thread thuan.tran
Summary: osaf: move common functions into immutil [#3211]
Review request for Ticket(s): 3211
Peer Reviewer(s): Thanh, Minh, Thang
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3211
Base revision: 0a47ca7c0e99c68f18115b58fa39ffb3e59aec48
Personal repository: git://git.code.sf.net/u/thuantr/review


Impacted area        Impact y/n

 Docs                 n
 Build system         n
 RPM/packaging        n
 Configuration files  n
 Startup scripts      n
 SAF services         y
 OpenSAF services     n
 Core libraries       n
 Samples              n
 Tests                n
 Other                n


Comments (indicate scope for each "y" above):
-
N/A

revision e8b98b8a9ce862c3727f58cb096e0215826f5f19
Author: thuan.tran 
Date:   Thu, 6 Aug 2020 00:47:16 +0700

osaf: move common functions into immutil [#3211]
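
The copy helpers removed from amfd in the accompanying patch allocate one data buffer shared by all values of an attribute and release it through the first value pointer. A minimal C++ sketch of that layout follows, restricted to fixed-size values. The names `ValueArray`, `dup_values` and `free_values` are hypothetical and are not the immutil API; the deep copies that the original code makes for string and SaNameT values are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical container: values[i] points into one shared allocation.
struct ValueArray {
  void** values;
  std::size_t count;
};

// Copies `count` values of `value_size` bytes each into a single blob and
// makes every entry of the pointer array point at its slice of that blob.
ValueArray dup_values(const void* const* original, std::size_t count,
                      std::size_t value_size) {
  assert(value_size > 0);
  ValueArray copy{nullptr, count};
  if (count == 0) return copy;

  copy.values = new void*[count];
  char* blob = new char[count * value_size];  // shared by all values
  for (std::size_t i = 0; i < count; ++i) {
    copy.values[i] = blob;
    std::memcpy(blob, original[i], value_size);
    blob += value_size;
  }
  return copy;
}

// Frees the shared blob once, through values[0], then the pointer array.
void free_values(ValueArray* array) {
  if (array->values != nullptr) {
    delete[] static_cast<char*>(array->values[0]);
    delete[] array->values;
  }
  array->values = nullptr;
  array->count = 0;
}
```

Because every value lives in the same allocation, the blob must be freed exactly once; deleting each `values[i]` individually would be an error.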



Complete diffstat:
--
 src/amf/amfd/imm.cc | 186 +++
 src/amf/amfd/imm.h  |   2 +-
 src/ntf/ntfimcnd/ntfimcn_imm.c  | 181 +-
 src/ntf/ntfimcnd/ntfimcn_imm.h  |  30 +
 src/ntf/ntfimcnd/ntfimcn_notifier.c |   2 +-
 src/osaf/immutil/immutil.c  | 215 ++--
 src/osaf/immutil/immutil.h  |  36 ++
 7 files changed, 263 insertions(+), 389 deletions(-)


Testing Commands:
-
N/A

Testing, Expected Results:
--
N/A

Conditions of Submission:
-
ACK by reviewers

Arch        Built  Started  Linux distro
----------------------------------------
mips        n      n
mips64      n      n
x86         n      n
x86_64      y      y
powerpc     n      n
powerpc64   n      n




[devel] [PATCH 1/1] osaf: move common functions into immutil [#3211]

2020-08-09 Thread thuan.tran
---
 src/amf/amfd/imm.cc | 186 ++--
 src/amf/amfd/imm.h  |   2 +-
 src/ntf/ntfimcnd/ntfimcn_imm.c  | 181 +--
 src/ntf/ntfimcnd/ntfimcn_imm.h  |  30 +---
 src/ntf/ntfimcnd/ntfimcn_notifier.c |   2 +-
 src/osaf/immutil/immutil.c  | 215 +++-
 src/osaf/immutil/immutil.h  |  36 +
 7 files changed, 263 insertions(+), 389 deletions(-)

diff --git a/src/amf/amfd/imm.cc b/src/amf/amfd/imm.cc
index d917b0d8b..bb9052a9d 100644
--- a/src/amf/amfd/imm.cc
+++ b/src/amf/amfd/imm.cc
@@ -178,7 +178,7 @@ AvdJobDequeueResultT ImmObjCreate::exec(AVD_CL_CB *cb) {
 goto done;
   }
   rc = saImmOiRtObjectCreate_2(immOiHandle, className_, parent_name,
-   attrValues_);
+   (const SaImmAttrValuesT_2**)attrValues_);
   cb->avd_imm_status = AVD_IMM_INIT_DONE;
 
   if ((rc == SA_AIS_OK) || (rc == SA_AIS_ERR_EXIST)) {
@@ -208,29 +208,7 @@ done:
 
 //
 ImmObjCreate::~ImmObjCreate() {
-  unsigned int i, j;
-
-  for (i = 0; attrValues_[i] != nullptr; i++) {
-SaImmAttrValuesT_2 *attrValue = (SaImmAttrValuesT_2 *)attrValues_[i];
-
-if (attrValue->attrValueType == SA_IMM_ATTR_SASTRINGT) {
-  for (j = 0; j < attrValue->attrValuesNumber; j++) {
-char *p = *((char **)attrValue->attrValues[j]);
-delete[] p;
-  }
-} else if (attrValue->attrValueType == SA_IMM_ATTR_SANAMET) {
-  for (j = 0; j < attrValue->attrValuesNumber; j++) {
-SaNameT *name = reinterpret_cast<SaNameT *>(attrValue->attrValues[i]);
-osaf_extended_name_free(name);
-  }
-}
-delete[] attrValue->attrName;
-delete[] static_cast<char *>(
-attrValue->attrValues[0]);  // free blob shared by all values
-delete[] attrValue->attrValues;
-delete attrValue;
-  }
-
+  immutil_freeSaImmAttrValuesT(attrValues_);
   delete[] className_;
   delete[] attrValues_;
 }
@@ -630,110 +608,18 @@ typedef struct avd_ccb_apply_ordered_list {
 
 static AvdCcbApplyOrderedListT *ccb_apply_list;
 
-/* 
- *   FUNCTION PROTOTYPES
- * 
- */
-
-static size_t value_size(SaImmValueTypeT attrValueType) {
-  size_t valueSize = 0;
-
-  switch (attrValueType) {
-case SA_IMM_ATTR_SAINT32T:
-  valueSize = sizeof(SaInt32T);
-  break;
-case SA_IMM_ATTR_SAUINT32T:
-  valueSize = sizeof(SaUint32T);
-  break;
-case SA_IMM_ATTR_SAINT64T:
-  valueSize = sizeof(SaInt64T);
-  break;
-case SA_IMM_ATTR_SAUINT64T:
-  valueSize = sizeof(SaUint64T);
-  break;
-case SA_IMM_ATTR_SATIMET:
-  valueSize = sizeof(SaTimeT);
-  break;
-case SA_IMM_ATTR_SANAMET:
-  valueSize = sizeof(SaNameT);
-  break;
-case SA_IMM_ATTR_SAFLOATT:
-  valueSize = sizeof(SaFloatT);
-  break;
-case SA_IMM_ATTR_SADOUBLET:
-  valueSize = sizeof(SaDoubleT);
-  break;
-case SA_IMM_ATTR_SASTRINGT:
-  valueSize = sizeof(SaStringT);
-  break;
-case SA_IMM_ATTR_SAANYT:
-  osafassert(0);
-  break;
-  }
-
-  return valueSize;
-}
-
-static void copySaImmAttrValuesT(SaImmAttrValuesT_2 *copy,
- const SaImmAttrValuesT_2 *original) {
-  size_t valueSize = 0;
-  unsigned int i, valueCount = original->attrValuesNumber;
-  char *databuffer;
-
-  copy->attrName = StrDup(original->attrName);
-
-  copy->attrValuesNumber = valueCount;
-  copy->attrValueType = original->attrValueType;
-  if (valueCount == 0) return; /* (just in case...) */
-
-  copy->attrValues = new SaImmAttrValueT[valueCount];
-
-  valueSize = value_size(original->attrValueType);
-
-  // alloc blob shared by all values
-  databuffer = new char[valueCount * valueSize];
-
-  for (i = 0; i < valueCount; i++) {
-copy->attrValues[i] = databuffer;
-if (original->attrValueType == SA_IMM_ATTR_SASTRINGT) {
-  char *cporig = *((char **)original->attrValues[i]);
-  char **cpp = (char **)databuffer;
-  *cpp = StrDup(cporig);
-} else if (original->attrValueType == SA_IMM_ATTR_SANAMET) {
-  SaNameT *orig = reinterpret_cast<SaNameT *>(original->attrValues[i]);
-  SaNameT *dest = reinterpret_cast<SaNameT *>(databuffer);
-  osaf_extended_name_alloc(osaf_extended_name_borrow(orig), dest);
-} else {
-  memcpy(databuffer, original->attrValues[i], valueSize);
-}
-databuffer += valueSize;
-  }
-}
-
-static const SaImmAttrValuesT_2 *dupSaImmAttrValuesT(
-const SaImmAttrValuesT_2 *original) {
-  SaImmAttrValuesT_2 *copy = new SaImmAttrValuesT_2;
-
-  copySaImmAttrValuesT(copy, original);
-  return copy;
-}
-
-static const SaImmAttrValuesT_2 **dupSaImmAttrValuesT_array(
-const SaImmAttrValuesT_2 **original) {
-  const SaImmAttrValuesT_2 **copy;
-  unsigned int i, alen = 0;
-
-  if (original == nullptr) return nullptr;
-
-  while (original[alen] != nullptr) alen++;
-

[devel] [PATCH 0/1] Review Request for log: check log record valid before write in FlushFrontElement [#3212]

2020-08-09 Thread thuan.tran
Summary: log: check log record valid before write in FlushFrontElement [#3212]
Review request for Ticket(s): 3212
Peer Reviewer(s): Thien, Thang, Minh
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3212
Base revision: 0a47ca7c0e99c68f18115b58fa39ffb3e59aec48
Personal repository: git://git.code.sf.net/u/thuantr/review


Impacted area        Impact y/n

 Docs                 n
 Build system         n
 RPM/packaging        n
 Configuration files  n
 Startup scripts      n
 SAF services         y
 OpenSAF services     n
 Core libraries       n
 Samples              n
 Tests                n
 Other                n


Comments (indicate scope for each "y" above):
-
N/A

revision f7f3b5ac2365fba0e085eca9d2e1d18eddaf83f5
Author: thuan.tran 
Date:   Thu, 6 Aug 2020 00:48:10 +0700

log: check log record valid before write in FlushFrontElement [#3212]



Complete diffstat:
--
 src/log/logd/lgs_cache.cc | 31 +--
 src/log/logd/lgs_cache.h  |  2 +-
 2 files changed, 18 insertions(+), 15 deletions(-)


Testing Commands:
-
N/A

Testing, Expected Results:
--
N/A

Conditions of Submission:
-
N/A

Arch        Built  Started  Linux distro
----------------------------------------
mips        n      n
mips64      n      n
x86         n      n
x86_64      y      y
powerpc     n      n
powerpc64   n      n




[devel] [PATCH 1/1] log: check log record valid before write in FlushFrontElement [#3212]

2020-08-09 Thread thuan.tran
---
 src/log/logd/lgs_cache.cc | 31 +--
 src/log/logd/lgs_cache.h  |  2 +-
 2 files changed, 18 insertions(+), 15 deletions(-)

diff --git a/src/log/logd/lgs_cache.cc b/src/log/logd/lgs_cache.cc
index 27e33702d..f30a23899 100644
--- a/src/log/logd/lgs_cache.cc
+++ b/src/log/logd/lgs_cache.cc
@@ -180,16 +180,21 @@ bool Cache::Data::is_overdue() const {
   return (current - queue_at > max_resilience);
 }
 
-bool Cache::Data::is_valid(std::string* reason) const {
+bool Cache::Data::is_valid() const {
+  std::string reason{"Ok"};
+  bool rc = true;
   if (is_stream_open() == false) {
-*reason = "the log stream has been closed";
-return false;
+reason = "the log stream has been closed";
+rc = false;
+  } else if (is_overdue() == true) {
+reason = "the record is overdue (stream: " + param_->stream()->name + ")";
+rc = false;
   }
-  if (is_overdue() == true) {
-*reason = "the record is overdue (stream: " + param_->stream()->name + ")";
-return false;
+  if ((rc == false) && (is_client_alive() == false)) {
+LOG_NO("Drop the invalid log record, reason: %s", reason.c_str());
+LOG_NO("The record info: %s", log_record_);
   }
-  return true;
+  return rc;
 }
 
 void Cache::Data::CloneData(CkptPushAsync* output) const {
@@ -413,13 +418,7 @@ void Cache::PopOverdueData() {
   if (Empty() == true || is_active() == false) return;
   std::string reason{"Ok"};
   auto data = Front();
-  if (data->is_valid(&reason) == false) {
-// Either the targeting stream has been closed or the owner is dead.
-// syslog the detailed info about dropped log record if latter case.
-if (data->is_client_alive() == false) {
-  LOG_NO("Drop the invalid log record, reason: %s", reason.c_str());
-  LOG_NO("The record info: %s", data->record());
-}
+  if (data->is_valid() == false) {
 Pop(false);
   }
 }
@@ -427,6 +426,10 @@ void Cache::PopOverdueData() {
 void Cache::FlushFrontElement() {
   if (Empty() || !is_active() || !is_iothread_ready()) return;
   auto data = Front();
+  if (data->is_valid() == false) {
+Pop(false);
+return;
+  }
   int rc = data->Write();
   // Write still gets timeout, do nothing.
   if ((rc == -1) || (rc == -2)) return;
diff --git a/src/log/logd/lgs_cache.h b/src/log/logd/lgs_cache.h
index a5d6181fb..d044bc240 100644
--- a/src/log/logd/lgs_cache.h
+++ b/src/log/logd/lgs_cache.h
@@ -167,7 +167,7 @@ class Cache {
 // Check if the data is valid or not. The data is not valid if either
 // the targeting stream is closed or the the time of its staying in the
 // queue is reaching the maximum.
-bool is_valid(std::string* reason) const;
+bool is_valid() const;
 // Dump the values of data's attributes.
 void Dump() const;
 // Clone values of my attributes to `CkptPushAsync`; and CkptPushAsync
-- 
2.17.1
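
The change to `FlushFrontElement` in this patch can be pictured with a minimal C++ sketch: the front record is checked for validity first, and an invalid record is dropped instead of written. The `Record` and `Cache` types here are simplified, hypothetical stand-ins, not the classes in lgs_cache.cc; validity is reduced to two boolean fields and "writing" appends to an in-memory list.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record: invalid if its stream is closed or it is overdue.
struct Record {
  bool stream_open;
  bool overdue;
  std::string text;
  bool is_valid() const { return stream_open && !overdue; }
};

class Cache {
 public:
  void Push(Record record) { queue_.push_back(std::move(record)); }

  // Handles the front record. Returns true only if it was written.
  // An invalid front record is dropped without attempting the write.
  bool FlushFrontElement() {
    if (queue_.empty()) return false;
    if (!queue_.front().is_valid()) {
      Pop();
      return false;
    }
    written_.push_back(queue_.front().text);
    Pop();
    return true;
  }

  const std::vector<std::string>& written() const { return written_; }
  std::size_t size() const { return queue_.size(); }

 private:
  void Pop() {
    assert(!queue_.empty());
    queue_.pop_front();
  }

  std::deque<Record> queue_;
  std::vector<std::string> written_;
};
```

With a closed-stream record, an overdue record and a valid record queued in that order, three calls to `FlushFrontElement` in this sketch drop the first two and write only the third.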





[devel] [PATCH 0/1] Review Request for nid: fix opensafd fail to start under gcov enabled [#3209]

2020-08-05 Thread thuan.tran
Summary: nid: fix opensafd fail to start under gcov enabled [#3209]
Review request for Ticket(s): 3209
Peer Reviewer(s): Minh, Thang
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3209
Base revision: 3367dc57a0df9de1a02c1a6c57ad4e83cb834bdc
Personal repository: git://git.code.sf.net/u/thuantr/review


Impacted area        Impact y/n

 Docs                 n
 Build system         n
 RPM/packaging        n
 Configuration files  n
 Startup scripts      n
 SAF services         n
 OpenSAF services     y
 Core libraries       n
 Samples              n
 Tests                n
 Other                n


Comments (indicate scope for each "y" above):
-
N/A

revision 504ac1827a2d9ca8d902cdb46d8e948b880a82b4
Author: thuan.tran 
Date:   Tue, 4 Aug 2020 14:53:21 +0700

nid: fix opensafd fail to start under gcov enabled [#3209]

- Fix a compile failure in amfd/sgproc.cc when configured with gcov enabled.
- Wait for svc_monitor_thread to be ready inside create_svc_monitor_thread to
avoid losing the svc_mon_thr_fd value, which later causes opensafd to fail to
start.
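
The wait described above is a bounded poll on a readiness flag: check, sleep, and give up after a fixed number of attempts. A minimal C++ sketch of that loop follows. The function name `wait_until_ready` and its parameters are hypothetical; the nodeinit code polls a `svc_monitor_thread_ready` flag directly and asserts on the result, whereas this sketch takes the check as a callable and returns the outcome.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Polls `is_ready` up to `max_polls` times, sleeping `poll_interval`
// between attempts. Returns true as soon as the check succeeds, or the
// result of one final check after the last sleep.
bool wait_until_ready(const std::function<bool()>& is_ready,
                      unsigned max_polls,
                      std::chrono::milliseconds poll_interval) {
  assert(is_ready && "a readiness check must be supplied");
  for (unsigned polls = 0; polls < max_polls; ++polls) {
    if (is_ready()) return true;
    std::this_thread::sleep_for(poll_interval);
  }
  return is_ready();
}
```

Doing the wait in the function that creates the thread, rather than in its caller, means the caller never continues while the thread is still starting up.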



Complete diffstat:
--
 src/amf/amfd/sgproc.cc |  3 +--
 src/nid/nodeinit.cc| 19 +--
 2 files changed, 10 insertions(+), 12 deletions(-)


Testing Commands:
-
N/A

Testing, Expected Results:
--
N/A

Conditions of Submission:
-
ACK by reviewers

Arch        Built  Started  Linux distro
----------------------------------------
mips        n      n
mips64      n      n
x86         n      n
x86_64      y      y
powerpc     n      n
powerpc64   n      n




[devel] [PATCH 1/1] nid: fix opensafd fail to start under gcov enabled [#3209]

2020-08-05 Thread thuan.tran
- Fix a compile failure in amfd/sgproc.cc when configured with gcov enabled.
- Wait for svc_monitor_thread to be ready inside create_svc_monitor_thread to
avoid losing the svc_mon_thr_fd value, which later causes opensafd to fail to
start.
---
 src/amf/amfd/sgproc.cc |  3 +--
 src/nid/nodeinit.cc| 19 +--
 2 files changed, 10 insertions(+), 12 deletions(-)

diff --git a/src/amf/amfd/sgproc.cc b/src/amf/amfd/sgproc.cc
index 78ccb31f9..405e2c45d 100644
--- a/src/amf/amfd/sgproc.cc
+++ b/src/amf/amfd/sgproc.cc
@@ -2624,9 +2624,8 @@ static uint32_t shutdown_contained_sus(AVD_CL_CB *cb, AVD_SU *container_su,
   }
 
 done:
-  return rc;
-
   TRACE_LEAVE();
+  return rc;
 }
 
 /*
diff --git a/src/nid/nodeinit.cc b/src/nid/nodeinit.cc
index d5b4eb20a..548c7fb46 100644
--- a/src/nid/nodeinit.cc
+++ b/src/nid/nodeinit.cc
@@ -1612,6 +1612,15 @@ uint32_t create_svc_monitor_thread(void) {
 return NCSCC_RC_FAILURE;
   }
 
+  // Waiting until svc_monitor_thread is up and in ready state.
+  unsigned no_repeat = 0;
+  while (svc_monitor_thread_ready == false && no_repeat < 100) {
+osaf_nanosleep();
+no_repeat++;
+  }
+  osafassert(svc_monitor_thread_ready);
+  LOG_NO("svc_monitor_thread is up and in ready state");
+
   TRACE_LEAVE();
   return NCSCC_RC_SUCCESS;
 }
@@ -1662,16 +1671,6 @@ int main(int argc, char *argv[]) {
 exit(EXIT_FAILURE);
   }
 
-  // Waiting until svc_monitor_thread is up and in ready state.
-  unsigned no_repeat = 0;
-  while (svc_monitor_thread_ready == false && no_repeat < 100) {
-osaf_nanosleep();
-no_repeat++;
-  }
-
-  osafassert(svc_monitor_thread_ready);
-  LOG_NO("svc_monitor_thread is up and in ready state");
-
   if (parse_nodeinit_conf(sbuf) != NCSCC_RC_SUCCESS) {
 LOG_ER("Failed to parse file %s. Exiting", sbuf);
 exit(EXIT_FAILURE);
-- 
2.17.1





[devel] [PATCH 0/1] Review Request for mbc: fix memleak detected by valgrind [#3208]

2020-08-03 Thread thuan.tran
Summary: mbc: fix memleak detected by valgrind [#3208]
Review request for Ticket(s): 3208
Peer Reviewer(s): Minh, Thang, Thanh, Thien
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3208
Base revision: 3367dc57a0df9de1a02c1a6c57ad4e83cb834bdc
Personal repository: git://git.code.sf.net/u/thuantr/review


Impacted area        Impact y/n

 Docs                 n
 Build system         n
 RPM/packaging        n
 Configuration files  n
 Startup scripts      n
 SAF services         n
 OpenSAF services     y
 Core libraries       n
 Samples              n
 Tests                n
 Other                n

NOTE: Patch(es) contain lines longer than 80 characters

Comments (indicate scope for each "y" above):
-
N/A

revision 7eb1a45c3c95f8a3b8612265f7ec603028fcbf71
Author: thuan.tran 
Date:   Mon, 3 Aug 2020 15:30:24 +0700

mbc: fix memleak detected by valgrind [#3208]



Complete diffstat:
--
 src/mbc/mbcsv_act.c |  5 +
 src/mbc/mbcsv_evt_msg.h |  1 +
 src/mbc/mbcsv_mds.c | 15 ---
 src/mbc/mbcsv_pr_evts.c |  3 ++-
 src/mbc/mbcsv_queue.c   | 13 +
 src/mbc/mbcsv_util.c| 18 --
 6 files changed, 41 insertions(+), 14 deletions(-)


Testing Commands:
-
N/A

Testing, Expected Results:
--
N/A

Conditions of Submission:
-
ACK by reviewers

Arch        Built  Started  Linux distro
----------------------------------------
mips        n      n
mips64      n      n
x86         n      n
x86_64      y      y
powerpc     n      n
powerpc64   n      n


Reviewer Checklist:
---
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
(i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
too much content in a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
comments and change requests that were proposed in the initial review.

___ You have a misconfigured ~/.gitconfig file (i.e. user.name, user.email etc)

___ Your computer has a badly configured date and time, confusing the
threaded patch review.

___ Your changes affect the IPC mechanism, and you don't present any results
for an in-service upgradability test.

___ Your changes affect the user manual and documentation, but your patch
series does not contain the patch that updates the Doxygen manual.



___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


[devel] [PATCH 1/1] mbc: fix memleak detected by valgrind [#3208]

2020-08-03 Thread thuan.tran
---
 src/mbc/mbcsv_act.c |  5 +
 src/mbc/mbcsv_evt_msg.h |  1 +
 src/mbc/mbcsv_mds.c | 15 ---
 src/mbc/mbcsv_pr_evts.c |  3 ++-
 src/mbc/mbcsv_queue.c   | 13 +
 src/mbc/mbcsv_util.c| 18 --
 6 files changed, 41 insertions(+), 14 deletions(-)

diff --git a/src/mbc/mbcsv_act.c b/src/mbc/mbcsv_act.c
index d58f502d2..ae0e8e539 100644
--- a/src/mbc/mbcsv_act.c
+++ b/src/mbc/mbcsv_act.c
@@ -77,6 +77,11 @@ void ncs_mbcsv_null_func(PEER_INST *peer, MBCSV_EVT *evt)
  peer->my_ckpt_inst->my_role,
  peer->my_ckpt_inst->my_mbcsv_inst->svc_id,
  peer->my_ckpt_inst->pwe_hdl);
+   if ((evt) && (evt->msg_type == MBCSV_EVT_INTERNAL_RCV) &&
+   (evt->info.peer_msg.type == MBCSV_EVT_INTERNAL_CLIENT)) {
+   m_MMGR_FREE_BUFR_LIST(
+   evt->info.peer_msg.info.client_msg.uba.ub);
+   }
 }
 
 /*
diff --git a/src/mbc/mbcsv_evt_msg.h b/src/mbc/mbcsv_evt_msg.h
index 9eef74713..b9addb3a4 100644
--- a/src/mbc/mbcsv_evt_msg.h
+++ b/src/mbc/mbcsv_evt_msg.h
@@ -49,6 +49,7 @@ typedef enum {
   MBCSV_EVT_MDS_SUBSCR,
   MBCSV_EVT_TMR,
   MBCSV_EVT_INTERNAL,
+  MBCSV_EVT_INTERNAL_RCV,
 } MBCSV_EVT_TYPE;
 
 typedef struct mbcsv_evt_tmr_info {
diff --git a/src/mbc/mbcsv_mds.c b/src/mbc/mbcsv_mds.c
index 964e33542..afaf1fd1b 100644
--- a/src/mbc/mbcsv_mds.c
+++ b/src/mbc/mbcsv_mds.c
@@ -193,6 +193,7 @@ void mbcsv_mds_unreg(uint32_t pwe_hdl)
 uint32_t mbcsv_mds_send_msg(uint32_t send_type, MBCSV_EVT *msg, CKPT_INST *ckpt,
MBCSV_ANCHOR anchor)
 {
+   uint32_t rc;
NCSMDS_INFO mds_info;
TRACE_ENTER2("sending to vdest:%" PRIx64, ckpt->my_vdest);
 
@@ -241,7 +242,7 @@ uint32_t mbcsv_mds_send_msg(uint32_t send_type, MBCSV_EVT *msg, CKPT_INST *ckpt,
return NCSCC_RC_FAILURE;
}
 
-   if (ncsmds_api(&mds_info) == NCSCC_RC_SUCCESS) {
+   if ((rc = ncsmds_api(&mds_info)) == NCSCC_RC_SUCCESS) {
/* If message is send resp  then free the message received in
 * response  */
if ((MDS_SENDTYPE_REDRSP == send_type) &&
@@ -253,7 +254,7 @@ uint32_t mbcsv_mds_send_msg(uint32_t send_type, MBCSV_EVT *msg, CKPT_INST *ckpt,
return NCSCC_RC_SUCCESS;
} else {
TRACE_LEAVE2("failure");
-   return NCSCC_RC_FAILURE;
+   return rc;
}
 }
 
@@ -379,7 +380,7 @@ uint32_t mbcsv_mds_rcv(NCSMDS_CALLBACK_INFO *cbinfo)
 * We found out the mailbox to which we can post a message. Now
 * construct and send this message to the mailbox.
 */
-   msg->msg_type = MBCSV_EVT_INTERNAL;
+   msg->msg_type = MBCSV_EVT_INTERNAL_RCV;
 
if (msg->info.peer_msg.type == MBCSV_EVT_INTERNAL_PEER_DISC) {
send_pri = NCS_IPC_PRIORITY_HIGH;
@@ -387,11 +388,19 @@ uint32_t mbcsv_mds_rcv(NCSMDS_CALLBACK_INFO *cbinfo)
send_pri = NCS_IPC_PRIORITY_NORMAL;
 
if (NCSCC_RC_SUCCESS != m_MBCSV_SND_MSG(, msg, send_pri)) {
+   if (msg->info.peer_msg.type == MBCSV_EVT_INTERNAL_CLIENT) {
+   m_MMGR_FREE_BUFR_LIST(
+   msg->info.peer_msg.info.client_msg.uba.ub);
+   }
m_MMGR_FREE_MBCSV_EVT(msg);
TRACE_LEAVE2("ipc send failed");
return NCSCC_RC_FAILURE;
}
} else {
+   if (msg->info.peer_msg.type == MBCSV_EVT_INTERNAL_CLIENT) {
+   m_MMGR_FREE_BUFR_LIST(
+   msg->info.peer_msg.info.client_msg.uba.ub);
+   }
m_MMGR_FREE_MBCSV_EVT(msg);
}
 
diff --git a/src/mbc/mbcsv_pr_evts.c b/src/mbc/mbcsv_pr_evts.c
index 0deb5f25b..72dce61b5 100644
--- a/src/mbc/mbcsv_pr_evts.c
+++ b/src/mbc/mbcsv_pr_evts.c
@@ -147,7 +147,8 @@ uint32_t mbcsv_process_events(MBCSV_EVT *rcvd_evt, uint32_t mbcsv_hdl)
goto pr_done;
}
} break;
-   case MBCSV_EVT_INTERNAL: {
+   case MBCSV_EVT_INTERNAL:
+   case MBCSV_EVT_INTERNAL_RCV: {
/*
 * Process all the received events.
 */
diff --git a/src/mbc/mbcsv_queue.c b/src/mbc/mbcsv_queue.c
index c0a41fce0..25ae13f5f 100644
--- a/src/mbc/mbcsv_queue.c
+++ b/src/mbc/mbcsv_queue.c
@@ -77,12 +77,17 @@ uint32_t mbcsv_client_queue_init(MBCSV_REG *mbc_reg)
 **/
 bool mbcsv_client_cleanup_mbx(NCSCONTEXT arg, NCSCONTEXT msg)
 {
-   MBCSV_EVT *node = (MBCSV_EVT *)msg;
+   MBCSV_EVT *evt = (MBCSV_EVT *)msg;
TRACE_ENTER();
 
-   /* deallocate the nodes */
-   if (NULL != node) {
-   

[devel] [PATCH 1/3] amf: enhance to work in roaming SC and headless [#2936]

2020-07-10 Thread thuan.tran
- amfd resets the msg id counter for a node whose amfnd down event was
ignored. This avoids the node rebooting once more due to a msg id
mismatch after it comes back up from a reboot that was ordered for
sending node_up after the sync window.

- Active amfd orders its standby to reboot if it detects another
active amfd (multi-partition cluster rejoin). The two actives will be
handled by RDE split-brain detection.

- Standby amfd should reboot itself if it sees two active peers, to
avoid doing a cold sync with, or being updated by, the wrong active.
The two actives will be handled by RDE split-brain detection.

- An amfd that has just become standby (out of sync) but sees the
active go down should reboot itself.
---
 src/amf/amfd/dmsg.cc   |  8 
 src/amf/amfd/evt.h |  1 +
 src/amf/amfd/main.cc   |  3 +++
 src/amf/amfd/mds.cc| 36 ++--
 src/amf/amfd/msg.h |  1 +
 src/amf/amfd/ndfsm.cc  |  2 ++
 src/amf/amfd/proc.h|  1 +
 src/amf/amfd/role.cc   | 27 +++
 src/amf/amfd/util.cc   |  2 +-
 src/amf/amfnd/amfnd.cc |  2 +-
 10 files changed, 79 insertions(+), 4 deletions(-)

diff --git a/src/amf/amfd/dmsg.cc b/src/amf/amfd/dmsg.cc
index cf4019d8a..c450b649e 100644
--- a/src/amf/amfd/dmsg.cc
+++ b/src/amf/amfd/dmsg.cc
@@ -75,6 +75,8 @@ void avd_mds_d_enc(MDS_CALLBACK_ENC_INFO *enc_info) {
   ncs_encode_32bit(, msg->msg_info.d2d_chg_role_rsp.role);
   ncs_encode_32bit(, msg->msg_info.d2d_chg_role_rsp.status);
   break;
+case AVD_D2D_REBOOT:
+  break;
 default:
   LOG_ER("%s: unknown msg %u", __FUNCTION__, msg->msg_type);
   break;
@@ -120,6 +122,8 @@ void avd_mds_d_dec(MDS_CALLBACK_DEC_INFO *dec_info) {
   static_cast(ncs_decode_32bit());
   d2d_msg->msg_info.d2d_chg_role_rsp.status = ncs_decode_32bit();
   break;
+case AVD_D2D_REBOOT:
+  break;
 default:
   LOG_ER("%s: unknown msg %u", __FUNCTION__, d2d_msg->msg_type);
   break;
@@ -210,6 +214,10 @@ uint32_t avd_d2d_msg_rcv(AVD_D2D_MSG *rcv_msg) {
 osafassert(0);
   }
   break;
+case AVD_D2D_REBOOT:
+  LOG_ER("Reboot order from Active as roaming SC split-brain detected");
+  opensaf_quick_reboot("Split-brain detected");
+  break;
 default:
   LOG_ER("%s: unknown msg %u", __FUNCTION__, rcv_msg->msg_type);
   break;
diff --git a/src/amf/amfd/evt.h b/src/amf/amfd/evt.h
index a9028cde3..a08dccebb 100644
--- a/src/amf/amfd/evt.h
+++ b/src/amf/amfd/evt.h
@@ -72,6 +72,7 @@ typedef enum avd_evt_type {
   AVD_IMM_REINITIALIZED,
   AVD_EVT_UNASSIGN_SI_DEP_STATE,
   AVD_EVT_ND_MDS_VER_INFO,
+  AVD_EVT_ROAMING_SC_SPLITBRAIN,
   AVD_EVT_MAX
 } AVD_EVT_TYPE;
 
diff --git a/src/amf/amfd/main.cc b/src/amf/amfd/main.cc
index 3b1536721..3cc0d9741 100644
--- a/src/amf/amfd/main.cc
+++ b/src/amf/amfd/main.cc
@@ -132,6 +132,9 @@ static const AVD_EVT_HDLR g_actv_list[AVD_EVT_MAX] = {
 invalid_evh,/* AVD_EVT_INVALID */
 avd_sidep_unassign_evh, /* AVD_EVT_UNASSIGN_SI_DEP_STATE */
 avd_avnd_mds_info_evh,  /* AVD_EVT_ND_MDS_VER_INFO*/
+
+/* Roaming SC split-brain processing */
+avd_roaming_sc_split_brain_evh, /* AVD_EVT_ROAMING_SC_SPLITBRAIN */
 };
 
 /* list of all the function pointers related to handling the events
diff --git a/src/amf/amfd/mds.cc b/src/amf/amfd/mds.cc
index 108f9b8bd..34ed96586 100644
--- a/src/amf/amfd/mds.cc
+++ b/src/amf/amfd/mds.cc
@@ -400,9 +400,11 @@ static uint32_t avd_mds_svc_evt(MDS_CALLBACK_SVC_EVENT_INFO *evt_info) {
 case NCSMDS_UP:
   switch (evt_info->i_svc_id) {
 case NCSMDS_SVC_ID_AVD:
+  TRACE("NCSMDS_UP AVD %x", evt_info->i_node_id);
   /* if((Is this up from other node) && (Is this Up from an Adest)) */
   if ((evt_info->i_node_id != cb->node_id_avd) &&
-  (m_MDS_DEST_IS_AN_ADEST(evt_info->i_dest))) {
+  (m_MDS_DEST_IS_AN_ADEST(evt_info->i_dest)) &&
+  (cb->other_avd_adest == 0)) {
 cb->node_id_avd_other = evt_info->i_node_id;
 cb->other_avd_adest = evt_info->i_dest;
 cb->stby_sync_state = AVD_STBY_OUT_OF_SYNC;
@@ -441,14 +443,44 @@ static uint32_t avd_mds_svc_evt(MDS_CALLBACK_SVC_EVENT_INFO *evt_info) {
   }
   break;
 
+case NCSMDS_RED_UP:
+  if ((evt_info->i_svc_id == NCSMDS_SVC_ID_AVD) &&
+  (evt_info->i_role == V_DEST_RL_ACTIVE) &&
+  (cb->node_id_avd != evt_info->i_node_id) &&
+  (cb->other_avd_adest) &&
+  (cb->node_id_avd_other != evt_info->i_node_id)) {
+if (cb->avail_state_avd == SA_AMF_HA_STANDBY) {
+  LOG_ER("Standby peer see two peers: %x and %x",
+cb->node_id_avd_other, evt_info->i_node_id);
+  opensaf_reboot(0, NULL, "Standby peer see two peers");
+} else if (cb->avail_state_avd == SA_AMF_HA_ACTIVE) {
+  // Send reboot order to known standby (multi clusters rejoin)
+  AVD_EVT *evt = new AVD_EVT();
+  evt->rcv_evt = AVD_EVT_ROAMING_SC_SPLITBRAIN;
+  if 

[devel] [PATCH 2/3] imm: reboot nodes used to be different partition with coord [#2936]

2020-07-10 Thread thuan.tran
- immnd sends re-introduce with refresh=3 and the ex-immd (active) node id.
- immd gives very high priority to the re-introduce msg of the local immnd
and chooses the coord if re-introduce refresh=3 comes from the local immnd.
- immd replies to a re-intro with a reboot order if its ex-immd is not the
same as the ex-immd of the selected coord.
- immd uses the new INTRO_RSP_2 to checkpoint ex-immd to the standby.
- immnd uses MDS_RED_SUBSCRIBE for immd to know the active/standby immd
and to help detect headless state when multi-partition clusters rejoin.
- immnd discards FEVS from an unknown immd, or during re-introduce, to
avoid an immnd OUT OF ORDER restart and loss of ex-immd info.
- Update README.SC_ABSENCE for this new feature.
- Allow this new feature to be enabled or disabled by configuration.
- Standby immd reboots if it sees two active immds, to avoid syncing
with the wrong active.
---
 scripts/opensaf_reboot |  1 +
 src/imm/README.SC_ABSENCE  | 22 ++
 src/imm/common/immsv_evt.c | 17 +++-
 src/imm/common/immsv_evt.h |  4 ++
 src/imm/immd/immd.conf |  7 
 src/imm/immd/immd_cb.h |  7 
 src/imm/immd/immd_evt.c| 86 --
 src/imm/immd/immd_main.c   |  9 
 src/imm/immd/immd_mbcsv.c  | 24 +--
 src/imm/immd/immd_mds.c| 17 +---
 src/imm/immd/immd_proc.c   | 15 ---
 src/imm/immd/immd_red.h|  1 +
 src/imm/immd/immd_sbevt.c  |  9 +++-
 src/imm/immnd/immnd_cb.h   |  4 ++
 src/imm/immnd/immnd_evt.c  | 84 +
 src/imm/immnd/immnd_main.c |  2 +
 src/imm/immnd/immnd_mds.c  | 35 
 src/imm/immnd/immnd_proc.c | 19 -
 18 files changed, 296 insertions(+), 67 deletions(-)

diff --git a/scripts/opensaf_reboot b/scripts/opensaf_reboot
index bcbc689f0..bb3cee5a1 100644
--- a/scripts/opensaf_reboot
+++ b/scripts/opensaf_reboot
@@ -143,6 +143,7 @@ unset tipc
 # argument 3 is set to 1, "safe reboot" request.
 if [ "$#" = 0 ]; then
$icmd pkill -STOP osafamfd
+   $icmd pkill -STOP osafimmd
quick_local_node_reboot
 elif [ "$safe_reboot" = 1 ]; then
opensaf_safe_reboot
diff --git a/src/imm/README.SC_ABSENCE b/src/imm/README.SC_ABSENCE
index 9cae5d519..04aa7ef38 100644
--- a/src/imm/README.SC_ABSENCE
+++ b/src/imm/README.SC_ABSENCE
@@ -76,3 +76,25 @@ Support for absent IMMD is incompatible with 2PBE. If both are configured then
 2PBE will win and the absence of IMMD feature will be ignored. An error message
 is printed in this case to the syslog at startup.
 
+
+SC ABSENCE and ROAMING SC
+=========================
+When both the SC Absence and Roaming SC features are enabled, a network split
+can produce multiple partitioned clusters. If a PBE database is configured on
+the local node, the IMM databases of these partitions can diverge. Rejoining
+the partitions into one cluster may then lead to undefined behavior. To avoid
+this, IMM reboots the nodes that were on a different partition from the
+selected coordinator [#2936].
+
+- IMMND sends re-introduce using refresh id 3 with the ex-IMMD node id.
+- When a payload becomes controller, the IMMD selects the IMMND coordinator
+(prioritizing the local IMMND) and sends a reply ordering reboot to nodes whose
+ex-IMMD node id differs from the ex-IMMD of the selected coordinator.
+- Active IMMD uses the new IMMD_A2S_MSG_INTRO_RSP_2 to checkpoint node info,
+including ex-IMMD, to the standby IMMD.
+- IMMND uses MDS_RED_SUBSCRIBE to know the Active/Standby IMMD so that it can
+discard FEVS from an unknown IMMD, or while waiting for its re-introduce
+message to be accepted, and so avoid restarting itself due to OUT OF ORDER.
+This mechanism also applies to rejoining multiple headless partitions.
+
+To enable this mechanism, export IMMSV_COORD_SELECT_NODE=1 in immd.conf.
diff --git a/src/imm/common/immsv_evt.c b/src/imm/common/immsv_evt.c
index c93f82a0f..1c43ec719 100644
--- a/src/imm/common/immsv_evt.c
+++ b/src/imm/common/immsv_evt.c
@@ -3395,7 +3395,7 @@ static uint32_t immsv_evt_enc_toplevel(IMMSV_EVT *i_evt, NCS_UBAID *o_ub)
 * sublevel */
}
 
-   if ((immdevt->info.ctrl_msg.refresh == 2) &&
+   if ((immdevt->info.ctrl_msg.refresh >= 2) &&
(immdevt->type ==
 IMMD_EVT_ND2D_INTRO)) { /* Intro after IMMD
restart. */
@@ -3419,6 +3419,12 @@ static uint32_t immsv_evt_enc_toplevel(IMMSV_EVT *i_evt, NCS_UBAID *o_ub)
ncs_encode_32bit(
, immdevt->info.ctrl_msg.impl_count);
ncs_enc_claim_space(o_ub, 4);
+   if (immdevt->info.ctrl_msg.refresh == 3) {
+   IMMSV_RSRV_SPACE_ASSERT(p8, o_ub, 4);
+   ncs_encode_32bit(
+   , immdevt->info.ctrl_msg.ex_immd_node_id);
+   

[devel] [PATCH 3/3] imm: define macro for values of canBeCoord [#2936]

2020-07-10 Thread thuan.tran
---
 src/imm/common/immsv_evt.h | 11 +++-
 src/imm/immd/immd.h| 10 +++
 src/imm/immd/immd_evt.c| 57 ++
 src/imm/immd/immd_proc.c   | 11 ++--
 src/imm/immd/immd_sbevt.c  | 15 +-
 src/imm/immnd/immnd_evt.c  |  8 +++---
 6 files changed, 49 insertions(+), 63 deletions(-)

diff --git a/src/imm/common/immsv_evt.h b/src/imm/common/immsv_evt.h
index ec45ca486..374c351f5 100644
--- a/src/imm/common/immsv_evt.h
+++ b/src/imm/common/immsv_evt.h
@@ -437,12 +437,21 @@ typedef struct immsv_d2nd_ccbinit {
   IMMSV_OM_CCB_INITIALIZE i;
 } IMMSV_D2ND_CCBINIT;
 
+typedef enum immsv_coord_type {
+  IMMSV_NOT_COORD = 0, /* payload cannot be coordinator, except headless */
+  IMMSV_SC_COORD = 1, /* controller to be coordinator */
+  IMMSV_2PBE_PRELOAD = 2,
+  IMMSV_2PBE_SYNC = 3,
+  IMMSV_VETERAN_COORD = 4, /* veteran node after headless can be coordinator */
+  IMMSV_UNKNOWN = 5 /* Unknown node will be ordered reboot */
+} IMMSV_COORD_TYPE;
+
 typedef struct immsv_d2nd_control {
   SaUint32T nodeId;
   SaUint32T rulingEpoch;
   SaUint64T fevsMsgStart;
   SaUint32T ndExecPid;
-  uint8_t canBeCoord; /* 0=>payload; 1=>SC; 2=>2PBE_preload; 3=>2PBE_sync*/
+  IMMSV_COORD_TYPE canBeCoord;
   uint8_t isCoord;
   uint8_t syncStarted;
   SaUint32T nodeEpoch;
diff --git a/src/imm/immd/immd.h b/src/imm/immd/immd.h
index 7dc1da686..421de4507 100644
--- a/src/imm/immd/immd.h
+++ b/src/imm/immd/immd.h
@@ -46,4 +46,14 @@ IMMD_CB *immd_cb;
 
 extern uint32_t initialize_for_assignment(IMMD_CB *cb, SaAmfHAStateT ha_state);
 
+static inline void set_canBeCoord_and_execPid(IMMSV_EVT *evt,
+   IMMD_CB *cb, IMMD_IMMND_INFO_NODE *node_info) {
+evt->info.immnd.info.ctrl.canBeCoord =
+(node_info->isOnController) ? IMMSV_SC_COORD :
+(cb->mScAbsenceAllowed) ? IMMSV_VETERAN_COORD : IMMSV_NOT_COORD;
+evt->info.immnd.info.ctrl.ndExecPid =
+(evt->info.immnd.info.ctrl.canBeCoord == IMMSV_VETERAN_COORD) ?
+(cb->mScAbsenceAllowed): node_info->immnd_execPid;
+}
+
 #endif  // IMM_IMMD_IMMD_H_
diff --git a/src/imm/immd/immd_evt.c b/src/imm/immd/immd_evt.c
index 83831258f..16e8cbaf0 100644
--- a/src/imm/immd/immd_evt.c
+++ b/src/imm/immd/immd_evt.c
@@ -448,12 +448,7 @@ static void immd_start_sync_ok(IMMD_CB *cb, SaUint32T rulingEpoch,
sync_evt.info.immnd.info.ctrl.rulingEpoch = cb->mRulingEpoch;
sync_evt.info.immnd.info.ctrl.fevsMsgStart = cb->fevsSendCount;
sync_evt.info.immnd.info.ctrl.nodeId = node_info->immnd_key;
-   sync_evt.info.immnd.info.ctrl.canBeCoord =
-   (node_info->isOnController) ? 1 : (cb->mScAbsenceAllowed) ? 4 : 0;
-   sync_evt.info.immnd.info.ctrl.ndExecPid =
-   (sync_evt.info.immnd.info.ctrl.canBeCoord == 4)
-   ? (cb->mScAbsenceAllowed)
-   : node_info->immnd_execPid;
+   set_canBeCoord_and_execPid(&sync_evt, cb, node_info);
sync_evt.info.immnd.info.ctrl.isCoord = node_info->isCoord;
sync_evt.info.immnd.info.ctrl.syncStarted = node_info->syncStarted;
sync_evt.info.immnd.info.ctrl.nodeEpoch = node_info->epoch;
@@ -503,12 +498,7 @@ static void immd_abort_sync_ok(IMMD_CB *cb, IMMD_IMMND_INFO_NODE *node_info)
sync_evt.info.immnd.info.ctrl.nodeId = node_info->immnd_key;
sync_evt.info.immnd.info.ctrl.rulingEpoch = cb->mRulingEpoch;
sync_evt.info.immnd.info.ctrl.fevsMsgStart = cb->fevsSendCount;
-   sync_evt.info.immnd.info.ctrl.canBeCoord =
-   (node_info->isOnController) ? 1 : (cb->mScAbsenceAllowed) ? 4 : 0;
-   sync_evt.info.immnd.info.ctrl.ndExecPid =
-   (sync_evt.info.immnd.info.ctrl.canBeCoord == 4)
-   ? (cb->mScAbsenceAllowed)
-   : node_info->immnd_execPid;
+   set_canBeCoord_and_execPid(&sync_evt, cb, node_info);
sync_evt.info.immnd.info.ctrl.isCoord = node_info->isCoord;
sync_evt.info.immnd.info.ctrl.syncStarted = node_info->syncStarted;
sync_evt.info.immnd.info.ctrl.nodeEpoch = node_info->epoch;
@@ -553,12 +543,7 @@ static void immd_prto_purge_mutations(IMMD_CB *cb,
sync_evt.info.immnd.info.ctrl.nodeId = node_info->immnd_key;
sync_evt.info.immnd.info.ctrl.rulingEpoch = cb->mRulingEpoch;
sync_evt.info.immnd.info.ctrl.fevsMsgStart = cb->fevsSendCount;
-   sync_evt.info.immnd.info.ctrl.canBeCoord =
-   (node_info->isOnController) ? 1 : (cb->mScAbsenceAllowed) ? 4 : 0;
-   sync_evt.info.immnd.info.ctrl.ndExecPid =
-   (sync_evt.info.immnd.info.ctrl.canBeCoord == 4)
-   ? (cb->mScAbsenceAllowed)
-   : node_info->immnd_execPid;
+   set_canBeCoord_and_execPid(&sync_evt, cb, node_info);
sync_evt.info.immnd.info.ctrl.isCoord = node_info->isCoord;
sync_evt.info.immnd.info.ctrl.syncStarted = node_info->syncStarted;
sync_evt.info.immnd.info.ctrl.nodeEpoch = node_info->epoch;
@@ -590,12 +575,7 @@ static int immd_dump_ok(IMMD_CB *cb, SaUint32T 

[devel] [PATCH 0/3] Review Request for imm: reboot nodes used to be different partition with coord [#2936] V2

2020-07-10 Thread thuan.tran
Summary: amf: enhance to work in roaming SC and headless [#2936]
Review request for Ticket(s): 2936
Peer Reviewer(s): Minh, Thang, Vu
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-2936
Base revision: b0086e3c5da87fad844e76c8c648f6dc6e7ae73a
Personal repository: git://git.code.sf.net/u/thuantr/review


Impacted area       Impact y/n
--------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            n
 OpenSAF services        y
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n

NOTE: Patch(es) contain lines longer than 80 characters

Comments (indicate scope for each "y" above):
-
N/A

revision a3ac6a80f584bb6f3cfa8e30b8475916a65c7e4e
Author: thuan.tran 
Date:   Fri, 10 Jul 2020 12:55:00 +0700

imm: define macro for values of canBeCoord [#2936]
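
As a self-contained illustration of what this commit names, the sketch
below reproduces the IMMSV_COORD_TYPE values from the patch and restates
the selection logic of set_canBeCoord_and_execPid() as two pure functions.
The function names coord_type_for() and exec_pid_field() are hypothetical
and exist only for this example; the real helper writes into the event
structure directly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values as introduced by the patch in immsv_evt.h. */
typedef enum immsv_coord_type {
    IMMSV_NOT_COORD = 0,     /* payload cannot be coordinator, except headless */
    IMMSV_SC_COORD = 1,      /* controller to be coordinator */
    IMMSV_2PBE_PRELOAD = 2,
    IMMSV_2PBE_SYNC = 3,
    IMMSV_VETERAN_COORD = 4, /* veteran node after headless can be coordinator */
    IMMSV_UNKNOWN = 5        /* unknown node will be ordered to reboot */
} IMMSV_COORD_TYPE;

/* Hypothetical helper: a controller wins, else SC absence decides. */
static IMMSV_COORD_TYPE coord_type_for(bool is_on_controller,
                                       uint32_t sc_absence_allowed)
{
    if (is_on_controller)
        return IMMSV_SC_COORD;
    return sc_absence_allowed ? IMMSV_VETERAN_COORD : IMMSV_NOT_COORD;
}

/*
 * Hypothetical helper: for a veteran node the ndExecPid field carries the
 * SC-absence setting instead of the IMMND process id.
 */
static uint32_t exec_pid_field(IMMSV_COORD_TYPE type,
                               uint32_t sc_absence_allowed,
                               uint32_t immnd_exec_pid)
{
    assert(type <= IMMSV_UNKNOWN);
    return (type == IMMSV_VETERAN_COORD) ? sc_absence_allowed
                                         : immnd_exec_pid;
}
```

Replacing the literals 1, 4 and 0 with named values leaves the behavior
unchanged; the gain is that each call site states which role it assigns.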



revision 975986be3d8d0a6ef2e54c47532aebbebeafc4da
Author: thuan.tran 
Date:   Fri, 10 Jul 2020 12:55:00 +0700

imm: reboot nodes used to be different partition with coord [#2936]

- immnd sends re-introduce with refresh=3 and the ex-immd (active) node id.
- immd gives very high priority to the re-introduce msg of the local immnd
and chooses the coord if re-introduce refresh=3 comes from the local immnd.
- immd replies to a re-intro with a reboot order if its ex-immd is not the
same as the ex-immd of the selected coord.
- immd uses the new INTRO_RSP_2 to checkpoint ex-immd to the standby.
- immnd uses MDS_RED_SUBSCRIBE for immd to know the active/standby immd
and to help detect headless state when multi-partition clusters rejoin.
- immnd discards FEVS from an unknown immd, or during re-introduce, to
avoid an immnd OUT OF ORDER restart and loss of ex-immd info.
- Update README.SC_ABSENCE for this new feature.
- Allow this new feature to be enabled or disabled by configuration.
- Standby immd reboots if it sees two active immds, to avoid syncing
with the wrong active.
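
The reboot rule in the list above can be sketched in isolation. This is a
simplified model, not the IMMD implementation: struct immnd_intro and both
functions are hypothetical names, and the only rule modeled is that a
re-introducing node is ordered to reboot when its ex-immd node id differs
from that of the selected coordinator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of what a re-introduce (refresh=3) message carries. */
struct immnd_intro {
    uint32_t node_id;         /* node that re-introduces itself */
    uint32_t ex_immd_node_id; /* active immd it followed before the split */
};

/* A node must reboot if it followed a different ex-immd than the coord. */
static bool must_reboot(const struct immnd_intro *coord,
                        const struct immnd_intro *node)
{
    assert(coord != NULL && node != NULL);
    return node->ex_immd_node_id != coord->ex_immd_node_id;
}

/* Count how many of the n re-introducing nodes would be ordered to reboot. */
static size_t count_reboots(const struct immnd_intro *coord,
                            const struct immnd_intro *nodes, size_t n)
{
    size_t reboots = 0;
    for (size_t i = 0; i < n; i++) {
        if (must_reboot(coord, &nodes[i]))
            reboots++;
    }
    return reboots;
}
```

Under this model every node that stayed in the coordinator's partition
keeps running, and only nodes from other partitions restart with a clean
state.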



revision 1754b0bdb1237441c77de2c8454dd6604c3bae60
Author: thuan.tran 
Date:   Fri, 10 Jul 2020 12:55:00 +0700

amf: enhance to work in roaming SC and headless [#2936]

- amfd resets the msg id counter for a node whose amfnd down event was
ignored. This avoids the node rebooting once more due to a msg id
mismatch after it comes back up from a reboot that was ordered for
sending node_up after the sync window.

- Active amfd orders its standby to reboot if it detects another
active amfd (multi-partition cluster rejoin). The two actives will be
handled by RDE split-brain detection.

- Standby amfd should reboot itself if it sees two active peers, to
avoid doing a cold sync with, or being updated by, the wrong active.
The two actives will be handled by RDE split-brain detection.

- An amfd that has just become standby (out of sync) but sees the
active go down should reboot itself.
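
The two-actives handling described above can be modeled as a small decision
function. This is a sketch under simplifying assumptions, not the amfd
code: the types and names are hypothetical, and "peer known" is modeled as
a non-zero peer node id, whereas the patch tests the stored peer adest.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum ha_state { HA_ACTIVE, HA_STANDBY, HA_NONE };

enum peer_action {
    ACT_NONE,                /* nothing unusual detected */
    ACT_REBOOT_SELF,         /* standby saw a second active peer */
    ACT_ORDER_STANDBY_REBOOT /* active saw another active */
};

/* Hypothetical local view of an amfd instance. */
struct amfd_view {
    uint32_t own_node_id;
    uint32_t peer_node_id; /* 0 = no peer recorded (simplification) */
    enum ha_state state;
};

/* Decide what to do when an amfd with ACTIVE role comes up on node_id. */
static enum peer_action on_active_amfd_up(const struct amfd_view *v,
                                          uint32_t node_id)
{
    assert(v != NULL);
    if (node_id == v->own_node_id)
        return ACT_NONE; /* our own instance */
    if (v->peer_node_id == 0 || node_id == v->peer_node_id)
        return ACT_NONE; /* first peer, or the peer we already know */
    if (v->state == HA_STANDBY)
        return ACT_REBOOT_SELF;
    if (v->state == HA_ACTIVE)
        return ACT_ORDER_STANDBY_REBOOT;
    return ACT_NONE;
}
```

In both non-trivial outcomes the function only protects the standby from
syncing with the wrong active. As the commit message notes, resolving the
two actives themselves is left to RDE split-brain detection.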



Complete diffstat:
--
 scripts/opensaf_reboot |   1 +
 src/amf/amfd/dmsg.cc   |   8 +++
 src/amf/amfd/evt.h |   1 +
 src/amf/amfd/main.cc   |   3 +
 src/amf/amfd/mds.cc|  36 +++-
 src/amf/amfd/msg.h |   1 +
 src/amf/amfd/ndfsm.cc  |   2 +
 src/amf/amfd/proc.h|   1 +
 src/amf/amfd/role.cc   |  27 +
 src/amf/amfd/util.cc   |   2 +-
 src/amf/amfnd/amfnd.cc |   2 +-
 src/imm/README.SC_ABSENCE  |  22 +++
 src/imm/common/immsv_evt.c |  17 +-
 src/imm/common/immsv_evt.h |  15 -
 src/imm/immd/immd.conf |   7 +++
 src/imm/immd/immd.h|  10 
 src/imm/immd/immd_cb.h |   7 +++
 src/imm/immd/immd_evt.c| 141 +++--
 src/imm/immd/immd_main.c   |   9 +++
 src/imm/immd/immd_mbcsv.c  |  24 ++--
 src/imm/immd/immd_mds.c|  17 --
 src/imm/immd/immd_proc.c   |  26 -
 src/imm/immd/immd_red.h|   1 +
 src/imm/immd/immd_sbevt.c  |  24 +---
 src/imm/immnd/immnd_cb.h   |   4 ++
 src/imm/immnd/immnd_evt.c  |  88 ++--
 src/imm/immnd/immnd_main.c |   2 +
 src/imm/immnd/immnd_mds.c  |  35 ---
 src/imm/immnd/immnd_proc.c |  19 +++---
 29 files changed, 421 insertions(+), 131 deletions(-)


Testing Commands:
-
N/A

Testing, Expected Results:
--
N/A

Conditions of Submission:
-
ACK by reviewers

Arch      Built     Started    Linux distro
-------------------------------------------
mips        n          n
mips64      n          n
x86         n          n
x86_64      y          y
powerpc     n          n
powerpc64   n          n



[devel] [PATCH 0/2] Review Request for imm: reboot nodes used to be different partition with coord [#2936]

2020-07-03 Thread thuan.tran
Summary: imm: reboot nodes used to be different partition with coord [#2936]
Review request for Ticket(s): 2936
Peer Reviewer(s): Minh, Thang, Vu
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-2936
Base revision: d28ee50720d5e57edba6ee5c27e8b2bebb0638fa
Personal repository: git://git.code.sf.net/u/thuantr/review


Impacted area       Impact y/n
--------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n

NOTE: Patch(es) contain lines longer than 80 characters

Comments (indicate scope for each "y" above):
-
N/A

revision 40ff2641bbfd5caf8133f9c3a256b1f3268d5f92
Author: thuan.tran 
Date:   Fri, 3 Jul 2020 16:24:47 +0700

imm: define macro for values of canBeCoord [#2936]



revision c1da24ed55b18fd511acbab2e2edac9bb9073dd4
Author: thuan.tran 
Date:   Fri, 3 Jul 2020 16:24:47 +0700

imm: reboot nodes used to be different partition with coord [#2936]

- immnd sends re-introduce with refresh=3 and the ex-immd (active) node id.
- immd gives very high priority to the re-introduce msg of the local immnd
and chooses the coord if re-introduce refresh=3 comes from the local immnd.
- immd replies to a re-intro with a reboot order if its ex-immd is not the
same as the ex-immd of the selected coord.
- immd uses the new INTRO_RSP_2 to checkpoint ex-immd to the standby.
- immnd uses MDS_RED_SUBSCRIBE for immd to know the active/standby immd
and to help detect headless state when multi-partition clusters rejoin.
- immnd discards FEVS from an unknown immd, or during re-introduce, to
avoid an immnd OUT OF ORDER restart and loss of ex-immd info.
- Update README.SC_ABSENCE for this new feature.
- Allow this new feature to be enabled or disabled by configuration.
- Standby immd reboots if it sees two active immds, to avoid syncing
with the wrong active.
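
On the wire, refresh=3 extends the existing re-introduce encoding rather
than replacing it: the patch changes the encoder's test from refresh == 2
to refresh >= 2 and appends the ex-immd node id only when refresh == 3. The
sketch below shows that kind of optional trailing field. The message
layout, the big-endian byte order and all names here are assumptions made
for illustration; the real message carries more fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced intro message. */
struct intro_msg {
    uint32_t refresh;         /* 2 = re-intro, 3 = re-intro with ex-immd */
    uint32_t impl_count;      /* present when refresh >= 2 */
    uint32_t ex_immd_node_id; /* present only when refresh == 3 */
};

/* Write v as 4 big-endian bytes and return the advanced pointer. */
static uint8_t *put_u32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
    return p + 4;
}

/* Read 4 big-endian bytes into *v and return the advanced pointer. */
static const uint8_t *get_u32(const uint8_t *p, uint32_t *v)
{
    *v = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
         ((uint32_t)p[2] << 8) | (uint32_t)p[3];
    return p + 4;
}

/* Encode m into buf; returns bytes written, or 0 if buf is too small. */
static size_t intro_encode(const struct intro_msg *m, uint8_t *buf, size_t cap)
{
    size_t need = 4 + (m->refresh >= 2 ? 4 : 0) + (m->refresh == 3 ? 4 : 0);
    if (cap < need)
        return 0;
    uint8_t *p = put_u32(buf, m->refresh);
    if (m->refresh >= 2)
        p = put_u32(p, m->impl_count);
    if (m->refresh == 3)
        p = put_u32(p, m->ex_immd_node_id);
    return (size_t)(p - buf);
}

/* Decode buf into m; returns bytes consumed, or 0 if the input is short. */
static size_t intro_decode(struct intro_msg *m, const uint8_t *buf, size_t len)
{
    const uint8_t *p = buf;
    m->impl_count = 0;
    m->ex_immd_node_id = 0;
    if (len < 4)
        return 0;
    p = get_u32(p, &m->refresh);
    if (m->refresh >= 2) {
        if (len < 8)
            return 0;
        p = get_u32(p, &m->impl_count);
    }
    if (m->refresh == 3) {
        if (len < 12)
            return 0;
        p = get_u32(p, &m->ex_immd_node_id);
    }
    return (size_t)(p - buf);
}
```

Because the extra field is appended only for refresh == 3, a message with
refresh == 2 keeps its previous size in this model.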



Complete diffstat:
--
 scripts/opensaf_reboot |   1 +
 src/imm/README.SC_ABSENCE  |  21 +++
 src/imm/common/immsv_evt.c |  17 +-
 src/imm/common/immsv_evt.h |  15 -
 src/imm/immd/immd.conf |   7 +++
 src/imm/immd/immd.h|  10 
 src/imm/immd/immd_cb.h |   3 +
 src/imm/immd/immd_evt.c| 141 +++--
 src/imm/immd/immd_main.c   |   9 +++
 src/imm/immd/immd_mbcsv.c  |  24 ++--
 src/imm/immd/immd_mds.c|  17 --
 src/imm/immd/immd_proc.c   |  26 -
 src/imm/immd/immd_red.h|   1 +
 src/imm/immd/immd_sbevt.c  |  24 +---
 src/imm/immnd/immnd_cb.h   |   3 +
 src/imm/immnd/immnd_evt.c  |  88 ++--
 src/imm/immnd/immnd_main.c |   2 +
 src/imm/immnd/immnd_mds.c  |  35 ---
 src/imm/immnd/immnd_proc.c |  19 +++---
 19 files changed, 336 insertions(+), 127 deletions(-)


Testing Commands:
-
N/A

Testing, Expected Results:
--
N/A

Conditions of Submission:
-
ACK by reviewers

Arch      Built     Started    Linux distro
-------------------------------------------
mips        n          n
mips64      n          n
x86         n          n
x86_64      y          y
powerpc     n          n
powerpc64   n          n



[devel] [PATCH 2/2] imm: define macro for values of canBeCoord [#2936]

2020-07-03 Thread thuan.tran
---
 src/imm/common/immsv_evt.h | 11 +++-
 src/imm/immd/immd.h| 10 +++
 src/imm/immd/immd_evt.c| 57 ++
 src/imm/immd/immd_proc.c   | 11 ++--
 src/imm/immd/immd_sbevt.c  | 15 +-
 src/imm/immnd/immnd_evt.c  |  8 +++---
 6 files changed, 49 insertions(+), 63 deletions(-)

diff --git a/src/imm/common/immsv_evt.h b/src/imm/common/immsv_evt.h
index 18aeca447..f2571a1bc 100644
--- a/src/imm/common/immsv_evt.h
+++ b/src/imm/common/immsv_evt.h
@@ -437,12 +437,21 @@ typedef struct immsv_d2nd_ccbinit {
   IMMSV_OM_CCB_INITIALIZE i;
 } IMMSV_D2ND_CCBINIT;
 
+typedef enum immsv_coord_type {
+  IMMSV_NOT_COORD = 0, /* payload cannot be coordinator, except headless */
+  IMMSV_SC_COORD = 1, /* controller to be coordinator */
+  IMMSV_2PBE_PRELOAD = 2,
+  IMMSV_2PBE_SYNC = 3,
+  IMMSV_VETERAN_COORD = 4, /* veteran node after headless can be coordinator */
+  IMMSV_UNKNOWN = 5 /* Unknown node will be ordered reboot */
+} IMMSV_COORD_TYPE;
+
 typedef struct immsv_d2nd_control {
   SaUint32T nodeId;
   SaUint32T rulingEpoch;
   SaUint64T fevsMsgStart;
   SaUint32T ndExecPid;
-  uint8_t canBeCoord; /* 0=>payload; 1=>SC; 2=>2PBE_preload; 3=>2PBE_sync*/
+  IMMSV_COORD_TYPE canBeCoord;
   uint8_t isCoord;
   uint8_t syncStarted;
   SaUint32T nodeEpoch;
diff --git a/src/imm/immd/immd.h b/src/imm/immd/immd.h
index 7dc1da686..421de4507 100644
--- a/src/imm/immd/immd.h
+++ b/src/imm/immd/immd.h
@@ -46,4 +46,14 @@ IMMD_CB *immd_cb;
 
 extern uint32_t initialize_for_assignment(IMMD_CB *cb, SaAmfHAStateT ha_state);
 
+static inline void set_canBeCoord_and_execPid(IMMSV_EVT *evt,
+   IMMD_CB *cb, IMMD_IMMND_INFO_NODE *node_info) {
+evt->info.immnd.info.ctrl.canBeCoord =
+(node_info->isOnController) ? IMMSV_SC_COORD :
+(cb->mScAbsenceAllowed) ? IMMSV_VETERAN_COORD : IMMSV_NOT_COORD;
+evt->info.immnd.info.ctrl.ndExecPid =
+(evt->info.immnd.info.ctrl.canBeCoord == IMMSV_VETERAN_COORD) ?
+(cb->mScAbsenceAllowed): node_info->immnd_execPid;
+}
+
 #endif  // IMM_IMMD_IMMD_H_
diff --git a/src/imm/immd/immd_evt.c b/src/imm/immd/immd_evt.c
index 83831258f..16e8cbaf0 100644
--- a/src/imm/immd/immd_evt.c
+++ b/src/imm/immd/immd_evt.c
@@ -448,12 +448,7 @@ static void immd_start_sync_ok(IMMD_CB *cb, SaUint32T rulingEpoch,
sync_evt.info.immnd.info.ctrl.rulingEpoch = cb->mRulingEpoch;
sync_evt.info.immnd.info.ctrl.fevsMsgStart = cb->fevsSendCount;
sync_evt.info.immnd.info.ctrl.nodeId = node_info->immnd_key;
-   sync_evt.info.immnd.info.ctrl.canBeCoord =
-   (node_info->isOnController) ? 1 : (cb->mScAbsenceAllowed) ? 4 : 0;
-   sync_evt.info.immnd.info.ctrl.ndExecPid =
-   (sync_evt.info.immnd.info.ctrl.canBeCoord == 4)
-   ? (cb->mScAbsenceAllowed)
-   : node_info->immnd_execPid;
+   set_canBeCoord_and_execPid(&sync_evt, cb, node_info);
sync_evt.info.immnd.info.ctrl.isCoord = node_info->isCoord;
sync_evt.info.immnd.info.ctrl.syncStarted = node_info->syncStarted;
sync_evt.info.immnd.info.ctrl.nodeEpoch = node_info->epoch;
@@ -503,12 +498,7 @@ static void immd_abort_sync_ok(IMMD_CB *cb, IMMD_IMMND_INFO_NODE *node_info)
sync_evt.info.immnd.info.ctrl.nodeId = node_info->immnd_key;
sync_evt.info.immnd.info.ctrl.rulingEpoch = cb->mRulingEpoch;
sync_evt.info.immnd.info.ctrl.fevsMsgStart = cb->fevsSendCount;
-   sync_evt.info.immnd.info.ctrl.canBeCoord =
-   (node_info->isOnController) ? 1 : (cb->mScAbsenceAllowed) ? 4 : 0;
-   sync_evt.info.immnd.info.ctrl.ndExecPid =
-   (sync_evt.info.immnd.info.ctrl.canBeCoord == 4)
-   ? (cb->mScAbsenceAllowed)
-   : node_info->immnd_execPid;
+   set_canBeCoord_and_execPid(&sync_evt, cb, node_info);
sync_evt.info.immnd.info.ctrl.isCoord = node_info->isCoord;
sync_evt.info.immnd.info.ctrl.syncStarted = node_info->syncStarted;
sync_evt.info.immnd.info.ctrl.nodeEpoch = node_info->epoch;
@@ -553,12 +543,7 @@ static void immd_prto_purge_mutations(IMMD_CB *cb,
sync_evt.info.immnd.info.ctrl.nodeId = node_info->immnd_key;
sync_evt.info.immnd.info.ctrl.rulingEpoch = cb->mRulingEpoch;
sync_evt.info.immnd.info.ctrl.fevsMsgStart = cb->fevsSendCount;
-   sync_evt.info.immnd.info.ctrl.canBeCoord =
-   (node_info->isOnController) ? 1 : (cb->mScAbsenceAllowed) ? 4 : 0;
-   sync_evt.info.immnd.info.ctrl.ndExecPid =
-   (sync_evt.info.immnd.info.ctrl.canBeCoord == 4)
-   ? (cb->mScAbsenceAllowed)
-   : node_info->immnd_execPid;
+   set_canBeCoord_and_execPid(&sync_evt, cb, node_info);
sync_evt.info.immnd.info.ctrl.isCoord = node_info->isCoord;
sync_evt.info.immnd.info.ctrl.syncStarted = node_info->syncStarted;
sync_evt.info.immnd.info.ctrl.nodeEpoch = node_info->epoch;
@@ -590,12 +575,7 @@ static int immd_dump_ok(IMMD_CB *cb, SaUint32T 
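
Note on the helper added to the header in this patch: the nested
conditional it wraps is easier to review when written as an explicit
chain. The sketch below is only an illustration and not the OpenSAF code.
CoordKind, CtrlFields and derive_ctrl_fields are hypothetical names; the
values 0, 1 and 4 are the literals that the removed lines used.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the IMMSV_* constants; the values mirror the
// literals 0, 1 and 4 that the patch replaces.
enum CoordKind : std::uint32_t {
  kNotCoord = 0,
  kScCoord = 1,
  kVeteranCoord = 4
};

// Hypothetical holder for the two fields the helper fills in.
struct CtrlFields {
  std::uint32_t can_be_coord;
  std::uint32_t nd_exec_pid;
};

// Same decision order as the helper: controller first, then SC absence.
CtrlFields derive_ctrl_fields(bool is_on_controller,
                              std::uint32_t sc_absence_allowed,
                              std::uint32_t immnd_exec_pid) {
  CtrlFields out{};
  if (is_on_controller) {
    out.can_be_coord = kScCoord;
  } else if (sc_absence_allowed != 0) {
    out.can_be_coord = kVeteranCoord;
  } else {
    out.can_be_coord = kNotCoord;
  }
  // A veteran coordinator carries the SC-absence value instead of a pid.
  out.nd_exec_pid = (out.can_be_coord == kVeteranCoord) ? sc_absence_allowed
                                                        : immnd_exec_pid;
  return out;
}

// Usage example: a payload node in a cluster that allows SC absence.
void example_ctrl_fields() {
  const CtrlFields f = derive_ctrl_fields(false, 900, 42);
  assert(f.can_be_coord == kVeteranCoord && f.nd_exec_pid == 900);
}
```

Keeping the decision in one place is what lets the call sites in
immd_evt.c shrink to a single line each.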

[devel] [PATCH 1/2] imm: reboot nodes used to be different partition with coord [#2936]

2020-07-03 Thread thuan.tran
- immnd sends a re-introduce message with refresh=3 that carries the
node id of its ex-immd (the previously active immd).
- immd gives the re-introduce message of its local immnd very high
priority and selects the coord when the local immnd re-introduces with
refresh=3.
- immd answers a re-introduce with a reboot order if the sender's
ex-immd is not the same as the ex-immd of the selected coord.
- immd uses the new INTRO_RSP_2 message to checkpoint the ex-immd to
the standby.
- immnd uses MDS_RED_SUBSCRIBE towards immd so that it knows which immd
is active and which is standby; this also helps detect headless state
when a multi-partition cluster rejoins.
- immnd discards FEVS messages from an unknown immd, or received during
re-introduce, to avoid an immnd OUT OF ORDER restart and the loss of
ex-immd info.
- Update README.SC_ABSENCE for this new feature.
- Make the new feature configurable (enable/disable).
- A standby immd reboots if it sees two active immds, to avoid syncing
with the wrong active.
---
 scripts/opensaf_reboot |  1 +
 src/imm/README.SC_ABSENCE  | 21 ++
 src/imm/common/immsv_evt.c | 17 +++-
 src/imm/common/immsv_evt.h |  4 ++
 src/imm/immd/immd.conf |  7 
 src/imm/immd/immd_cb.h |  3 ++
 src/imm/immd/immd_evt.c| 86 --
 src/imm/immd/immd_main.c   |  9 
 src/imm/immd/immd_mbcsv.c  | 24 +--
 src/imm/immd/immd_mds.c| 17 +---
 src/imm/immd/immd_proc.c   | 15 ---
 src/imm/immd/immd_red.h|  1 +
 src/imm/immd/immd_sbevt.c  |  9 +++-
 src/imm/immnd/immnd_cb.h   |  3 ++
 src/imm/immnd/immnd_evt.c  | 84 +
 src/imm/immnd/immnd_main.c |  2 +
 src/imm/immnd/immnd_mds.c  | 35 
 src/imm/immnd/immnd_proc.c | 19 -
 18 files changed, 290 insertions(+), 67 deletions(-)
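
Reviewer note, not part of the commit: the reboot rule from the commit
message (a node is rebooted when its ex-immd differs from the ex-immd of
the selected coord) can be sketched as below. This is a minimal
illustration under assumptions, not the immd implementation. ReIntro and
nodes_to_reboot are hypothetical names, and the node ids in the example
are made up.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical record of one re-introduce (refresh=3) message.
struct ReIntro {
  std::uint32_t node_id;          // sender of the re-introduce
  std::uint32_t ex_immd_node_id;  // active immd it followed before the split
};

// Returns the node ids that should get a reboot reply: every node whose
// ex-immd differs from the ex-immd of the selected coordinator.
std::vector<std::uint32_t> nodes_to_reboot(const ReIntro& coord,
                                           const std::vector<ReIntro>& intros) {
  std::vector<std::uint32_t> reboot;
  for (const ReIntro& n : intros) {
    if (n.ex_immd_node_id != coord.ex_immd_node_id) {
      reboot.push_back(n.node_id);
    }
  }
  return reboot;
}

// Usage example: node 0x2030f followed a different immd, so only it reboots.
void example_reboot_selection() {
  const ReIntro coord{0x2010f, 0x2010f};
  const std::vector<ReIntro> intros{{0x2020f, 0x2010f}, {0x2030f, 0x2040f}};
  const std::vector<std::uint32_t> r = nodes_to_reboot(coord, intros);
  assert(r.size() == 1 && r[0] == 0x2030f);
}
```

Comparing against the coordinator's ex-immd, rather than against the
current immd, is what identifies nodes that lived in another partition.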

diff --git a/scripts/opensaf_reboot b/scripts/opensaf_reboot
index bcbc689f0..bb3cee5a1 100644
--- a/scripts/opensaf_reboot
+++ b/scripts/opensaf_reboot
@@ -143,6 +143,7 @@ unset tipc
 # argument 3 is set to 1, "safe reboot" request.
 if [ "$#" = 0 ]; then
$icmd pkill -STOP osafamfd
+   $icmd pkill -STOP osafimmd
quick_local_node_reboot
 elif [ "$safe_reboot" = 1 ]; then
opensaf_safe_reboot
diff --git a/src/imm/README.SC_ABSENCE b/src/imm/README.SC_ABSENCE
index 9cae5d519..644cbb546 100644
--- a/src/imm/README.SC_ABSENCE
+++ b/src/imm/README.SC_ABSENCE
@@ -76,3 +76,24 @@ Support for absent IMMD is incompatible with 2PBE. If both are configured then
 2PBE will win and the absence of IMMD feature will be ignored. An error message
 is printed in this case to the syslog at startup.
 
+
+SC ABSENCE and ROAMING SC
+=========================
+With SC absence enabled in a roaming SC cluster, a network split can leave
+several partitioned clusters. If the PBE database is configured on the local
+node, the IMM databases of the partitions can then diverge. Rejoining these
+partitions into one cluster can lead to undefined behavior. To avoid this,
+IMM implements a mechanism that reboots nodes which were on a different
+partition from the selected coordinator [ticket #2936].
+
+- IMMND sends re-introduce with refresh id 3 and the ex-immd node id.
+- When a payload becomes controller, the IMMD selects the coordinator
+(prioritizing the local IMMND) and replies with a reboot order to nodes whose
+ex-immd node id differs from the ex-immd of the selected coordinator.
+- The active IMMD uses the new IMMD_A2S_MSG_INTRO_RSP_2 to checkpoint node
+info, including the ex-immd, to the standby IMMD.
+- IMMND uses MDS_RED_SUBSCRIBE to know Active/Standby. It discards FEVS from
+an unknown IMMD, or while its re-introduce is pending, to avoid an IMMND restart
+due to OUT OF ORDER. This also detects headless when partitions rejoin.
+
+To enable this mechanism, export IMMSV_COORD_SELECT_NODE=1 in immd.conf.
diff --git a/src/imm/common/immsv_evt.c b/src/imm/common/immsv_evt.c
index c93f82a0f..1c43ec719 100644
--- a/src/imm/common/immsv_evt.c
+++ b/src/imm/common/immsv_evt.c
@@ -3395,7 +3395,7 @@ static uint32_t immsv_evt_enc_toplevel(IMMSV_EVT *i_evt, NCS_UBAID *o_ub)
 * sublevel */
}
 
-   if ((immdevt->info.ctrl_msg.refresh == 2) &&
+   if ((immdevt->info.ctrl_msg.refresh >= 2) &&
(immdevt->type ==
 IMMD_EVT_ND2D_INTRO)) { /* Intro after IMMD
restart. */
@@ -3419,6 +3419,12 @@ static uint32_t immsv_evt_enc_toplevel(IMMSV_EVT *i_evt, NCS_UBAID *o_ub)
ncs_encode_32bit(
&p8, immdevt->info.ctrl_msg.impl_count);
ncs_enc_claim_space(o_ub, 4);
+   if (immdevt->info.ctrl_msg.refresh == 3) {
+   IMMSV_RSRV_SPACE_ASSERT(p8, o_ub, 4);
+   ncs_encode_32bit(
+   &p8, immdevt->info.ctrl_msg.ex_immd_node_id);
+   ncs_enc_claim_space(o_ub, 4);
+   }
}
 
break;
@@ 

[devel] [PATCH 1/1] amf: enhance to work in roaming SC and headless [#3185]

2020-07-03 Thread thuan.tran
- amfd resets the msg id counter of a node whose amfnd down event was
ignored. Otherwise the node, once it is back up from a reboot ordered
for sending node_up after the sync window, is rebooted once more
because of a msg id mismatch.

- The active amfd orders its standby to reboot if it detects another
active amfd (multi-partition cluster rejoin).

- A standby amfd reboots itself if it sees two active peers, so that it
does not cold-sync with, or get updated by, the wrong active.

- An amfd that has just become standby (not yet in sync) reboots itself
if it sees the active go down.
---
 src/amf/amfd/dmsg.cc   |  8 
 src/amf/amfd/evt.h |  1 +
 src/amf/amfd/main.cc   |  3 +++
 src/amf/amfd/mds.cc| 36 ++--
 src/amf/amfd/msg.h |  1 +
 src/amf/amfd/ndfsm.cc  |  2 ++
 src/amf/amfd/proc.h|  1 +
 src/amf/amfd/role.cc   | 27 +++
 src/amf/amfd/util.cc   |  2 +-
 src/amf/amfnd/amfnd.cc |  2 +-
 10 files changed, 79 insertions(+), 4 deletions(-)
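
Reviewer note, not part of the commit: the split-brain reaction that the
mds.cc hunk below adds can be read as a small decision function. The
sketch is an illustration under assumptions and not the amfd code.
DirectorView, Action and on_active_director_up are hypothetical names;
the conditions follow the new check in avd_mds_svc_evt().

```cpp
#include <cassert>
#include <cstdint>

enum class HaState { kActive, kStandby, kOther };
enum class Action { kNone, kRebootSelf, kOrderStandbyReboot };

// Hypothetical, reduced view of the director's control block.
struct DirectorView {
  std::uint32_t own_node_id;
  std::uint32_t peer_node_id;  // peer director already known, if any
  bool peer_known;
  HaState own_state;
};

// Called when an ACTIVE director instance is reported up on `node_id`.
Action on_active_director_up(const DirectorView& v, std::uint32_t node_id) {
  // An active that is neither this node nor the known peer means that two
  // partitions have rejoined.
  const bool unexpected_active = v.peer_known && node_id != v.own_node_id &&
                                 node_id != v.peer_node_id;
  if (!unexpected_active) return Action::kNone;
  if (v.own_state == HaState::kStandby) return Action::kRebootSelf;
  if (v.own_state == HaState::kActive) return Action::kOrderStandbyReboot;
  return Action::kNone;
}

// Usage example: a standby on node 1 with known peer 2 sees an active on 3.
void example_split_brain() {
  const DirectorView standby{1, 2, true, HaState::kStandby};
  assert(on_active_director_up(standby, 3) == Action::kRebootSelf);
  assert(on_active_director_up(standby, 2) == Action::kNone);
}
```

In the patch the active does not reboot anything directly; it queues
AVD_EVT_ROAMING_SC_SPLITBRAIN so that the order is sent from the main
thread.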

diff --git a/src/amf/amfd/dmsg.cc b/src/amf/amfd/dmsg.cc
index cf4019d8a..5273f358c 100644
--- a/src/amf/amfd/dmsg.cc
+++ b/src/amf/amfd/dmsg.cc
@@ -75,6 +75,8 @@ void avd_mds_d_enc(MDS_CALLBACK_ENC_INFO *enc_info) {
   ncs_encode_32bit(, msg->msg_info.d2d_chg_role_rsp.role);
   ncs_encode_32bit(, msg->msg_info.d2d_chg_role_rsp.status);
   break;
+case AVD_D2D_ROAMING_SC_SPLITBRAIN:
+  break;
 default:
   LOG_ER("%s: unknown msg %u", __FUNCTION__, msg->msg_type);
   break;
@@ -120,6 +122,8 @@ void avd_mds_d_dec(MDS_CALLBACK_DEC_INFO *dec_info) {
   static_cast(ncs_decode_32bit());
   d2d_msg->msg_info.d2d_chg_role_rsp.status = ncs_decode_32bit();
   break;
+case AVD_D2D_ROAMING_SC_SPLITBRAIN:
+  break;
 default:
   LOG_ER("%s: unknown msg %u", __FUNCTION__, d2d_msg->msg_type);
   break;
@@ -210,6 +214,10 @@ uint32_t avd_d2d_msg_rcv(AVD_D2D_MSG *rcv_msg) {
 osafassert(0);
   }
   break;
+case AVD_D2D_ROAMING_SC_SPLITBRAIN:
+  LOG_ER("Reboot order from Active as roaming SC split-brain detected");
+  opensaf_quick_reboot("Split-brain detected");
+  break;
 default:
   LOG_ER("%s: unknown msg %u", __FUNCTION__, rcv_msg->msg_type);
   break;
diff --git a/src/amf/amfd/evt.h b/src/amf/amfd/evt.h
index a9028cde3..a08dccebb 100644
--- a/src/amf/amfd/evt.h
+++ b/src/amf/amfd/evt.h
@@ -72,6 +72,7 @@ typedef enum avd_evt_type {
   AVD_IMM_REINITIALIZED,
   AVD_EVT_UNASSIGN_SI_DEP_STATE,
   AVD_EVT_ND_MDS_VER_INFO,
+  AVD_EVT_ROAMING_SC_SPLITBRAIN,
   AVD_EVT_MAX
 } AVD_EVT_TYPE;
 
diff --git a/src/amf/amfd/main.cc b/src/amf/amfd/main.cc
index 3b1536721..3cc0d9741 100644
--- a/src/amf/amfd/main.cc
+++ b/src/amf/amfd/main.cc
@@ -132,6 +132,9 @@ static const AVD_EVT_HDLR g_actv_list[AVD_EVT_MAX] = {
 invalid_evh,/* AVD_EVT_INVALID */
 avd_sidep_unassign_evh, /* AVD_EVT_UNASSIGN_SI_DEP_STATE */
 avd_avnd_mds_info_evh,  /* AVD_EVT_ND_MDS_VER_INFO*/
+
+/* Roaming SC split-brain processing */
+avd_roaming_sc_split_brain_evh, /* AVD_EVT_ROAMING_SC_SPLITBRAIN */
 };
 
 /* list of all the function pointers related to handling the events
diff --git a/src/amf/amfd/mds.cc b/src/amf/amfd/mds.cc
index 108f9b8bd..625616f5a 100644
--- a/src/amf/amfd/mds.cc
+++ b/src/amf/amfd/mds.cc
@@ -396,13 +396,38 @@ static uint32_t avd_mds_svc_evt(MDS_CALLBACK_SVC_EVENT_INFO *evt_info) {
   uint32_t rc = NCSCC_RC_SUCCESS;
   AVD_CL_CB *cb = avd_cb;
 
+  if ((evt_info->i_svc_id == NCSMDS_SVC_ID_AVD) &&
+  (evt_info->i_change == NCSMDS_RED_UP) &&
+  (evt_info->i_role == V_DEST_RL_ACTIVE) &&
+  (cb->node_id_avd != evt_info->i_node_id) &&
+  (cb->other_avd_adest) &&
+  (cb->node_id_avd_other != evt_info->i_node_id)) {
+if (cb->avail_state_avd == SA_AMF_HA_STANDBY) {
+  LOG_ER("Standby peer see two peers: %x and %x",
+cb->node_id_avd_other, evt_info->i_node_id);
+  opensaf_reboot(0, NULL, "Standby peer see two peers");
+} else if (cb->avail_state_avd == SA_AMF_HA_ACTIVE) {
+  // Send reboot order to known standby (multi clusters rejoin)
+  AVD_EVT *evt = new AVD_EVT();
+  evt->rcv_evt = AVD_EVT_ROAMING_SC_SPLITBRAIN;
+  if (m_NCS_IPC_SEND(&cb->avd_mbx, evt, NCS_IPC_PRIORITY_HIGH) !=
+  NCSCC_RC_SUCCESS) {
+LOG_ER("%s: ncs_ipc_send failed", __FUNCTION__);
+delete evt;
+  }
+}
+return rc;
+  }
+
   switch (evt_info->i_change) {
 case NCSMDS_UP:
   switch (evt_info->i_svc_id) {
 case NCSMDS_SVC_ID_AVD:
+  TRACE("NCSMDS_UP AVD %x", evt_info->i_node_id);
   /* if((Is this up from other node) && (Is this Up from an Adest)) */
   if ((evt_info->i_node_id != cb->node_id_avd) &&
-  (m_MDS_DEST_IS_AN_ADEST(evt_info->i_dest))) {
+  (m_MDS_DEST_IS_AN_ADEST(evt_info->i_dest)) &&
+  (cb->other_avd_adest == 0)) {
 cb->node_id_avd_other = evt_info->i_node_id;
 cb->other_avd_adest = 

[devel] [PATCH 0/1] Review Request for amf: enhance to work in roaming SC and headless [#3185]

2020-07-03 Thread thuan.tran
Summary: amf: enhance to work in roaming SC and headless [#3185]
Review request for Ticket(s): 3185
Peer Reviewer(s): Minh, Thang
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3185
Base revision: d28ee50720d5e57edba6ee5c27e8b2bebb0638fa
Personal repository: git://git.code.sf.net/u/thuantr/review


Impacted area           Impact y/n

 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
-
N/A

revision fc5467b244137827df8fc54573597b75610be768
Author: thuan.tran 
Date:   Fri, 3 Jul 2020 16:15:54 +0700

amf: enhance to work in roaming SC and headless [#3185]

- amfd resets the msg id counter of a node whose amfnd down event was
ignored. Otherwise the node, once it is back up from a reboot ordered
for sending node_up after the sync window, is rebooted once more
because of a msg id mismatch.

- The active amfd orders its standby to reboot if it detects another
active amfd (multi-partition cluster rejoin).

- A standby amfd reboots itself if it sees two active peers, so that it
does not cold-sync with, or get updated by, the wrong active.

- An amfd that has just become standby (not yet in sync) reboots itself
if it sees the active go down.



Complete diffstat:
--
 src/amf/amfd/dmsg.cc   |  8 
 src/amf/amfd/evt.h |  1 +
 src/amf/amfd/main.cc   |  3 +++
 src/amf/amfd/mds.cc| 36 ++--
 src/amf/amfd/msg.h |  1 +
 src/amf/amfd/ndfsm.cc  |  2 ++
 src/amf/amfd/proc.h|  1 +
 src/amf/amfd/role.cc   | 27 +++
 src/amf/amfd/util.cc   |  2 +-
 src/amf/amfnd/amfnd.cc |  2 +-
 10 files changed, 79 insertions(+), 4 deletions(-)


Testing Commands:
-
N/A

Testing, Expected Results:
--
N/A

Conditions of Submission:
-
ACK by reviewers

Arch         Built     Started    Linux distro
-----------------------------------------------
mips         n         n
mips64       n         n
x86          n         n
x86_64       y         y
powerpc      n         n
powerpc64    n         n


Reviewer Checklist:
---
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
(i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
too much content into a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
comments and change requests that were proposed in the initial review.

___ You have a misconfigured ~/.gitconfig file (i.e. user.name, user.email etc)

___ Your computer has a badly configured date and time, confusing the
threaded patch review.

___ Your changes affect IPC mechanism, and you don't present any results
for in-service upgradability test.

___ Your changes affect user manual and documentation, your patch series
do not contain the patch that updates the Doxygen manual.



___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


[devel] [PATCH 1/1] dtm: bind configured node ip for socket to setup new connection [#3192]

2020-05-26 Thread thuan.tran
---
 src/dtm/dtmnd/dtm_node_sockets.cc | 28 
 1 file changed, 28 insertions(+)
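
Reviewer note, not part of the commit: the patch binds the outgoing
socket to the configured node IP before connect(), with port 0 so that
the kernel still picks the source port. The sketch below shows the IPv4
case with plain POSIX calls. bind_source_ipv4 is a hypothetical helper,
not a DTM function; the patch itself calls osaf_bind() and only logs a
warning on failure, while this sketch returns an error code and
zero-initialises the address structure.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cassert>

// Binds `fd` to the textual IPv4 address `ip_text` with an ephemeral port.
// Returns 0 on success and -1 if the address does not parse or bind() fails.
int bind_source_ipv4(int fd, const char* ip_text) {
  assert(ip_text != nullptr);
  in_addr addr{};
  if (inet_pton(AF_INET, ip_text, &addr) != 1) {
    return -1;  // not a valid dotted-quad address
  }
  sockaddr_in local{};  // zero-initialised, including padding
  local.sin_family = AF_INET;
  local.sin_port = 0;   // 0 lets the kernel choose the source port
  local.sin_addr = addr;
  return bind(fd, reinterpret_cast<const sockaddr*>(&local), sizeof(local));
}

// Usage example: pin the source address to loopback; a real caller would
// go on to connect() before closing the socket.
int example_bind_loopback() {
  const int fd = socket(AF_INET, SOCK_STREAM, 0);
  if (fd < 0) return -1;
  const int rc = bind_source_ipv4(fd, "127.0.0.1");
  close(fd);
  return rc;
}
```

The IPv6 branch in the patch follows the same pattern with AF_INET6,
struct in6_addr and struct sockaddr_in6.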

diff --git a/src/dtm/dtmnd/dtm_node_sockets.cc b/src/dtm/dtmnd/dtm_node_sockets.cc
index 41e16206b..7cb461810 100644
--- a/src/dtm/dtmnd/dtm_node_sockets.cc
+++ b/src/dtm/dtmnd/dtm_node_sockets.cc
@@ -241,6 +241,10 @@ int comm_socket_setup_new(DTM_INTERNODE_CB *dtms_cb,
   struct addrinfo addr_criteria, *p; /* Criteria for address match */
   char foreign_address_eth[INET6_ADDRSTRLEN + IFNAMSIZ];
   int flag;
+  struct in_addr addr_ipv4;
+  struct sockaddr_in sockaddr;
+  struct in6_addr addr_ipv6;
+  struct sockaddr_in6 sockaddr6;
   TRACE_ENTER();
 
   /* Construct the serv address structure */
@@ -330,6 +334,30 @@ int comm_socket_setup_new(DTM_INTERNODE_CB *dtms_cb,
 goto done;
   }
 
+  if (dtms_cb->i_addr_family == AF_INET) {
+    if (inet_pton(AF_INET, dtms_cb->ip_addr.c_str(), &addr_ipv4) == 1) {
+      sockaddr.sin_family = AF_INET;
+      sockaddr.sin_port = 0;
+      sockaddr.sin_addr = addr_ipv4;
+      if (osaf_bind(sock_desc, (struct sockaddr *)&sockaddr,
+                    sizeof(sockaddr)) != 0)
+        LOG_WA("DTM:osaf_bind() ipv4 failed with errno %d", errno);
+    } else {
+      LOG_WA("DTM:inet_pton(%s) ipv4 failed", dtms_cb->ip_addr.c_str());
+    }
+  } else {
+    if (inet_pton(AF_INET6, dtms_cb->ip_addr.c_str(), &addr_ipv6) == 1) {
+      sockaddr6.sin6_family = AF_INET6;
+      sockaddr6.sin6_port = 0;
+      sockaddr6.sin6_addr = addr_ipv6;
+      if (osaf_bind(sock_desc, (struct sockaddr *)&sockaddr6,
+                    sizeof(sockaddr6)) != 0)
+        LOG_WA("DTM:osaf_bind() ipv6 failed with errno %d", errno);
+    } else {
+      LOG_WA("DTM:inet_pton(%s) ipv6 failed", dtms_cb->ip_addr.c_str());
+    }
+  }
+
   /* Try to connect to the given port */
   if (connect(sock_desc, addr_list->ai_addr, addr_list->ai_addrlen) < 0) {
 err = errno;
-- 
2.17.1





[devel] [PATCH 0/1] Review Request for dtm: bind configured node ip for socket to setup new connection [#3192]

2020-05-26 Thread thuan.tran
Summary: dtm: bind configured node ip for socket to setup new connection [#3192]
Review request for Ticket(s): 3192
Peer Reviewer(s): Minh, Gary
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3192
Base revision: 54d565e262c5a4843931d8d3ceaa7c7d5895f946
Personal repository: git://git.code.sf.net/u/thuantr/review


Impacted area           Impact y/n

 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
-
N/A

revision c95492c04ae67500af33db6fbcd46b687e50161e
Author: thuan.tran 
Date:   Tue, 26 May 2020 09:55:18 +0700

dtm: bind configured node ip for socket to setup new connection [#3192]



Complete diffstat:
--
 src/dtm/dtmnd/dtm_node_sockets.cc | 28 
 1 file changed, 28 insertions(+)


Testing Commands:
-
N/A

Testing, Expected Results:
--
N/A

Conditions of Submission:
-
N/A

Arch         Built     Started    Linux distro
-----------------------------------------------
mips         n         n
mips64       n         n
x86          n         n
x86_64       y         y
powerpc      n         n
powerpc64    n         n




[devel] [PATCH 1/1] rde: avoid dual active controllers in relax promotion mode [#3188]

2020-05-25 Thread thuan.tran
- A node that has already given up promotion and set its role to
QUIESCED must not promote itself to active anyway; doing so would cause
dual active controllers.
- A node whose promotion through the consensus service fails with
SA_AIS_ERR_EXIST sets its role to QUIESCED if its current role is
UNDEFINED.
---
 src/rde/rded/role.cc | 5 +
 1 file changed, 5 insertions(+)
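
Reviewer note, not part of the commit: the two rules from the commit
message can be written as a pure decision function. The sketch covers
only the branches visible in the hunk below; any other combination is
reported as not covered rather than guessed. Role, Rc, Decision and
after_consensus_attempt are hypothetical names, not the rded API.

```cpp
#include <cassert>

enum class Role { kUndefined, kQuiesced, kOther };
enum class Rc { kOk, kErrExist, kOtherError };
enum class Decision { kGiveUp, kPromoteAnyway, kNotCovered };

struct Outcome {
  Decision decision;
  Role role;  // role after the decision
};

// Mirrors the two branches touched by the patch in Role::PromoteNode().
Outcome after_consensus_attempt(Rc rc, bool relaxed_mode, Role current) {
  if (rc == Rc::kErrExist) {
    // Another controller is active: an UNDEFINED node parks as QUIESCED.
    const Role next = (current == Role::kUndefined) ? Role::kQuiesced : current;
    return {Decision::kGiveUp, next};
  }
  if (rc != Rc::kOk && relaxed_mode) {
    // A node that already gave up must not become active anyway.
    if (current == Role::kQuiesced) return {Decision::kGiveUp, current};
    return {Decision::kPromoteAnyway, current};
  }
  return {Decision::kNotCovered, current};
}

// Usage example: a quiesced node in relaxed mode stays out of the election.
void example_promotion() {
  const Outcome o =
      after_consensus_attempt(Rc::kOtherError, true, Role::kQuiesced);
  assert(o.decision == Decision::kGiveUp && o.role == Role::kQuiesced);
}
```

Without the QUIESCED check, the relaxed-mode branch would promote a node
that had already lost the election, which is the dual-active case the
ticket describes.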

diff --git a/src/rde/rded/role.cc b/src/rde/rded/role.cc
index 06c346ced..208ae2364 100644
--- a/src/rde/rded/role.cc
+++ b/src/rde/rded/role.cc
@@ -107,9 +107,14 @@ void Role::PromoteNode(const uint64_t cluster_size,
   rc = consensus_service.PromoteThisNode(true, cluster_size);
   if (rc == SA_AIS_ERR_EXIST) {
 LOG_WA("Another controller is already active");
+if (role() == PCS_RDA_UNDEFINED) SetRole(PCS_RDA_QUIESCED);
 return;
   } else if (rc != SA_AIS_OK && relaxed_mode == true) {
 LOG_WA("Unable to set active controller in consensus service");
+if (role() == PCS_RDA_QUIESCED) {
+  LOG_WA("Another controller is already promoted");
+  return;
+}
 LOG_WA("Will become active anyway");
 promotion_pending = true;
   } else if (rc != SA_AIS_OK) {
-- 
2.17.1





[devel] [PATCH 0/1] Review Request for rde: avoid dual active controllers in relax promotion mode [#3188] V6

2020-05-25 Thread thuan.tran
Summary: rde: avoid dual active controllers in relax promotion mode [#3188]
Review request for Ticket(s): 3188
Peer Reviewer(s): Thang, Minh, Gary
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3188
Base revision: 54d565e262c5a4843931d8d3ceaa7c7d5895f946
Personal repository: git://git.code.sf.net/u/thuantr/review


Impacted area           Impact y/n

 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
-
N/A

revision 9ad1ea84aa8f9959458acb7af5ed7d8e26b14ada
Author: thuan.tran 
Date:   Mon, 25 May 2020 16:20:42 +0700

rde: avoid dual active controllers in relax promotion mode [#3188]

- A node that has already given up promotion and set its role to
QUIESCED must not promote itself to active anyway; doing so would cause
dual active controllers.
- A node whose promotion through the consensus service fails with
SA_AIS_ERR_EXIST sets its role to QUIESCED if its current role is
UNDEFINED.



Complete diffstat:
--
 src/rde/rded/role.cc | 5 +
 1 file changed, 5 insertions(+)


Testing Commands:
-
N/A

Testing, Expected Results:
--
N/A

Conditions of Submission:
-
ACK by reviewers or 1 week no comment

Arch         Built     Started    Linux distro
-----------------------------------------------
mips         n         n
mips64       n         n
x86          n         n
x86_64       y         y
powerpc      n         n
powerpc64    n         n




[devel] [PATCH 0/1] Review Request for rde: avoid dual active controllers in relax promotion mode [#3188] V5

2020-05-24 Thread thuan.tran
Summary: rde: avoid dual active controllers in relax promotion mode [#3188]
Review request for Ticket(s): 3188
Peer Reviewer(s): Thang, Minh, Gary
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3188
Base revision: 54d565e262c5a4843931d8d3ceaa7c7d5895f946
Personal repository: git://git.code.sf.net/u/thuantr/review


Impacted area           Impact y/n

 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
-
N/A

revision acf86f2c686b9316e6a17c5a2e6800ffe676db4e
Author: thuan.tran 
Date:   Mon, 25 May 2020 11:27:24 +0700

rde: avoid dual active controllers in relax promotion mode [#3188]

A node that has already given up promotion and set its role to QUIESCED
must not promote itself to active anyway; doing so would cause dual
active controllers.



Complete diffstat:
--
 src/rde/rded/role.cc | 4 
 1 file changed, 4 insertions(+)


Testing Commands:
-
N/A

Testing, Expected Results:
--
N/A

Conditions of Submission:
-
ACK by reviewers or 1 week no comment

Arch         Built     Started    Linux distro
-----------------------------------------------
mips         n         n
mips64       n         n
x86          n         n
x86_64       y         y
powerpc      n         n
powerpc64    n         n



