[tickets] [opensaf:tickets] #2225 imm: Remove use of SaBoolT
---

** [tickets:#2225] imm: Remove use of SaBoolT**

**Status:** accepted
**Milestone:** 5.2.FC
**Created:** Mon Dec 12, 2016 07:43 AM UTC by Hung Nguyen
**Last Updated:** Mon Dec 12, 2016 07:43 AM UTC
**Owner:** Hung Nguyen

SaBoolT should be replaced with bool wherever possible.

Example:

~~~
SaBoolT freeMemory = SA_FALSE;
...
if(freeMemory == SA_TRUE) {
    free(objectNameStr);
}
~~~

~~~
SaBoolT immnd_syncComplete(IMMND_CB *cb, SaBoolT coordinator, SaUint32T step);
~~~

Use of SaBoolT should be kept in API and message types.

---

Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is subscribed to https://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change settings at https://sourceforge.net/p/opensaf/admin/tickets/options. Or, if this is a mailing list, you can unsubscribe from the mailing list.

Opensaf-tickets mailing list
Opensaf-tickets@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets
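As a rough illustration of what ticket #2225 asks for, the sketch below uses `bool` for internal flags and parameters and keeps `SaBoolT` only where a value crosses an API boundary. It rests on assumptions: the `SaBoolT` enum is a local stand-in assumed to match the SA Forum definition, and `sync_complete`, `api_sync_complete`, and `copy_and_measure` are hypothetical names, not OpenSAF functions.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Local stand-in for the SA Forum boolean type (assumed to match saAis.h).
typedef enum { SA_FALSE = 0, SA_TRUE = 1 } SaBoolT;

// Hypothetical internal helper: plain bool for parameters and result.
static bool sync_complete(bool coordinator, unsigned step) {
  return coordinator && step > 0;
}

// Hypothetical API-facing wrapper: SaBoolT is converted only at the boundary.
SaBoolT api_sync_complete(SaBoolT coordinator, unsigned step) {
  return sync_complete(coordinator == SA_TRUE, step) ? SA_TRUE : SA_FALSE;
}

// Internal flag declared as bool instead of SaBoolT.
// Copies `name` to the heap, measures the copy, and frees it again.
std::size_t copy_and_measure(const char *name) {
  bool freeMemory = false;
  char *objectNameStr =
      static_cast<char *>(std::malloc(std::strlen(name) + 1));
  if (objectNameStr == nullptr) {
    return 0;
  }
  std::strcpy(objectNameStr, name);
  freeMemory = true;

  std::size_t length = std::strlen(objectNameStr);
  if (freeMemory) {  // no comparison against SA_TRUE needed
    std::free(objectNameStr);
  }
  return length;
}
```

Converting at the boundary keeps the public signature and message layout unchanged while the internal code reads as ordinary C++.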
[tickets] [opensaf:tickets] #2224 imm: Improve the iteration in ImmModel
---

** [tickets:#2224] imm: Improve the iteration in ImmModel**

**Status:** accepted
**Milestone:** 5.2.FC
**Created:** Mon Dec 12, 2016 07:09 AM UTC by Hung Nguyen
**Last Updated:** Mon Dec 12, 2016 07:09 AM UTC
**Owner:** Hung Nguyen

After removing an element from a map, the iterator is reset to begin.

~~~
for(ci2=sAdmReqContinuationMap.begin(); ci2!=sAdmReqContinuationMap.end();) {
    if(ci2->second.mConn == dead) {
        TRACE_5("Discarding Adm Req continuation %llu", ci2->first);
        sAdmReqContinuationMap.erase(ci2);
        ci2=sAdmReqContinuationMap.begin();
    } else { ++ci2;}
}
~~~

With C++11, erase(const_iterator) returns an iterator to the next element. We can avoid resetting the iterator.
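The change proposed in ticket #2224 could look roughly like the following sketch. It is not the ImmModel code: `Continuation`, `ContinuationMap`, and `discard_dead` are hypothetical stand-ins, and only the `mConn` field from the ticket's example is modelled.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>

// Hypothetical stand-in for the continuation record; only mConn matters here.
struct Continuation {
  std::uint32_t mConn;
};

using ContinuationMap = std::map<std::uint64_t, Continuation>;

// Removes every entry whose connection equals `dead`.
// Returns the number of entries removed.
std::size_t discard_dead(ContinuationMap &continuations, std::uint32_t dead) {
  std::size_t removed = 0;
  for (auto it = continuations.begin(); it != continuations.end();) {
    if (it->second.mConn == dead) {
      // Since C++11, erase() returns the iterator following the erased
      // element, so the loop continues from there instead of from begin().
      it = continuations.erase(it);
      ++removed;
    } else {
      ++it;
    }
  }
  return removed;
}
```

Continuing from the returned iterator visits each element once, whereas restarting from `begin()` after every erase rescans entries that were already checked.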
[tickets] [opensaf:tickets] #2219 ntfd: circular dependency with osafntfimcnd
Hi Vu,

I raised the concern about termination in a separate thread because IMCN is blocked in one of the CCB callbacks while sending a notification through the NotificationSend() API. This happens because of the circular dependency: the standby NTFS is blocked.

We need to remember that while IMCN is blocked, the test case keeps running and making configuration changes, so IMCN accumulates those callbacks (IMM is not blocked and keeps sending them to IMCN). These callbacks, plus those already in the list, will be processed once IMCN is unblocked. Moving termination into a separate thread will surely allow the standby NTFS to proceed and will unblock IMCN, but after being unblocked the IMCN component will still be executing the remaining callbacks. Earlier, NTFS used to abort IMCN when termination timed out. So we need a guarantee that the thread will definitely terminate IMCN.

One more thing: will termination by NTFS in a separate thread not lead to an accumulation of termination threads? Suppose NTFS has started the termination thread, that thread has not yet exited, and another switchover happens. One more thread will then be spawned for termination.

If both these problems are resolved and guaranteed, I do not see any problem with v3, because my guess is that the problem of notifications being lost has always existed and will remain so irrespective of the v1 or v3 (refined version) approach. The reasons are:

1) NTFS responds to AMF in the CSI set callback before terminating the IMCN component. Because of this, during a switchover there comes a time when the active NTFS has just responded for the active assignments and is terminating the standby IMCN process. While this is happening, the quiesced NTFS has received the standby CSI callback and has also started terminating the active IMCN process. Since there is no synchronization between these two terminations, which run on different nodes, configuration changes may be missed, and hence the notifications.

2) IMCN is not role aware. So NTFS currently terminates IMCN when it gets the standby role, and IMCN exits without any confirmation that it has cleared its notification list and CCB callback list.

Neither patch v1 nor v3 handles this.

Note: There is a ticket for these two (#157 NTF: Improve HA handling in IMCN).

Thanks,
Praveen

---

** [tickets:#2219] ntfd: circular dependency with osafntfimcnd**

**Status:** assigned
**Milestone:** 5.0.2
**Created:** Thu Dec 08, 2016 05:14 AM UTC by Gary Lee
**Last Updated:** Mon Dec 12, 2016 03:03 AM UTC
**Owner:** Praveen

A circular dependency can be seen when performing a si-swap of safSi=SC-2N,safApp=OpenSAF:

1. Active NTFD is trying to sync with Standby using MBC
2. Standby NTFD is in the process of terminating its local osafntfimcnd. It is stuck in timedwait_imcn_exit() and cannot reply to the Active.
3. osafntfimcnd [on standby] is trying to send a notification to Active NTFD

So we have (1) depending on (2) depending on (3) depending on (1)

This results in a temporary deadlock that dramatically slows down NTFD's ability to process its main dispatch loop. The deadlock only lasts for approx. 1 second, until mbcsv_mds_send_msg() times out. But since there could be lots of MBC messages to send, sometimes osafntfimcnd is killed with SIGABRT, generating a coredump. The si-swap operation will also time out.

Steps to reproduce

- Run loop of ntftest 32

root@SC-1:~# for i in {1..10}; do ntftest 32; done

- On another terminal, keep swapping the 2N OpenSAF SI; got a coredump after a couple of swaps

root@SC-1:~# amf-adm si-swap safSi=SC-2N,safApp=OpenSAF
...
root@SC-1:~# amf-adm si-swap safSi=SC-2N,safApp=OpenSAF

~~~
SC-2 (active)

There are a lot of send failures. Each taking approx. 1 second to timeout. During these 1 second timeouts, NTFD cannot process the main dispatch loop.
Dec 7 11:01:37.531772 osafntfd [452:mbcsv_mds.c:0185] >> mbcsv_mds_send_msg: sending to vdest:e
Dec 7 11:01:37.531781 osafntfd [452:mbcsv_mds.c:0209] TR send type MDS_SENDTYPE_REDRSP:
Dec 7 11:01:38.537307 osafntfd [452:mbcsv_mds.c:0247] << mbcsv_mds_send_msg: failure
Dec 7 11:01:38.537758 osafntfd [452:mbcsv_mds.c:0185] >> mbcsv_mds_send_msg: sending to vdest:e
Dec 7 11:01:38.537766 osafntfd [452:mbcsv_mds.c:0209] TR send type MDS_SENDTYPE_REDRSP:
Dec 7 11:01:39.543180 osafntfd [452:mbcsv_mds.c:0247] << mbcsv_mds_send_msg: failure
Dec 7 11:01:39.543695 osafntfd [452:mbcsv_mds.c:0185] >> mbcsv_mds_send_msg: sending to vdest:e
Dec 7 11:01:39.543698 osafntfd [452:mbcsv_mds.c:0209] TR send type MDS_SENDTYPE_REDRSP:
Dec 7 11:01:40.545252 osafntfd [452:mbcsv_mds.c:0247] << mbcsv_mds_send_msg: failure
Dec 7 11:01:40.545719 osafntfd [452:mbcsv_mds.c:0185] >> mbcsv_mds_send_msg: sending to vdest:e
Dec 7 11:01:40.545726 osafntfd [452:mbcsv_mds.c:0209] TR send type MDS_SENDTYPE_REDRSP:
Dec 7 11:01:41.551328 osafntfd [452:mbcsv_mds.c:0247] << mbcsv_mds_send_msg: failure
Dec 7
[tickets] [opensaf:tickets] #2209 SMF: ONE-STEP upgrade failed due to duplicated entities in AU/DU
- **status**: review --> fixed
- **Comment**:

changeset: 8429:91b55c7c9848
branch: opensaf-5.0.x
parent: 8426:e44bf1a904b6
user: Neelakanta Reddy
date: Mon Dec 12 11:06:08 2016 +0530
summary: smf:Allow optimization at node level forAddRemove in mergeStepIntoSingle[#2209]

changeset: 8430:3f26fac74227
branch: opensaf-5.1.x
parent: 8427:bff64f77344b
user: Neelakanta Reddy
date: Mon Dec 12 11:06:08 2016 +0530
summary: smf:Allow optimization at node level forAddRemove in mergeStepIntoSingle[#2209]

changeset: 8431:22a441efda14
tag: tip
parent: 8428:140770d51110
user: Neelakanta Reddy
date: Mon Dec 12 11:06:08 2016 +0530
summary: smf:Allow optimization at node level forAddRemove in mergeStepIntoSingle[#2209]

---

** [tickets:#2209] SMF: ONE-STEP upgrade failed due to duplicated entities in AU/DU**

**Status:** fixed
**Milestone:** 5.1.1
**Created:** Mon Nov 28, 2016 07:01 AM UTC by Tai Dinh
**Last Updated:** Fri Dec 02, 2016 12:31 PM UTC
**Owner:** Neelakanta Reddy
**Attachments:**

- [one_step_upgrade_fix.patch](https://sourceforge.net/p/opensaf/tickets/2209/attachment/one_step_upgrade_fix.patch) (3.0 kB; application/octet-stream)

Execution of a ONE-STEP upgrade will fail if the original campaign contains a forAddRemove single-step procedure that has entities duplicated in another procedure.

    Nov 26 18:30:02 SC-2-2 osafsmfd[4929]: NO STEP: Lock deactivation units
    Nov 26 18:30:02 SC-2-2 osafsmfd[4929]: NO createNodeGroup: saImmOmCcbApply() Fail 'SA_AIS_ERR_FAILED_OPERATION (21)'
    Nov 26 18:30:02 SC-2-2 osafsmfd[4929]: NO changeNodeGroupAdminState: createNodeGroup() Fail SA_AIS_ERR_FAILED_OPERATION (21)
    Nov 26 18:30:02 SC-2-2 osafsmfd[4929]: NO lock: changeNodeGroupAdminState() Fail SA_AIS_ERR_FAILED_OPERATION (21)
    Nov 26 18:30:02 SC-2-2 osafsmfd[4929]: ER Failed to Lock deactivation units in step=safSmfStep=0001
    Nov 26 18:30:02 SC-2-2 osafsmfd[4929]: ER Step execution failed, Try undoing the step
    Nov 26 18:30:02 SC-2-2 osafsmfd[4929]: NO SmfStepStateUndoing::execute start undoing step.
    Nov 26 18:30:02 SC-2-2 osafsmfd[4929]: ER Rollback of cluster reboot activate step is not implemented
    Nov 26 18:30:02 SC-2-2 osafsmfd[4929]: ER Step undoing failed
    Nov 26 18:30:02 SC-2-2 osafsmfd[4929]: NO Step safSmfStep=0001 in procedure safSmfProc=SmfSSMergedProc failed, step result 5
    Nov 26 18:30:02 SC-2-2 osafsmfd[4929]: NO CAMP: Procedure safSmfProc=SmfSSMergedProc returned FAILED
    Nov 26 18:30:11 SC-2-2 osafsmfd[4929]: ER Failed to rollback campaign, wrong state 10

The reason is that, while calculating/optimizing the AU/DU of the merged procedure, the original AU/DU of that single-step procedure is always appended to the resulting procedure without checking for duplicated entities.

This needs to be fixed by removing any duplicated entities that are already present in the tmpDU before optimization. See attachment for a proposed fix.
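The duplicate check described for ticket #2209 can be sketched as follows. This is not the attached patch: entities are modelled as plain strings, and `append_without_duplicates` is a hypothetical helper name. It only shows the idea of skipping entities that the target list (tmpDU in the ticket) already contains.

```cpp
#include <algorithm>
#include <list>
#include <string>

// Appends each entity from `source` to `target` unless `target` already
// holds an equal entry. Duplicates inside `source` are skipped as well.
void append_without_duplicates(std::list<std::string> &target,
                               const std::list<std::string> &source) {
  for (const std::string &entity : source) {
    if (std::find(target.begin(), target.end(), entity) == target.end()) {
      target.push_back(entity);
    }
  }
}
```

A linear search keeps the original order of the list, which is the simplest choice for the short lists a campaign step typically carries; a set-based lookup would be an alternative if the lists were large.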
[tickets] [opensaf:tickets] #1956 IMM: AugmentCcbInitialize crashed when called inside completed callback
- **status**: review --> fixed
- **Comment**:

changeset: 8426:e44bf1a904b6
branch: opensaf-5.0.x
parent: 8423:a85ab2a8baa4
user: Neelakanta Reddy
date: Mon Dec 12 10:27:44 2016 +0530
summary: imm:allow augumentCcbInit with ROF as false in completed callback[#1956]

changeset: 8427:bff64f77344b
branch: opensaf-5.1.x
parent: 8424:962b79041a18
user: Neelakanta Reddy
date: Mon Dec 12 10:27:44 2016 +0530
summary: imm:allow augumentCcbInit with ROF as false in completed callback[#1956]

changeset: 8428:140770d51110
tag: tip
parent: 8425:cf977e804025
user: Neelakanta Reddy
date: Mon Dec 12 10:27:44 2016 +0530
summary: imm:allow augumentCcbInit with ROF as false in completed callback[#1956]

---

** [tickets:#1956] IMM: AugmentCcbInitialize crashed when called inside completed callback**

**Status:** fixed
**Milestone:** 5.0.2
**Created:** Wed Aug 17, 2016 12:22 PM UTC by Chani Srivastava
**Last Updated:** Fri Nov 18, 2016 02:55 PM UTC
**Owner:** Neelakanta Reddy
**Attachments:**

- [AugInit.7z](https://sourceforge.net/p/opensaf/tickets/1956/attachment/AugInit.7z) (1.3 MB; application/octet-stream)

Opensaf Version 5.0

immnd traces and coredump attached

    #0 0x7fa056226b55 in raise () from /lib64/libc.so.6
    #1 0x7fa056228131 in abort () from /lib64/libc.so.6
    #2 0x7fa0559ac08e in getAdmoName () from /usr/lib64/libSaImmOi.so.0
    #3 0x7fa0559acb48 in saImmOiAugmentCcbInitialize () from /usr/lib64/libSaImmOi.so.0
    #4 0x7fa055fda86f in _wrap_saImmOiAugmentCcbInitialize () at saImmOiA211_wrap.c:5917
    #5 0x00418243 in PyObject_Call (func=0x4d8f, arg=0x4d8f, kw=0x6) at Objects/abstract.c:1860
    #6 0x00487437 in ext_do_call (nk=, na=, flags=, pp_stack=, func=) at Python/ceval.c:3846
[tickets] [opensaf:tickets] #2219 ntfd: circular dependency with osafntfimcnd
I attached the patch here for your ref.

Attachments:

- [ntfsv_imcn_coredump_v3.patch](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/327b0222/a42d/attachment/ntfsv_imcn_coredump_v3.patch) (5.5 kB; application/octet-stream)

---

** [tickets:#2219] ntfd: circular dependency with osafntfimcnd**

**Status:** assigned
**Milestone:** 5.0.2
**Created:** Thu Dec 08, 2016 05:14 AM UTC by Gary Lee
**Last Updated:** Mon Dec 12, 2016 03:01 AM UTC
**Owner:** Praveen

A circular dependency can be seen when performing a si-swap of safSi=SC-2N,safApp=OpenSAF:

1. Active NTFD is trying to sync with Standby using MBC
2. Standby NTFD is in the process of terminating its local osafntfimcnd. It is stuck in timedwait_imcn_exit() and cannot reply to the Active.
3. osafntfimcnd [on standby] is trying to send a notification to Active NTFD

So we have (1) depending on (2) depending on (3) depending on (1)

This results in a temporary deadlock that dramatically slows down NTFD's ability to process its main dispatch loop. The deadlock only lasts for approx. 1 second, until mbcsv_mds_send_msg() times out. But since there could be lots of MBC messages to send, sometimes osafntfimcnd is killed with SIGABRT, generating a coredump. The si-swap operation will also time out.

Steps to reproduce

- Run loop of ntftest 32

root@SC-1:~# for i in {1..10}; do ntftest 32; done

- On another terminal, keep swapping the 2N OpenSAF SI; got a coredump after a couple of swaps

root@SC-1:~# amf-adm si-swap safSi=SC-2N,safApp=OpenSAF
...
root@SC-1:~# amf-adm si-swap safSi=SC-2N,safApp=OpenSAF

~~~
SC-2 (active)

There are a lot of send failures. Each taking approx. 1 second to timeout. During these 1 second timeouts, NTFD cannot process the main dispatch loop.

Dec 7 11:01:37.531772 osafntfd [452:mbcsv_mds.c:0185] >> mbcsv_mds_send_msg: sending to vdest:e
Dec 7 11:01:37.531781 osafntfd [452:mbcsv_mds.c:0209] TR send type MDS_SENDTYPE_REDRSP:
Dec 7 11:01:38.537307 osafntfd [452:mbcsv_mds.c:0247] << mbcsv_mds_send_msg: failure
Dec 7 11:01:38.537758 osafntfd [452:mbcsv_mds.c:0185] >> mbcsv_mds_send_msg: sending to vdest:e
Dec 7 11:01:38.537766 osafntfd [452:mbcsv_mds.c:0209] TR send type MDS_SENDTYPE_REDRSP:
Dec 7 11:01:39.543180 osafntfd [452:mbcsv_mds.c:0247] << mbcsv_mds_send_msg: failure
Dec 7 11:01:39.543695 osafntfd [452:mbcsv_mds.c:0185] >> mbcsv_mds_send_msg: sending to vdest:e
Dec 7 11:01:39.543698 osafntfd [452:mbcsv_mds.c:0209] TR send type MDS_SENDTYPE_REDRSP:
Dec 7 11:01:40.545252 osafntfd [452:mbcsv_mds.c:0247] << mbcsv_mds_send_msg: failure
Dec 7 11:01:40.545719 osafntfd [452:mbcsv_mds.c:0185] >> mbcsv_mds_send_msg: sending to vdest:e
Dec 7 11:01:40.545726 osafntfd [452:mbcsv_mds.c:0209] TR send type MDS_SENDTYPE_REDRSP:
Dec 7 11:01:41.551328 osafntfd [452:mbcsv_mds.c:0247] << mbcsv_mds_send_msg: failure
Dec 7 11:01:41.551971 osafntfd [452:mbcsv_mds.c:0185] >> mbcsv_mds_send_msg: sending to vdest:e
Dec 7 11:01:41.551979 osafntfd [452:mbcsv_mds.c:0209] TR send type MDS_SENDTYPE_REDRSP:
Dec 7 11:01:42.557594 osafntfd [452:mbcsv_mds.c:0247] << mbcsv_mds_send_msg: failure
Dec 7 11:01:42.558171 osafntfd [452:mbcsv_mds.c:0185] >> mbcsv_mds_send_msg: sending to vdest:e
Dec 7 11:01:42.558179 osafntfd [452:mbcsv_mds.c:0209] TR send type MDS_SENDTYPE_REDRSP:
Dec 7 11:01:43.564051 osafntfd [452:mbcsv_mds.c:0247] << mbcsv_mds_send_msg: failure
Dec 7 11:01:43.564874 osafntfd [452:mbcsv_mds.c:0185] >> mbcsv_mds_send_msg: sending to vdest:e
Dec 7 11:01:43.564883 osafntfd [452:mbcsv_mds.c:0209] TR send type MDS_SENDTYPE_REDRSP:
Dec 7 11:01:44.572407 osafntfd [452:mbcsv_mds.c:0247] << mbcsv_mds_send_msg: failure
Dec 7 11:01:44.573262 osafntfd [452:mbcsv_mds.c:0185] >> mbcsv_mds_send_msg: sending to vdest:e
Dec 7 11:01:44.573271 osafntfd [452:mbcsv_mds.c:0209] TR send type MDS_SENDTYPE_REDRSP:
Dec 7 11:01:45.575091 osafntfd [452:mbcsv_mds.c:0247] << mbcsv_mds_send_msg: failure
Dec 7 11:01:47.083548 osafntfd [452:mbcsv_mds.c:0185] >> mbcsv_mds_send_msg: sending to vdest:e
~~~

~~~
SC-1 (standby)

NTFD is trying to terminate osafntfimcnd. While it is doing that, it cannot reply to NTFD on SC-2. Meanwhile, osafntfimcnd is sending NTF notifications to NTFD on SC-1.

Dec 7 11:01:35.453151 osafntfd [464:ntfs_imcnutil.c:0316] TR handle_state_ntfimcn: Terminating osafntfimcnd process
Dec 7 11:01:45.474313 osafntfd [464:ntfs_imcnutil.c:0124] TR Termination timeout
Dec 7 11:01:45.474375 osafntfd [464:ntfs_imcnutil.c:0130] << wait_imcnproc_termination: rc = -1, retry_cnt = 101
Dec 7 11:01:45.474387 osafntfd [464:ntfs_imcnutil.c:0168] TR Normal termination failed. Escalate to abort
Dec 7 11:01:45.574703 osafntfd [464:ntfs_imcnutil.c:0172] TR Imcn successfully aborted
Dec 7 11:01:45.574712 osafntfd [464:ntfs_imcnutil.c:0187] << timedwait_imcn_exit
~~~
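On the question raised in this ticket's discussion, namely whether moving IMCN termination into a separate thread could let termination threads pile up across switchovers, one possible guard is sketched below. It is not taken from the v3 patch and makes several assumptions: `TerminationGuard`, `request_termination`, and the `terminate_fn` callback are hypothetical names, and `request_termination` is assumed to be called from a single thread only (for example the main dispatch loop).

```cpp
#include <atomic>
#include <functional>
#include <thread>

// Allows at most one termination thread to run at a time.
class TerminationGuard {
 public:
  // Starts terminate_fn on a new thread unless an earlier termination is
  // still running. Returns true if a new thread was started.
  bool request_termination(std::function<void()> terminate_fn) {
    bool expected = false;
    if (!in_progress_.compare_exchange_strong(expected, true)) {
      return false;  // an earlier termination is still in progress
    }
    wait();  // reap the previous worker; it has already finished its work
    worker_ = std::thread([this, terminate_fn] {
      terminate_fn();
      in_progress_.store(false);  // last action of the worker
    });
    return true;
  }

  // Blocks until the current worker, if any, has exited.
  void wait() {
    if (worker_.joinable()) {
      worker_.join();
    }
  }

  ~TerminationGuard() { wait(); }

 private:
  std::atomic<bool> in_progress_{false};
  std::thread worker_;
};
```

With a guard like this, a second switchover that arrives while termination is still running gets `false` back instead of spawning another thread. It does not address the other concern raised in the discussion, which is whether the thread is guaranteed to terminate IMCN.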