[tickets] [opensaf:tickets] #2493 amf: amfnd asserts while shutting down when active monitoring fails for NPI comp.
- **status**: review --> fixed
- **assigned_to**: Praveen --> nobody
- **Comment**:

develop:

commit 126c7d9c59a41205ce16c2c9e8a7cae7457a0c2c
Author: Praveen <praveen.malv...@oracle.com>
Date: Tue Sep 12 17:08:11 2017 +0530

amfnd: fix opensaf shutdown and active monitoring failure [#2493]

commit 74476b88a30c80c788e56b6ede2baea040e22c18
Author: Praveen <praveen.malv...@oracle.com>
Date: Tue Sep 12 17:08:11 2017 +0530

amfnd: fix opensaf shutdown and active monitoring failure [#2493]

---

** [tickets:#2493] amf: amfnd asserts while shutting down when active monitoring fails for NPI comp.**

**Status:** fixed
**Milestone:** 5.17.10
**Created:** Tue Jun 13, 2017 07:11 AM UTC by Praveen
**Last Updated:** Wed Aug 30, 2017 10:39 AM UTC
**Owner:** nobody
**Attachments:**

- [1945_npi.xml](https://sourceforge.net/p/opensaf/tickets/2493/attachment/1945_npi.xml) (12.0 kB; text/xml)
- [osafamfnd](https://sourceforge.net/p/opensaf/tickets/2493/attachment/osafamfnd) (6.1 MB; application/octet-stream)
- [syslog](https://sourceforge.net/p/opensaf/tickets/2493/attachment/syslog) (275.6 kB; application/octet-stream)

Steps to reproduce:
1) Bring one controller up.
2) Add the attached configuration to the system.
3) Unlock-in and unlock su1.

The attached configuration uses the amfpm command to start active monitoring. If this command is wrongly configured by the user, AMF reports a fault on the component and AMFND restarts it. Since the active monitoring command fails every time, the component is faulted continuously. As a last option, when OpenSAF is stopped on the node, AMFND asserted:

syslog:
Jun 13 12:27:03 SC-1 osafamfnd[30287]: NO Removed 'safSi=AmfDemo,safApp=AmfDemo1' from 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1'
Jun 13 12:27:03 SC-1 osafamfnd[30287]: NO Removed assignments from AMF components
Jun 13 12:27:03 SC-1 osafamfnd[30287]: NO 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' Component or SU restart probation timer expired
Jun 13 12:27:03 SC-1 osafamfnd[30287]: NO Terminating all AMF components
Jun 13 12:27:03 SC-1 osafamfnd[30287]: NO 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' Presence State RESTARTING => TERMINATING
Jun 13 12:27:03 SC-1 osafamfnd[30287]: src/amf/amfnd/susm.cc:1886: avnd_su_pres_st_chng_prc: Assertion 'si' failed.
Jun 13 12:27:03 SC-1 osafclmd[30264]: AL AMF Node Director is down, terminate this process

bt:
\#0 0x7f662fbe8cc9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
\#1 0x7f662fbec0d8 in __GI_abort () at abort.c:89
\#2 0x7f66306dedbe in __osafassert_fail (__file=, __line=, __func=, __assertion=) at src/base/sysf_def.c:286
\#3 0x7f66313fff3f in avnd_su_pres_st_chng_prc (final_st=SA_AMF_PRESENCE_TERMINATING, prv_st=SA_AMF_PRESENCE_RESTARTING, su=0x7f66324d33c0, cb=0x7f663161f240 <_avnd_cb>) at src/amf/amfnd/susm.cc:1886
\#4 avnd_su_pres_fsm_run (cb=cb@entry=0x7f663161f240 <_avnd_cb>, su=0x7f66324d33c0, comp=comp@entry=0x7f66324d46b0, ev=) at src/amf/amfnd/susm.cc:1610
\#5 0x7f66313caf58 in avnd_comp_clc_st_chng_prc (cb=cb@entry=0x7f663161f240 <_avnd_cb>, comp=comp@entry=0x7f66324d46b0, prv_st=prv_st@entry=SA_AMF_PRESENCE_RESTARTING, final_st=final_st@entry=SA_AMF_PRESENCE_TERMINATING) at src/amf/amfnd/clc.cc:1501
\#6 0x7f66313cf127 in avnd_comp_clc_fsm_run (cb=0x7f663161f240 <_avnd_cb>, comp=comp@entry=0x7f66324d46b0, ev=ev@entry=AVND_COMP_CLC_PRES_FSM_EV_CLEANUP) at src/amf/amfnd/clc.cc:892
\#7 0x7f66314067e8 in avnd_comp_cleanup_launch (comp=comp@entry=0x7f66324d46b0) at src/amf/amfnd/util.cc:178
\#8 0x7f6631405beb in avnd_last_step_clean (cb=cb@entry=0x7f663161f240 <_avnd_cb>) at src/amf/amfnd/term.cc:76
\#9 0x7f66313e13b9 in avnd_di_msg_ack_process (cb=cb@entry=0x7f663161f240 <_avnd_cb>, mid=) at src/amf/amfnd/di.cc:1264
\#10 0x7f66313e1484 in avnd_evt_avd_ack_evh (cb=0x7f663161f240 <_avnd_cb>, evt=0x7f6628001010) at src/amf/amfnd/di.cc:411
\#11 0x7f66313ec9df in avnd_evt_process (evt=0x7f6628001010) at src/amf/amfnd/main.cc:658
\#12 avnd_main_process () at src/amf/amfnd/main.cc:610
\#13 0x7f66313c261f in main (argc=2, argv=0x7ffc47fa34f8) at src/amf/amfnd/main.cc:203

---

Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is subscribed to https://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change settings at https://sourceforge.net/p/opensaf/admin/tickets/options. Or, if this is a mailing list, you can unsubscribe from the mailing list.

--
Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Opensaf-tickets mailing list
Opensaf-tickets@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets
[tickets] [opensaf:tickets] #2269 amf: saAmfSGNumPrefAssignedSUs is not honored in N-Way Active model.
- **status**: review --> fixed
- **Comment**:

commit c76c419a0250ac61e0d48180950aaafb639f32bf
Author: Praveen <praveen.malv...@oracle.com>
Date: Thu Aug 31 10:56:36 2017 +0530

amfd: honor PrefAssignedSU in nway and nway active model during assignments [#2269]

SG attribute saAmfSGNumPrefAssignedSUs is applicable to N-Way and N-Way Active model. AMF is assigning more than saAmfSGNumPrefAssignedSUs in both N-Way and N-Way Active model. Patch fixes this problem.

---

** [tickets:#2269] amf: saAmfSGNumPrefAssignedSUs is not honored in N-Way Active model.**

**Status:** fixed
**Milestone:** 5.17.10
**Created:** Wed Jan 18, 2017 06:08 AM UTC by Praveen
**Last Updated:** Fri Jul 28, 2017 08:25 AM UTC
**Owner:** Praveen
**Attachments:**

- [AppConfig-nwayactive_3SUs_1SIs.xml](https://sourceforge.net/p/opensaf/tickets/2269/attachment/AppConfig-nwayactive_3SUs_1SIs.xml) (13.7 kB; text/xml)

AMF assigns more SUs than the configured value of saAmfSGNumPrefAssignedSUs in the N-Way Active model. The issue can be reproduced by bringing up the attached configuration. In the application, saAmfSGNumPrefAssignedSUs is set to 2:

immlist safSg=NWay_Active\,safApp=NWay_Active | grep -i prefass
saAmfSGNumPrefAssignedSUs SA_UINT32_T 2 (0x2)

But AMF gives assignments to all three SUs:

safSISU=safSu=SU2\,safSg=NWay_Active\,safApp=NWay_Active,safSi=NWay_Active,safApp=NWay_Active
saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=SU1\,safSg=NWay_Active\,safApp=NWay_Active,safSi=NWay_Active,safApp=NWay_Active
saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=SU3\,safSg=NWay_Active\,safApp=NWay_Active,safSi=NWay_Active,safApp=NWay_Active
saAmfSISUHAState=ACTIVE(1)

Since this attribute is also valid for the N-Way model, the issue applies to the N-Way model as well.
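The behaviour the reporter expects can be illustrated with a small model. The Python sketch below is not the amfd assignment algorithm; `assign_active` and its rank-ordered input list are hypothetical names introduced only for illustration. It shows nothing more than the cap that saAmfSGNumPrefAssignedSUs is meant to impose in the scenario above (three SUs, limit of 2).

```python
def assign_active(sus_by_rank, num_pref_assigned_sus):
    """Return the SUs that should receive an active assignment for one SI.

    sus_by_rank: names of in-service SUs, highest rank first (an assumed
        ordering; the real ranking logic in amfd is not modelled here).
    num_pref_assigned_sus: the configured saAmfSGNumPrefAssignedSUs value.
    """
    # Never hand out assignments to more SUs than the configured limit.
    return list(sus_by_rank[:num_pref_assigned_sus])


# Scenario from the ticket: three SUs, saAmfSGNumPrefAssignedSUs = 2.
assigned = assign_active(["SU1", "SU2", "SU3"], 2)
print(assigned)
```

With the limit honored, only two of the three SUs appear in the result, whereas the output in the report shows all three SUs in the ACTIVE HA state.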
[tickets] [opensaf:tickets] #2493 amf: amfnd asserts while shutting down when active monitoring fails for NPI comp.
- **status**: assigned --> accepted
- **Blocker**: False --> True

---

** [tickets:#2493] amf: amfnd asserts while shutting down when active monitoring fails for NPI comp.**

**Status:** accepted
**Milestone:** 5.17.10
**Created:** Tue Jun 13, 2017 07:11 AM UTC by Praveen
**Last Updated:** Fri Jul 28, 2017 08:23 AM UTC
**Owner:** Praveen
[tickets] [opensaf:tickets] #2475 amf: support for SC status change Callback, non SAF.
- **status**: review --> fixed
- **Comment**:

commit 00c185144de728f7938f775fd3ce65ee95b01032
Author: Praveen <praveen.malv...@oracle.com>
Date: Mon Aug 28 14:32:32 2017 +0530

amf: update readme for SC status change callback [#2475]

commit b93cf244b3fb64bc213d82125e1665b50b80f2c6
Author: Praveen <praveen.malv...@oracle.com>
Date: Mon Aug 28 14:32:33 2017 +0530

amf: support SC status change callback, non SAF [#2475]

commit 81e2878c1fa3287e37238a38a1bb054951489e86
Author: Praveen <praveen.malv...@oracle.com>
Date: Mon Aug 28 14:32:33 2017 +0530

amf: add sample apps for SC status change callback [#2475]

commit a79bb4c527ec3c59a61ce6552184c18213fe4acd
Author: Praveen <praveen.malv...@oracle.com>
Date: Mon Aug 28 14:32:33 2017 +0530

amf: add api test cases for sc status change callback [#2475]

---

** [tickets:#2475] amf: support for SC status change Callback, non SAF.**

**Status:** fixed
**Milestone:** 5.17.10
**Created:** Thu Jun 01, 2017 10:19 AM UTC by Praveen
**Last Updated:** Mon Aug 14, 2017 08:27 AM UTC
**Owner:** Praveen

This enhancement adds two resources to AMFA that let an application learn about the absence and presence of the SCs when they go down and come up.

Information about the resources:

* A callback that AMFA invokes whenever an SC joins the cluster and, if the SC Absence feature is enabled, when both SCs leave the cluster.

-Callback and its argument:

    void (*OsafAmfSCStatusChangeCallbackT)(OsafAmfSCStatusT state)

where OsafAmfSCStatusT is defined as:

    typedef enum {
      OSAF_AMF_SC_PRESENT = 1,
      OSAF_AMF_SC_ABSENT = 2,
    } OsafAmfSCStatusT;

This callback can be integrated with a standard AMF component (including a legacy one).

-Return codes:
SA_AIS_OK - The function returned successfully.
SA_AIS_ERR_LIBRARY - An unexpected problem occurred in the library (such as corruption). The library cannot be used anymore.
SA_AIS_ERR_BAD_HANDLE - The handle amfHandle is invalid, since it is corrupted, uninitialized, or has already been finalized.
SA_AIS_ERR_INVALID_PARAM - A parameter is not set correctly (callback).

* An API to register/install the above callback function:

    void osafAmfInstallSCStatusChangeCallback(SaAmfHandleT amfHandle, void (*OsafAmfSCStatusChangeCallbackT)(OsafAmfSCStatusT status));

If 0 is passed as amfHandle, the callback is invoked in the context of the MDS thread. If a valid amfHandle is passed, the callback is invoked in the context of the thread that calls saAmfDispatch() with this handle.
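The two delivery modes described above (immediate invocation for handle 0, deferred invocation via saAmfDispatch() for a valid handle) can be modelled in a few lines. The Python sketch below is a toy model only, not the OpenSAF library: the class `ScStatusNotifier` and its methods are hypothetical, threads are not modelled, and only the two enum values are taken from the ticket.

```python
from collections import deque

# Values taken from the OsafAmfSCStatusT enum in the ticket.
OSAF_AMF_SC_PRESENT = 1
OSAF_AMF_SC_ABSENT = 2


class ScStatusNotifier:
    """Toy model of the two callback delivery modes (hypothetical class)."""

    def __init__(self):
        self._callbacks = {}  # handle -> installed callback
        self._pending = {}    # handle -> states not yet delivered

    def install(self, amf_handle, callback):
        """Stand-in for the install API; rejects a missing callback."""
        if callback is None:
            raise ValueError("callback must be set")
        self._callbacks[amf_handle] = callback
        self._pending.setdefault(amf_handle, deque())

    def sc_status_changed(self, state):
        """Simulate an SC presence/absence change reaching the library."""
        for handle, callback in self._callbacks.items():
            if handle == 0:
                # Handle 0: delivered at once (MDS thread in the ticket).
                callback(state)
            else:
                # Valid handle: queued until the application dispatches.
                self._pending[handle].append(state)

    def dispatch(self, amf_handle):
        """Stand-in for saAmfDispatch(): deliver queued states."""
        queue = self._pending.get(amf_handle, deque())
        while queue:
            self._callbacks[amf_handle](queue.popleft())


seen = []
notifier = ScStatusNotifier()
notifier.install(0, lambda state: seen.append(("immediate", state)))
notifier.install(42, lambda state: seen.append(("dispatched", state)))
notifier.sc_status_changed(OSAF_AMF_SC_ABSENT)
notifier.dispatch(42)
print(seen)
```

The point of the model is the ordering: the callback installed with handle 0 runs as soon as the status changes, while the one installed with a real handle runs only once the application calls its dispatch function.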
[tickets] [opensaf:tickets] #466 Length of the objectnames is more by one for configuration object notifications
- **status**: assigned --> unassigned
- **assigned_to**: Praveen --> nobody
- **Blocker**: --> False

---

** [tickets:#466] Length of the objectnames is more by one for configuration object notifications**

**Status:** unassigned
**Milestone:** future
**Created:** Thu Jun 20, 2013 09:08 AM UTC by Sirisha Alla
**Last Updated:** Wed Jan 04, 2017 06:41 AM UTC
**Owner:** nobody

When ntfimcnd sends notifications for configuration object creation/modification/deletion, the lengths of the notifying object and the notification object are reported incorrectly. The IMM callback gives the length of the notification object correctly.

Notification object length in the IMM callback:
objectName->length: 37
objectName->value: 'attrName_testSA_registerSA_Node_37_69'

Object create/modify/delete notifications indicate that the length of the notification object is 38 and the length of the notifying object is 15 for "safApp=OpenSaf".

This issue is reproducible.
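The discrepancy can be checked by counting the characters of the two names quoted in the report. The short Python sketch below does only that arithmetic; `reported_length` is a hypothetical helper that reproduces the observed values, and it makes no claim about why ntfimcnd reports the extra character, since the ticket does not state the cause.

```python
def reported_length(name):
    """Length as shown in the notifications in this report.

    Observed behaviour only: one more than the string length. The ticket
    does not say where the extra character comes from.
    """
    return len(name) + 1


notification_object = "attrName_testSA_registerSA_Node_37_69"
notifying_object = "safApp=OpenSaf"

for name in (notification_object, notifying_object):
    # Columns: name, actual string length, length seen in the notification.
    print(name, len(name), reported_length(name))
```

Both names follow the same pattern: the string lengths are 37 and 14, and the notifications show 38 and 15.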
[tickets] [opensaf:tickets] #2536 amf: Add amfclusterstatus option to check if SCs are up
- **status**: review --> fixed
- **Comment**:

develop:

commit c523f2a1b2b887ac2c6a91e5ee12c028b243a729
Author: Praveen <praveen.malv...@oracle.com>
Date: Tue Aug 8 15:12:19 2017 +0530

amf: add option for controller status in amfclusterstatus [#2536]

release:

commit 8d93a58adfdf96f2420e89686e0026375211799f
Author: Praveen <praveen.malv...@oracle.com>
Date: Tue Aug 8 15:12:19 2017 +0530

amf: add option for controller status in amfclusterstatus [#2536]

---

** [tickets:#2536] amf: Add amfclusterstatus option to check if SCs are up**

**Status:** fixed
**Milestone:** 5.17.10
**Created:** Fri Jul 28, 2017 06:54 AM UTC by Anders Widell
**Last Updated:** Fri Aug 04, 2017 09:19 AM UTC
**Owner:** Praveen

The current amfclusterstatus command can be used to check if the SCs are up, but in order to interpret the result you must know the names of the SC node(s), and you must parse the console output from the command. In order to make this command easier to use for this purpose, we could add an option, e.g. -s or --controller-status, which simply answers the question whether any SC is currently up. It could exit with exit code 0 if any SC is up, and with exit code 1 if no SC is up. We could also add a -q or --quiet option that suppresses all printouts from the command. This will make the command easy to use in a shell script.
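The exit-code convention proposed above lends itself to a small wrapper. The Python sketch below assumes the option names suggested in the ticket (`-s` and `-q`); the names in the committed tool may differ, so treat the default command as an assumption. The helper `any_sc_up` is hypothetical, and the demonstration uses stand-in commands so that it runs on a machine without OpenSAF installed.

```python
import subprocess
import sys


def any_sc_up(command=("amfclusterstatus", "-s", "-q")):
    """Map the proposed exit codes to a boolean.

    Exit code 0 means at least one SC is up, exit code 1 means none is.
    The default flags are the ones proposed in the ticket (assumption).
    """
    result = subprocess.run(
        list(command),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    if result.returncode == 0:
        return True
    if result.returncode == 1:
        return False
    # Anything else is outside the convention described in the ticket.
    raise RuntimeError(f"unexpected exit code {result.returncode}")


# Stand-in commands that only reproduce the two exit codes.
fake_up = [sys.executable, "-c", "raise SystemExit(0)"]
fake_down = [sys.executable, "-c", "raise SystemExit(1)"]
print(any_sc_up(fake_up), any_sc_up(fake_down))
```

Raising on any other exit code is a design choice of this sketch: it keeps a crashed or missing option from being silently read as "no SC is up".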
[tickets] [opensaf:tickets] #2536 amf: Add amfclusterstatus option to check if SCs are up
- **status**: accepted --> review

---

** [tickets:#2536] amf: Add amfclusterstatus option to check if SCs are up**

**Status:** review
**Milestone:** 5.17.10
**Created:** Fri Jul 28, 2017 06:54 AM UTC by Anders Widell
**Last Updated:** Wed Aug 02, 2017 04:08 AM UTC
**Owner:** Praveen
[tickets] [opensaf:tickets] #2429 clm: support for a clm utility to perform tracking and for getting node info.
- **status**: review --> fixed
- **Comment**:

commit 77346df31fa7061496b22f91611e120477e907b5
Author: Praveen <praveen.malv...@oracle.com>
Date: Wed Aug 2 16:58:16 2017 +0530

clm: add clm tool for tracking and for getting node info [#2429]

Add a utility/application which enables user to:
-perform tracking using saClmClusterTrack_4().
-get node info by calling saClmClusterNodeGet_4().
-get node info asynchronously by calling saClmClusterNodeGetAsync().

---

** [tickets:#2429] clm: support for a clm utility to perform tracking and for getting node info.**

**Status:** fixed
**Milestone:** 5.17.10
**Created:** Mon Apr 17, 2017 06:38 AM UTC by Praveen
**Last Updated:** Fri Jul 14, 2017 09:04 AM UTC
**Owner:** Praveen

Ticket #2394 implements tool commands for handling CLM objects and performing admin operations. This ticket is to add a utility or application which enables the user to:
\-perform tracking using saClmClusterTrack_4().
\-get node info by calling saClmClusterNodeGet_4().
\-get node info asynchronously by calling saClmClusterNodeGetAsync().
[tickets] [opensaf:tickets] #2536 amf: Add amfclusterstatus option to check if SCs are up
I will be publishing this patch by the end of this week. Ticket #2475 implements a callback in the same area and is in review state.

---

** [tickets:#2536] amf: Add amfclusterstatus option to check if SCs are up**

**Status:** accepted
**Milestone:** 5.17.10
**Created:** Fri Jul 28, 2017 06:54 AM UTC by Anders Widell
**Last Updated:** Tue Aug 01, 2017 01:49 PM UTC
**Owner:** Praveen
[tickets] [opensaf:tickets] #2536 amf: Add amfclusterstatus option to check if SCs are up
- **status**: unassigned --> accepted
- **assigned_to**: Praveen
- **Part**: - --> tools

---

** [tickets:#2536] amf: Add amfclusterstatus option to check if SCs are up**

**Status:** accepted
**Milestone:** 5.17.10
**Created:** Fri Jul 28, 2017 06:54 AM UTC by Anders Widell
**Last Updated:** Fri Jul 28, 2017 06:54 AM UTC
**Owner:** Praveen
[tickets] [opensaf:tickets] #2269 amf: saAmfSGNumPrefAssignedSUs is not honored in N-Way Active model.
- **status**: assigned --> review
- **Blocker**: --> True
- **Milestone**: future --> 5.17.08

---

** [tickets:#2269] amf: saAmfSGNumPrefAssignedSUs is not honored in N-Way Active model.**

**Status:** review
**Milestone:** 5.17.08
**Created:** Wed Jan 18, 2017 06:08 AM UTC by Praveen
**Last Updated:** Tue Mar 28, 2017 07:04 AM UTC
**Owner:** Praveen
[tickets] [opensaf:tickets] #70 AMF support for Container and contained components
Created branch ticket-70 and pushed some initial patches in repo: git://git.code.sf.net/u/praveenmalviya/review.

---

** [tickets:#70] AMF support for Container and contained components**

**Status:** assigned
**Milestone:** future
**Created:** Mon May 13, 2013 04:14 AM UTC by Nagendra Kumar
**Last Updated:** Wed Jul 19, 2017 08:31 AM UTC
**Owner:** Praveen

Migrated from http://devel.opensaf.org/ticket/1436:

The current implementation of AMF doesn't support container and contained components. The concept of container and contained components was introduced in the B.03.01 spec. Because of this support, there was a series of changes and new additions in different sections of the spec. In addition, a new chapter 6 is fully dedicated to container and contained components. What follows is a summary of the container and contained component concept, collected from different sections of the B.04.01 spec with references to the particular sections and page numbers.

**A) Section 3.1.2.1.1, page 45, describes the use case for the container and contained component concept:**

" The concept of container and contained components allows the Availability Management Framework to integrate components that are not executed directly by the operating system, but rather in a controlled environment running on top of the operating system. Widespread environments are runtime environments, virtual machines, or component frameworks. "

AMF directly manages the life cycle of a container component but not of a contained component. A container component cooperates with AMF to manage the life cycle of a contained component. If a container comp1 manages the life cycle of a contained comp2, then comp1 is termed the associated container component of comp2, and comp2 is termed an associated contained component of comp1. If there is one more component, say comp3, whose associated container component is the same comp1, then comp3 and comp2 are referred to as collocated contained components. (3.1.2.1.1 Container and Contained Components page 45)

**B)Configuration:**

-Container and contained components are local SA-aware components. (6.1.2 Component Category page 221)
-The user can configure the attribute "saAmfCtCompCategory" of class "SaAmfCompType" with the following values to declare that a component of this CompType is a container or contained component (Section 7.4.8 SaAmfCompCategoryT page 258):
\#define SA_AMF_COMP_CONTAINER 0x0010
\#define SA_AMF_COMP_CONTAINED 0x0020
-A single container component acts as the container component for many contained components. (6.2.2 Assignment of the Container CSI page 224)
-A container and its contained components must be hosted on the same AMF node.
-A container component can be part of an SG of the N-Way Active model only. (6.1.6 Redundancy Models Page 222)
-A contained component can be part of an SG of any redundancy model. (6.1.6 Redundancy Models Page 223)
-An SU cannot contain any other types/categories of components if a container component is present in it. (3.1.4 Service Unit page 52)
-An SU cannot contain both container components and contained components. (6.1.5 Container and Contained Components in Service Units and Service Groups page 221)
-An SU that contains a contained component can only contain collocated contained components. (6.1.5 Container and Contained Components in Service Units and Service Groups page 221)
-SUs containing contained components and SUs containing container components must belong to different SGs. (6.1.5 Container and Contained Components in Service Units and Service Groups page 222)
-Since a container component can be associated with many contained components, and there can also be many container components in an SU, the user has to specify a CSI name in saAmfCompContainerCsi in the Comp class to declare its associated container component indirectly. The component that receives this container CSI acts as the container component for this contained component on the same node. (3.1.3 Component Service Instance page 51)
-An SI containing a container CSI cannot have any other CSI. (6.1.5 Container and Contained Components in Service Units and Service Groups page 222)
-A container component can receive multiple CSI assignments based on configuration. Among these CSIs, one or more can be for handling contained components and others can be for providing other services. (3.1.3 Component Service Instance page 51)
-If an SU contains contained components, they should have a common associated container component. This can be ensured by configuring the same container CSI. (3.1.4 Service Unit page 52)
-The rank of the container SI should be higher than the rank of the contained SI. (3.6.1.4 Considerations when Configuring Redundancy page 121) and (6.1.5 Container and Contained Components in Service Units and Service Groups Page 222)
-There should not be any conflict while hosting container SUs and contained SUs on nodes and node group as both container and its associated contained components
[tickets] [opensaf:tickets] #2429 clm: support for a clm utility to perform tracking and for getting node info.
- **summary**: clm: support for a clm utility to perform tracking and cluster status. --> clm: support for a clm utility to perform tracking and for getting node info.
- Description has changed:

Diff:

--- old
+++ new
@@ -4,4 +4,4 @@
 \-perform tracking using saClmClusterTrack_4().
 \-get node info by calling saClmClusterNodeGet_4().
 \-get node info asynchronously by calling saClmClusterNodeGetAsync().
-\-to list nodes status when SCs are present and absent.
+

- **status**: accepted --> review
- **Part**: - --> tools
- **Blocker**: --> False

---

** [tickets:#2429] clm: support for a clm utility to perform tracking and for getting node info.**

**Status:** review
**Milestone:** 5.17.10
**Created:** Mon Apr 17, 2017 06:38 AM UTC by Praveen
**Last Updated:** Sat Jul 01, 2017 04:15 PM UTC
**Owner:** Praveen

Ticket #2394 implements tool commands for handling CLM objects and performing admin operations. This ticket is to add a utility or application which enables the user to:
\-perform tracking using saClmClusterTrack_4().
\-get node info by calling saClmClusterNodeGet_4().
\-get node info asynchronously by calling saClmClusterNodeGetAsync().

---

Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is subscribed to https://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change settings at https://sourceforge.net/p/opensaf/admin/tickets/options. Or, if this is a mailing list, you can unsubscribe from the mailing list.

--
Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Opensaf-tickets mailing list
Opensaf-tickets@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets
[tickets] [opensaf:tickets] #2331 CLM: : saClmNodeCurrAddress and saClmNodeCurrAddressFamily are not exposed to IMM even that TPC mode is using
commit 8b79e5a7d45986f50195865f6ec276eede025ae4
Author: Praveen <praveen.malv...@oracle.com>
Date: Thu May 18 17:19:17 2017 +0530

clmd: update saClmNodeCurrAddress and saClmNodeCurrAddressFamily in IMM V2 [#2331]

---

** [tickets:#2331] CLM: : saClmNodeCurrAddress and saClmNodeCurrAddressFamily are not exposed to IMM even that TPC mode is using**

**Status:** fixed
**Milestone:** 5.17.10
**Created:** Thu Mar 02, 2017 10:10 AM UTC by Tai Dinh
**Last Updated:** Fri Jul 07, 2017 10:51 AM UTC
**Owner:** Praveen

saClmNodeCurrAddress and saClmNodeCurrAddressFamily of a cluster node are not exposed to IMM even though TCP mode is configured. This kind of information is sometimes needed by applications.
[tickets] [opensaf:tickets] #2331 CLM: : saClmNodeCurrAddress and saClmNodeCurrAddressFamily are not exposed to IMM even that TPC mode is using
- **status**: review --> fixed

---

** [tickets:#2331] CLM: : saClmNodeCurrAddress and saClmNodeCurrAddressFamily are not exposed to IMM even that TPC mode is using**

**Status:** fixed
**Milestone:** 5.17.10
**Created:** Thu Mar 02, 2017 10:10 AM UTC by Tai Dinh
**Last Updated:** Sat Jul 01, 2017 04:15 PM UTC
**Owner:** Praveen

saClmNodeCurrAddress and saClmNodeCurrAddressFamily of a cluster node are not exposed to IMM even though TCP mode is configured. This kind of information is sometimes needed by applications.
[tickets] [opensaf:tickets] #70 AMF support for Container and contained components
ns regular SA-AWARE component or
+ a contained component. It is different when a SU/Node/SG contains an "active" container
+ component and its associated contained components are up.
+ Basic flows of some admin operations in such a case:
+ -Lock of container SU:
+   -Since contained SU may belong to any Redundancy model, first assignments are removed
+    from contained SU as if the lock operation is also issued on contained SU.
+   -After removal of assignments from contained SU, all comps are terminated in contained
+    SU.
+   -Now assignments will be removed from container SU gracefully via quiesced HA state.
+ -Lock of container SI:
+   -First remove assignments from all those contained SUs where this container SI is active.
+   -After removal of assignments from contained SUs, all comps are terminated in these
+    contained SU.
+   -Now assignments will be removed from container SU.
+ -Shutdown of container SI:
+   -First remove assignments from all those contained SUs via quiescing state where
+    this container SI is active.
+   -After removal of assignments from contained SUs, all comps are terminated in these
+    contained SU.
+   -Now assignments will be removed from container SU gracefully via quiescing HA state.
+ -Restart of container component:
+   Here it is assumed that saAmfCompDisableRestart is false.
+   -First terminate associated contained component using terminate Callback.
+   -Terminate container component using terminate callback.
+   -Instantiate container component with INSTANTIATE CLC-CLI script.
+   -Reassign container CSI active to container component.
+   -Now instantiate contained component by sending saAmfContainedComponentInstantiateCallback
+    to container component.
+   -After successful instantiation of contained component, it will be reassigned.
+
+**H)Notifications (11 Alarms and Notifications Page 417):**
+ AMF generates notifications for container and contained
+ components as it currently generates for any other component.
+
+**I)Some important facts from different sections:**
+ -A process belonging to a container component can also belong to its associated
+  contained components.(3.1.2.1.1 Container and Contained Components page 45)
+ -A process belonging to a contained component belongs also to its associated
+  container component and may also belong to some of its collocated contained
+  components.(3.1.2.1.1 Container and Contained Components page 45)
+ -The container CSI can contain information to be passed by the associated container
+  component to the corresponding contained component. How this information is
+  passed is a private interface between container and contained components.
+  (3.1.3 Component Service Instance page 51)
+ -A container component can be configured to have multiple CSI assignments,
+  one or more for handling contained components, and others for providing
+  other services. In terms of functionality and syntax, there is no difference between a
+  container CSI used to determine the associated container component and other CSIs
+  corresponding to the workload of other services.
+  (3.1.3 Component Service Instance page 51)
+

- **Comment**:

Will attach same as a document.

---

** [tickets:#70] AMF support for Container and contained components**

**Status:** assigned
**Milestone:** future
**Created:** Mon May 13, 2013 04:14 AM UTC by Nagendra Kumar
**Last Updated:** Wed Jun 28, 2017 05:28 AM UTC
**Owner:** Praveen

Migrated from http://devel.opensaf.org/ticket/1436:
Current implementation of AMF doesn't support Container and Contained components.

Concept of container and contained component was introduced in B.03.01 spec. Because of this support, there was a series of changes and new additions in different sections of the spec. Also a new chapter 6 is fully dedicated to container and contained components.
What follows is a summary of the container and contained components concept collected from different sections of the B.04.01 spec with reference to particular sections and page numbers.

**A)Section 3.1.2.1.1 page 45 talks about the use case related to the container and contained component concept:**
" The concept of container and contained components allows the Availability Management Framework to integrate components that are not executed directly by the operating system, but rather in a controlled environment running on top of the operating system. Widespread environments are runtime environments, virtual machines, or component frameworks. "
AMF directly manages the life cycle of a container component but not of a contained component. A container component cooperates with AMF for managing the life cycle of a contained component. If a container comp1 manages the life cycle of a contained comp2 then comp1 is termed as associated container component of comp2 and comp2 is termed as associated contained comp
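The "Restart of container component" flow listed in this ticket depends on ordering: contained components go down before the container and come back only after the container holds its CSI again. A minimal sketch of that ordering follows, with Python used purely for illustration. The function name and the step strings are hypothetical; only the sequence itself and the callback name are taken from the ticket text.

```python
from typing import List


def restart_container_steps(container: str, contained: List[str],
                            disable_restart: bool = False) -> List[str]:
    """Return the ordered steps of the restart flow described in the ticket."""
    if disable_restart:
        # The ticket's flow assumes saAmfCompDisableRestart is false.
        raise ValueError("flow assumes saAmfCompDisableRestart is false")

    # 1) Contained components are terminated first, then the container.
    steps = [f"terminate contained {c} (terminate callback)" for c in contained]
    steps.append(f"terminate container {container} (terminate callback)")

    # 2) The container comes back and gets its container CSI active again.
    steps.append(f"instantiate container {container} (INSTANTIATE CLC-CLI)")
    steps.append(f"assign container CSI active to {container}")

    # 3) Only then are contained components instantiated (through the
    #    container) and reassigned.
    for c in contained:
        steps.append(f"instantiate contained {c} "
                     f"(saAmfContainedComponentInstantiateCallback to {container})")
        steps.append(f"reassign {c}")
    return steps
```

For one container with two contained components this yields nine steps. Whether several contained components are handled one by one or in parallel is not stated in the ticket, so the per-component loop is an assumption.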
[tickets] [opensaf:tickets] #2468 amf: amfd asserts while decrementing opensaf NoRed SI assignment counter during fail-over.
Hi,

Those changes were pushed in the ticket #2416. It was pushed after 5.2 GA.
If there are some reproducible steps then update this ticket.
Yes, uncommenting that line will enable AMFD traces.

Thanks
Praveen

---

** [tickets:#2468] amf: amfd asserts while decrementing opensaf NoRed SI assignment counter during fail-over.**

**Status:** unassigned
**Milestone:** 5.17.08
**Created:** Thu May 25, 2017 08:46 AM UTC by Praveen
**Last Updated:** Sat Jul 01, 2017 04:17 PM UTC
**Owner:** nobody

Ticket is based on an issue reported via user list mail dated: 22-May-17, subject "[users] osafamfd coredump issue".

Here is syslog when the issue occurred:
2017-05-01T07:52:57.714906-04:00 scm2 kernel: tipc: Resetting link <1.1.16:eth2-1.1.5:bond0>, peer not responding
2017-05-01T07:52:57.714935-04:00 scm2 kernel: tipc: Lost link <1.1.16:eth2-1.1.5:bond0> on network plane A
2017-05-01T07:52:57.714939-04:00 scm2 kernel: tipc: Lost contact with <1.1.5>
2017-05-01T07:52:57.716788-04:00 scm2 osafimmd[3009]: NO MDS event from svc_id 25 (change:4, dest:287038266327043)
2017-05-01T07:52:57.717304-04:00 scm2 osafclmd[4259]: NO Node 66831 went down. Not sending track callback for agents on that node
2017-05-01T07:52:57.719178-04:00 scm2 osafimmnd[3020]: NO Global discard node received for nodeId:1050f pid:15395
2017-05-01T07:52:57.719233-04:00 scm2 osafimmnd[3020]: NO Implementer disconnected 104 <0, 1050f(down)> (MsgQueueService66831)
2017-05-01T07:52:57.721345-04:00 scm2 osafamfd[4277]: NO Node 'PLD0105' left the cluster
2017-05-01T07:52:57.722778-04:00 scm2 log_demo[6160]: [0.I.Proc]: FYI state change notification from NTF, entity PLD0105 now has new state DISABLED (Oper state safAmfNode=PLD0105,safAmfCluster=myAmfCluster changed)
2017-05-01T07:52:57.732796-04:00 scm2 osafamfd[4277]: su.cc:2006: dec_curr_act_si: Assertion 'saAmfSUNumCurrActiveSIs > 0' failed.
2017-05-01T07:52:57.778777-04:00 scm2 kernel: tipc: Resetting link <1.1.16:eth2-1.1.6:bond0>, peer not responding
2017-05-01T07:52:57.778827-04:00 scm2 kernel: tipc: Lost link <1.1.16:eth2-1.1.6:bond0> on network plane A
2017-05-01T07:52:57.778833-04:00 scm2 kernel: tipc: Lost contact with <1.1.6>
2017-05-01T07:52:57.777979-04:00 scm2 osafimmd[3009]: NO MDS event from svc_id 25 (change:4, dest:288139774320643)
2017-05-01T07:52:57.717343-04:00 scm2 osafclmd[4259]: NO Node 66831 went down. Not sending track callback for agents on that node
2017-05-01T07:52:57.779373-04:00 scm2 osafclmd[4259]: NO Node 67087 went down. Not sending track callback for agents on that node
2017-05-01T07:52:57.780552-04:00 scm2 osafimmnd[3020]: NO Global discard node received for nodeId:1060f pid:17439
2017-05-01T07:52:57.780607-04:00 scm2 osafimmnd[3020]: NO Implementer disconnected 106 <0, 1060f(down)> (MsgQueueService67087)
2017-05-01T07:52:57.810785-04:00 scm2 osafamfnd[5281]: WA AMF director unexpectedly crashed
2017-05-01T07:52:57.810839-04:00 scm2 osafamfnd[5281]: Rebooting OpenSAF NodeId = 69647 EE Name = , Reason: local AVD down(Adest) or both AVD down(Vdest) received, OwnNodeId = 69647, SupervisionTime = 0
2017-05-01T07:52:57.810978-04:00 scm2 osafimmnd[3020]: NO Implementer locally disconnected. Marking it as doomed 105 <29, 1100f> (safAmfService)
2017-05-01T07:52:57.812582-04:00 scm2 osafimmnd[3020]: NO Implementer disconnected 105 <29, 1100f> (safAmfService)
2017-05-01T07:52:57.950567-04:00 scm2 opensaf_reboot: Rebooting local node; timeout=0
2017-05-01T07:52:58.084968-04:00 scm2 atwdog[28335]: rebooting (-f) local node
[tickets] [opensaf:tickets] #70 AMF support for Container and contained components
- **status**: unassigned --> assigned - **assigned_to**: Praveen - **Part**: - --> d - **Blocker**: --> False --- ** [tickets:#70] AMF support for Container and contained components** **Status:** assigned **Milestone:** future **Created:** Mon May 13, 2013 04:14 AM UTC by Nagendra Kumar **Last Updated:** Mon Apr 03, 2017 06:47 PM UTC **Owner:** Praveen Migrated from http://devel.opensaf.org/ticket/1436: Current implementation of AMF doesn't support Container and Contained components.
[tickets] [opensaf:tickets] #2506 ntf: ntfimcn does not handle SA_ERR_UNAVAILABLE
Imm is integrated with CLM in 5.2 release. Since IMCN is a pre-5.2 component, IMM API should not return ERR_UNAVAILABLE to IMCN.

---

** [tickets:#2506] ntf: ntfimcn does not handle SA_ERR_UNAVAILABLE**

**Status:** accepted
**Milestone:** 5.17.08
**Created:** Tue Jun 20, 2017 11:15 AM UTC by elunlen
**Last Updated:** Tue Jun 20, 2017 11:15 AM UTC
**Owner:** elunlen

The ntfimcn part of ntf will create an ER log and abort if saImmOmClassDescriptionGet_2 fails with SA_AIS_ERR_UNAVAILABLE (actually any OM operation that uses the om handle). This will happen if the node where ntfimcn is running leaves the CLM cluster and is connected again; the om handle will be invalid for the new cluster configuration (see AIS description of saImmOmClassDescriptionGet_2 return codes).

Note1: This is not a big problem since imcn will recover without any need of node restart. No other services including ntf will be affected. Also it can only happen on the standby node (the active node cannot leave the cluster; if that happens a failover will happen).

Two fixes should be done:
1. The OM handle should not have a long life-span; instead it should be initialized every time it is needed. This will significantly reduce the risk of the handle becoming invalid.
2. The ntfimcn process should not be started on the standby node since it is not needed (HA handling was never implemented/needed)

Note2: The problem was detected because CLM tests (see apitests) are not supposed to be executed on SC nodes. The detection of this is not working for this test. See [#2505]
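Fix 1 proposed in ticket #2506 (initialize the OM handle each time it is needed, instead of keeping one long-lived handle) can be sketched roughly as below. This is not the ntfimcn implementation and Python is used purely for illustration: `om_handle` and `get_class_description` are hypothetical names, and the `initialize`, `finalize` and `fetch` callables are stand-ins for the IMM OM initialize, finalize and class-description calls.

```python
from contextlib import contextmanager


@contextmanager
def om_handle(initialize, finalize):
    """Acquire a fresh handle for one operation and always release it."""
    handle = initialize()
    try:
        yield handle
    finally:
        # Runs on success and on error, so no stale handle is kept around.
        finalize(handle)


def get_class_description(initialize, finalize, fetch, class_name):
    """Fetch one class description with a short-lived handle."""
    with om_handle(initialize, finalize) as handle:
        return fetch(handle, class_name)
```

Because each call obtains its own handle, a handle invalidated by the node leaving and rejoining the cluster is never reused by a later call. The cost is one initialize and finalize per operation, which the ticket appears to accept in exchange for a smaller window in which the handle can become invalid.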
[tickets] [opensaf:tickets] #2494 amfd: AmfNodeAdminState is not updated to IMM while SC failover
- **status**: review --> fixed - **Comment**: 5.17.08: commit 829519a4f3a86eb836a55be8301fd5d2befeeec3 Author: Praveen <praveen.malv...@oracle.com> Date: Mon Jun 19 12:39:31 2017 +0530 amfd: maintain node attributes in imm job queue at standby [#2494] 5.17.06: commit abaef2fda56bc1fe689d0ea1d0142568e25a2830 Author: Praveen <praveen.malv...@oracle.com> Date: Mon Jun 19 12:39:31 2017 +0530 amfd: maintain node attributes in imm job queue at standby [#2494] --- ** [tickets:#2494] amfd: AmfNodeAdminState is not updated to IMM while SC failover** **Status:** fixed **Milestone:** 5.17.06 **Created:** Tue Jun 13, 2017 08:33 AM UTC by Minh Hon Chau **Last Updated:** Mon Jun 19, 2017 09:28 AM UTC **Owner:** Praveen The problem appears when application performs a node admin operation (for instance lock-in node) and SC failover is triggered at the same time. The persistent RTA saAmfNodeAdmin state is not updated to IMM on active SC since the active node is going down. At the standby side, the admin node state is checkpoint-ed, but it is also not updated to IMM either outlined trace: in SC-1: ~~~ Jun 12 20:50:50.499054 osafamfd [268:268:src/amf/amfd/node.cc:0942] >> node_admin_state_set: safAmfNode=PL-5,safAmfCluster=myAmfCluster AdmState LOCKED => LOCKED_INSTANTIATION Jun 12 20:50:50.499058 osafamfd [268:268:src/log/agent/lga_api.c:1225] >> saLogWriteLogAsync Jun 12 20:50:50.499061 osafamfd [268:268:src/log/agent/lga_api.c:1087] >> handle_log_record Jun 12 20:50:50.499064 osafamfd [268:268:src/log/agent/lga_api.c:1181] << handle_log_record Jun 12 20:50:50.499068 osafamfd [268:268:src/log/agent/lga_mds.c:1469] >> lga_mds_msg_async_send Jun 12 20:50:50.499075 osafamfd [268:268:src/log/agent/lga_mds.c:0792] >> lga_mds_enc Jun 12 20:50:50.499079 osafamfd [268:268:src/log/agent/lga_mds.c:0824] T2 msgtype: 0 Jun 12 20:50:50.499082 osafamfd [268:268:src/log/agent/lga_mds.c:0837] T2 api_info.type: 4 Jun 12 20:50:50.499085 osafamfd [268:268:src/log/agent/lga_mds.c:0865] << lga_mds_enc 
Jun 12 20:50:50.499173 osafamfd [268:268:src/log/agent/lga_mds.c:1492] << lga_mds_msg_async_send Jun 12 20:50:50.499181 osafamfd [268:268:src/log/agent/lga_api.c:1404] << saLogWriteLogAsync Jun 12 20:50:50.499185 osafamfd [268:268:src/amf/amfd/imm.cc:1843] >> avd_saImmOiRtObjectUpdate: 'safAmfNode=PL-5,safAmfCluster=myAmfCluster' saAmfNodeAdminState Jun 12 20:50:50.499191 osafamfd [268:268:src/amf/amfd/imm.cc:1873] << avd_saImmOiRtObjectUpdate ~~~ ... ~~~ Jun 12 20:50:50.500294 osafamfd [268:268:src/amf/amfd/imm.cc:0240] >> exec: Update 'safAmfNode=PL-5,safAmfCluster=myAmfCluster' saAmfNodeAdminState Jun 12 20:50:50.500298 osafamfd [268:268:src/amf/amfd/imm.cc:0722] >> object_name_to_class_type: safAmfNode=PL-5,safAmfCluster=myAmfCluster Jun 12 20:50:50.500302 osafamfd [268:268:src/amf/amfd/imm.cc:0770] << object_name_to_class_type: 19 Jun 12 20:50:50.500306 osafamfd [268:268:src/imm/agent/imma_oi_api.cc:2546] >> rt_object_update_common Jun 12 20:50:50.635362 osafamfd [268:271:src/mds/mds_dt_trans.c:0755] >> mdtm_process_poll_recv_data_tcp Jun 12 20:50:50.635402 osafamfd [268:271:src/mbc/mbcsv_mds.c:0439] << mbcsv_mds_evt: Msg is not from same vdest, discarding Jun 12 20:50:50.635409 osafamfd [268:271:src/mds/mds_dt_trans.c:0755] >> mdtm_process_poll_recv_data_tcp Jun 12 20:50:50.635414 osafamfd [268:271:src/mds/mds_dt_trans.c:0755] >> mdtm_process_poll_recv_data_tcp Jun 12 20:50:50.635420 osafamfd [268:271:src/clm/agent/clma_mds.c:0968] T2 CLMA Rcvd MDS subscribe evt from svc 34 Jun 12 20:50:50.635423 osafamfd [268:271:src/clm/agent/clma_mds.c:0989] TR CLMS no active Jun 12 20:50:50.635439 osafamfd [268:271:src/mds/mds_dt_trans.c:0755] >> mdtm_process_poll_recv_data_tcp Jun 12 20:50:50.635444 osafamfd [268:271:src/mds/mds_dt_trans.c:0755] >> mdtm_process_poll_recv_data_tcp Jun 12 20:50:50.648993 osafamfd [268:271:src/mds/mds_dt_trans.c:0755] >> mdtm_process_poll_recv_data_tcp Jun 12 20:50:50.690140 osafamfd [268:271:src/mds/mds_dt_trans.c:0755] >> 
mdtm_process_poll_recv_data_tcp Jun 12 20:50:50.690168 osafamfd [268:271:src/mds/mds_dt_trans.c:0755] >> mdtm_process_poll_recv_data_tcp Jun 12 20:50:50.690195 osafamfd [268:271:src/mbc/mbcsv_mds.c:0439] << mbcsv_mds_evt: Msg is not from same vdest, discarding Jun 12 20:50:50.690201 osafamfd [268:271:src/mds/mds_dt_trans.c:0755] >> mdtm_process_poll_recv_data_tcp Jun 12 20:50:50.716805 osafamfd [268:271:src/mds/mds_dt_trans.c:0755] >> mdtm_process_poll_recv_data_tcp Jun 12 20:50:50.716849 osafamfd [268:271:src/mbc/mbcsv_mds.c:0439] << mbcsv_mds_evt: Msg is not from same vdest, discarding Jun 12 20:50:50.716857 osafamfd [268:271:src/mds/mds_dt_trans.c:0755] >> mdtm_process_poll_recv_data_tcp Jun 12 20:50:50.716864 osafam
[tickets] [opensaf:tickets] #2494 amfd: AmfNodeAdminState is not updated to IMM while SC failover
- **status**: accepted --> review - **Blocker**: True --> False - **Comment**: In the issue, only update to IMM got missed but checkpoiting to standby AMFD was successful. There is separate enhancement for the case when both checkpoiting and IMM update gets missed. --- ** [tickets:#2494] amfd: AmfNodeAdminState is not updated to IMM while SC failover** **Status:** review **Milestone:** 5.17.06 **Created:** Tue Jun 13, 2017 08:33 AM UTC by Minh Hon Chau **Last Updated:** Mon Jun 19, 2017 06:12 AM UTC **Owner:** Praveen The problem appears when application performs a node admin operation (for instance lock-in node) and SC failover is triggered at the same time. The persistent RTA saAmfNodeAdmin state is not updated to IMM on active SC since the active node is going down. At the standby side, the admin node state is checkpoint-ed, but it is also not updated to IMM either outlined trace: in SC-1: ~~~ Jun 12 20:50:50.499054 osafamfd [268:268:src/amf/amfd/node.cc:0942] >> node_admin_state_set: safAmfNode=PL-5,safAmfCluster=myAmfCluster AdmState LOCKED => LOCKED_INSTANTIATION Jun 12 20:50:50.499058 osafamfd [268:268:src/log/agent/lga_api.c:1225] >> saLogWriteLogAsync Jun 12 20:50:50.499061 osafamfd [268:268:src/log/agent/lga_api.c:1087] >> handle_log_record Jun 12 20:50:50.499064 osafamfd [268:268:src/log/agent/lga_api.c:1181] << handle_log_record Jun 12 20:50:50.499068 osafamfd [268:268:src/log/agent/lga_mds.c:1469] >> lga_mds_msg_async_send Jun 12 20:50:50.499075 osafamfd [268:268:src/log/agent/lga_mds.c:0792] >> lga_mds_enc Jun 12 20:50:50.499079 osafamfd [268:268:src/log/agent/lga_mds.c:0824] T2 msgtype: 0 Jun 12 20:50:50.499082 osafamfd [268:268:src/log/agent/lga_mds.c:0837] T2 api_info.type: 4 Jun 12 20:50:50.499085 osafamfd [268:268:src/log/agent/lga_mds.c:0865] << lga_mds_enc Jun 12 20:50:50.499173 osafamfd [268:268:src/log/agent/lga_mds.c:1492] << lga_mds_msg_async_send Jun 12 20:50:50.499181 osafamfd [268:268:src/log/agent/lga_api.c:1404] << saLogWriteLogAsync 
Jun 12 20:50:50.499185 osafamfd [268:268:src/amf/amfd/imm.cc:1843] >> avd_saImmOiRtObjectUpdate: 'safAmfNode=PL-5,safAmfCluster=myAmfCluster' saAmfNodeAdminState Jun 12 20:50:50.499191 osafamfd [268:268:src/amf/amfd/imm.cc:1873] << avd_saImmOiRtObjectUpdate ~~~ ... ~~~ Jun 12 20:50:50.500294 osafamfd [268:268:src/amf/amfd/imm.cc:0240] >> exec: Update 'safAmfNode=PL-5,safAmfCluster=myAmfCluster' saAmfNodeAdminState Jun 12 20:50:50.500298 osafamfd [268:268:src/amf/amfd/imm.cc:0722] >> object_name_to_class_type: safAmfNode=PL-5,safAmfCluster=myAmfCluster Jun 12 20:50:50.500302 osafamfd [268:268:src/amf/amfd/imm.cc:0770] << object_name_to_class_type: 19 Jun 12 20:50:50.500306 osafamfd [268:268:src/imm/agent/imma_oi_api.cc:2546] >> rt_object_update_common Jun 12 20:50:50.635362 osafamfd [268:271:src/mds/mds_dt_trans.c:0755] >> mdtm_process_poll_recv_data_tcp Jun 12 20:50:50.635402 osafamfd [268:271:src/mbc/mbcsv_mds.c:0439] << mbcsv_mds_evt: Msg is not from same vdest, discarding Jun 12 20:50:50.635409 osafamfd [268:271:src/mds/mds_dt_trans.c:0755] >> mdtm_process_poll_recv_data_tcp Jun 12 20:50:50.635414 osafamfd [268:271:src/mds/mds_dt_trans.c:0755] >> mdtm_process_poll_recv_data_tcp Jun 12 20:50:50.635420 osafamfd [268:271:src/clm/agent/clma_mds.c:0968] T2 CLMA Rcvd MDS subscribe evt from svc 34 Jun 12 20:50:50.635423 osafamfd [268:271:src/clm/agent/clma_mds.c:0989] TR CLMS no active Jun 12 20:50:50.635439 osafamfd [268:271:src/mds/mds_dt_trans.c:0755] >> mdtm_process_poll_recv_data_tcp Jun 12 20:50:50.635444 osafamfd [268:271:src/mds/mds_dt_trans.c:0755] >> mdtm_process_poll_recv_data_tcp Jun 12 20:50:50.648993 osafamfd [268:271:src/mds/mds_dt_trans.c:0755] >> mdtm_process_poll_recv_data_tcp Jun 12 20:50:50.690140 osafamfd [268:271:src/mds/mds_dt_trans.c:0755] >> mdtm_process_poll_recv_data_tcp Jun 12 20:50:50.690168 osafamfd [268:271:src/mds/mds_dt_trans.c:0755] >> mdtm_process_poll_recv_data_tcp Jun 12 20:50:50.690195 osafamfd [268:271:src/mbc/mbcsv_mds.c:0439] << 
mbcsv_mds_evt: Msg is not from same vdest, discarding Jun 12 20:50:50.690201 osafamfd [268:271:src/mds/mds_dt_trans.c:0755] >> mdtm_process_poll_recv_data_tcp Jun 12 20:50:50.716805 osafamfd [268:271:src/mds/mds_dt_trans.c:0755] >> mdtm_process_poll_recv_data_tcp Jun 12 20:50:50.716849 osafamfd [268:271:src/mbc/mbcsv_mds.c:0439] << mbcsv_mds_evt: Msg is not from same vdest, discarding Jun 12 20:50:50.716857 osafamfd [268:271:src/mds/mds_dt_trans.c:0755] >> mdtm_process_poll_recv_data_tcp Jun 12 20:50:50.716864 osafamfd [268:271:src/mds/mds_dt_trans.c:0755] >> mdtm_process_poll_recv_data_tcp Jun 12 20:50:50.716871 osafamfd [268:271:src/log/agent/lga_mds.c:0674] >> lga_mds_svc_evt Jun 12 20:50:50.716875 osafamfd [2
[tickets] [opensaf:tickets] #2469 amf: Stop tracking api returns NOT_EXIST
- **summary**: clm: Stop tracking api returns NOT_EXIST --> amf: Stop tracking api returns NOT_EXIST - **status**: assigned --> unassigned - **Component**: clm --> amf - **Blocker**: True --> False --- ** [tickets:#2469] amf: Stop tracking api returns NOT_EXIST** **Status:** unassigned **Milestone:** 5.17.06 **Created:** Mon May 29, 2017 12:19 AM UTC by Minh Hon Chau **Last Updated:** Fri Jun 16, 2017 08:39 AM UTC **Owner:** Praveen When performing switchover, AMFD fails to stop CLM track callback with error code 12 (NOT_EXIST) **syslog: ** 2017-05-26 10:19:02 SC-1 osafamfd[268]: NO Controller switch over initiated 2017-05-26 10:19:02 SC-1 osafamfd[268]: NO ROLE SWITCH Active --> Quiesced 2017-05-26 10:19:02 SC-1 osafimmnd[205]: NO Implementer (applier) connected: 40 (@OpenSafImmReplicatorB) <343, 2010f> 2017-05-26 10:19:02 SC-1 osafntfimcnd[626]: NO Started 2017-05-26 10:19:12 SC-1 osafamfd[268]: WA Failed to stop cluster tracking 5 2017-05-26 10:19:12 SC-1 osafimmnd[205]: NO Implementer disconnected 32 <27, 2010f> (safAmfService) 2017-05-26 10:19:12 SC-1 osafimmnd[205]: NO Implementer (applier) connected: 41 (@safAmfService2010f) <27, 2010f> 2017-05-26 10:19:12 SC-1 osafamfnd[283]: NO AVD NEW_ACTIVE, adest:1 2017-05-26 10:19:12 SC-1 osafimmnd[205]: NO Implementer disconnected 31 <0, 2020f> (@safAmfService2020f) 2017-05-26 10:19:12 SC-1 osafimmnd[205]: NO Implementer connected: 42 (safAmfService) <0, 2020f> 2017-05-26 10:19:12 SC-1 osafamfd[268]: NO Switching Quiesced --> StandBy 2017-05-26 10:19:13 SC-1 osafamfd[268]: ER Failed to stop cluster tracking 12 2017-05-26 10:19:13 SC-1 osafamfd[268]: ER Failed to stop cluster tracking after switch over 2017-05-26 10:19:13 SC-1 osafamfd[268]: NO Controller switch over done **CLM trace: ** May 26 10:19:13.173369 osafclmd [240:240:src/clm/clmd/clms_evt.c:1347] >> proc_track_stop_msg May 26 10:19:13.173374 osafclmd [240:240:src/clm/clmd/clms_util.c:0126] >> clms_node_get_by_id May 26 10:19:13.173379 osafclmd 
[240:240:src/clm/clmd/clms_util.c:0137] TR Node found 131343 May 26 10:19:13.173383 osafclmd [240:240:src/clm/clmd/clms_util.c:0140] << clms_node_get_by_id May 26 10:19:13.173388 osafclmd [240:240:src/clm/clmd/clms_evt.c:1350] TR Node id = 131343 May 26 10:19:13.173393 osafclmd [240:240:src/clm/clmd/clms_mds.c:1553] >> clms_mds_msg_send May 26 10:19:13.173448 osafclmd [240:240:src/clm/clmd/clms_mds.c:1587] << clms_mds_msg_send May 26 10:19:13.173457 osafclmd [240:240:src/clm/clmd/clms_mbcsv.c:0810] >> clms_send_async_update May 26 10:19:13.173462 osafclmd [240:240:src/mbc/mbcsv_api.c:0798] >> mbcsv_process_snd_ckpt_request: Sending checkpoint data to all STANDBY peers, as per the send-type specified May 26 10:19:13.173504 osafclmd [240:240:src/mbc/mbcsv_api.c:0830] TR svc_id:48, pwe_hdl:65552 May 26 10:19:13.173509 osafclmd [240:240:src/mbc/mbcsv_util.c:0363] >> mbcsv_send_ckpt_data_to_all_peers May 26 10:19:13.173593 osafclmd [240:240:src/mbc/mbcsv_util.c:0411] TR dispatching FSM for NCSMBCSV_SEND_ASYNC_UPDATE May 26 10:19:13.173599 osafclmd [240:240:src/mbc/mbcsv_act.c:0103] TR ASYNC update to be sent. 
role: 1, svc_id: 48, pwe_hdl: 65552 May 26 10:19:13.173604 osafclmd [240:240:src/mbc/mbcsv_util.c:0424] TR calling encode callback May 26 10:19:13.173610 osafclmd [240:240:src/clm/clmd/clms_mbcsv.c:0740] >> mbcsv_callback May 26 10:19:13.173615 osafclmd [240:240:src/clm/clmd/clms_mbcsv.c:0856] >> ckpt_encode_cbk_handler May 26 10:19:13.173626 osafclmd [240:240:src/clm/clmd/clms_mbcsv.c:0867] TR cbk_arg->info.encode.io_msg_type type 1 May 26 10:19:13.173632 osafclmd [240:240:src/clm/clmd/clms_mbcsv.c:1307] >> ckpt_encode_async_update May 26 10:19:13.173637 osafclmd [240:240:src/clm/clmd/clms_mbcsv.c:1324] TR data->header.type 3 May 26 10:19:13.173641 osafclmd [240:240:src/clm/clmd/clms_mbcsv.c:1362] TR Async update CLMS_CKPT_TRACK_START May 26 10:19:13.173646 osafclmd [240:240:src/clm/clmd/clms_mbcsv.c:1701] >> enc_mbcsv_track_changes_msg May 26 10:19:13.173650 osafclmd [240:240:src/clm/clmd/clms_mbcsv.c:1714] << enc_mbcsv_track_changes_msg May 26 10:19:13.173654 osafclmd [240:240:src/clm/clmd/clms_mbcsv.c:1515] << ckpt_encode_async_update May 26 10:19:13.173658 osafclmd [240:240:src/clm/clmd/clms_mbcsv.c:0910] << ckpt_encode_cbk_handler May 26 10:19:13.173663 osafclmd [240:240:src/clm/clmd/clms_mbcsv.c:0780] << mbcsv_callback May 26 10:19:13.173667 osafclmd [240:240:src/mbc/mbcsv_util.c:0469] TR send the encoded message to any other peer with same s/w version May 26 10:19:13.173671 osafclmd [240:240:src/mbc/mbcsv_util.c:0472] TR dispatching FSM for NCSMBCSV_SEND_ASYNC_UPDATE May 26 10:19:13.173675 osafclmd [240:240:src/mbc/mbcsv_act.c:0103] TR ASYNC update to be sent. role: 1, svc_id: 48, pwe_hdl: 65552 May 26 10:19:13.173680 osafclmd [240:240
[tickets] [opensaf:tickets] #2494 amfd: AmfNodeAdminState is not updated to IMM while SC failover
- **status**: assigned --> accepted

---

** [tickets:#2494] amfd: AmfNodeAdminState is not updated to IMM while SC failover**

**Status:** accepted
**Milestone:** 5.17.06
**Created:** Tue Jun 13, 2017 08:33 AM UTC by Minh Hon Chau
**Last Updated:** Thu Jun 15, 2017 10:06 AM UTC
**Owner:** Praveen

The problem appears when an application performs a node admin operation (for instance, lock-in of a node) and an SC failover is triggered at the same time. The persistent runtime attribute saAmfNodeAdminState is not updated to IMM on the active SC because the active node is going down. On the standby side the node admin state is checkpointed, but it is not updated to IMM either.

Outlined trace in SC-1:

~~~
Jun 12 20:50:50.499054 osafamfd [268:268:src/amf/amfd/node.cc:0942] >> node_admin_state_set: safAmfNode=PL-5,safAmfCluster=myAmfCluster AdmState LOCKED => LOCKED_INSTANTIATION
Jun 12 20:50:50.499058 osafamfd [268:268:src/log/agent/lga_api.c:1225] >> saLogWriteLogAsync
Jun 12 20:50:50.499061 osafamfd [268:268:src/log/agent/lga_api.c:1087] >> handle_log_record
Jun 12 20:50:50.499064 osafamfd [268:268:src/log/agent/lga_api.c:1181] << handle_log_record
Jun 12 20:50:50.499068 osafamfd [268:268:src/log/agent/lga_mds.c:1469] >> lga_mds_msg_async_send
Jun 12 20:50:50.499075 osafamfd [268:268:src/log/agent/lga_mds.c:0792] >> lga_mds_enc
Jun 12 20:50:50.499079 osafamfd [268:268:src/log/agent/lga_mds.c:0824] T2 msgtype: 0
Jun 12 20:50:50.499082 osafamfd [268:268:src/log/agent/lga_mds.c:0837] T2 api_info.type: 4
Jun 12 20:50:50.499085 osafamfd [268:268:src/log/agent/lga_mds.c:0865] << lga_mds_enc
Jun 12 20:50:50.499173 osafamfd [268:268:src/log/agent/lga_mds.c:1492] << lga_mds_msg_async_send
Jun 12 20:50:50.499181 osafamfd [268:268:src/log/agent/lga_api.c:1404] << saLogWriteLogAsync
Jun 12 20:50:50.499185 osafamfd [268:268:src/amf/amfd/imm.cc:1843] >> avd_saImmOiRtObjectUpdate: 'safAmfNode=PL-5,safAmfCluster=myAmfCluster' saAmfNodeAdminState
Jun 12 20:50:50.499191 osafamfd [268:268:src/amf/amfd/imm.cc:1873] << avd_saImmOiRtObjectUpdate
~~~

...

~~~
Jun 12 20:50:50.500294 osafamfd [268:268:src/amf/amfd/imm.cc:0240] >> exec: Update 'safAmfNode=PL-5,safAmfCluster=myAmfCluster' saAmfNodeAdminState
Jun 12 20:50:50.500298 osafamfd [268:268:src/amf/amfd/imm.cc:0722] >> object_name_to_class_type: safAmfNode=PL-5,safAmfCluster=myAmfCluster
Jun 12 20:50:50.500302 osafamfd [268:268:src/amf/amfd/imm.cc:0770] << object_name_to_class_type: 19
Jun 12 20:50:50.500306 osafamfd [268:268:src/imm/agent/imma_oi_api.cc:2546] >> rt_object_update_common
Jun 12 20:50:50.635362 osafamfd [268:271:src/mds/mds_dt_trans.c:0755] >> mdtm_process_poll_recv_data_tcp
Jun 12 20:50:50.635402 osafamfd [268:271:src/mbc/mbcsv_mds.c:0439] << mbcsv_mds_evt: Msg is not from same vdest, discarding
Jun 12 20:50:50.635409 osafamfd [268:271:src/mds/mds_dt_trans.c:0755] >> mdtm_process_poll_recv_data_tcp
Jun 12 20:50:50.635414 osafamfd [268:271:src/mds/mds_dt_trans.c:0755] >> mdtm_process_poll_recv_data_tcp
Jun 12 20:50:50.635420 osafamfd [268:271:src/clm/agent/clma_mds.c:0968] T2 CLMA Rcvd MDS subscribe evt from svc 34
Jun 12 20:50:50.635423 osafamfd [268:271:src/clm/agent/clma_mds.c:0989] TR CLMS no active
Jun 12 20:50:50.635439 osafamfd [268:271:src/mds/mds_dt_trans.c:0755] >> mdtm_process_poll_recv_data_tcp
Jun 12 20:50:50.635444 osafamfd [268:271:src/mds/mds_dt_trans.c:0755] >> mdtm_process_poll_recv_data_tcp
Jun 12 20:50:50.648993 osafamfd [268:271:src/mds/mds_dt_trans.c:0755] >> mdtm_process_poll_recv_data_tcp
Jun 12 20:50:50.690140 osafamfd [268:271:src/mds/mds_dt_trans.c:0755] >> mdtm_process_poll_recv_data_tcp
Jun 12 20:50:50.690168 osafamfd [268:271:src/mds/mds_dt_trans.c:0755] >> mdtm_process_poll_recv_data_tcp
Jun 12 20:50:50.690195 osafamfd [268:271:src/mbc/mbcsv_mds.c:0439] << mbcsv_mds_evt: Msg is not from same vdest, discarding
Jun 12 20:50:50.690201 osafamfd [268:271:src/mds/mds_dt_trans.c:0755] >> mdtm_process_poll_recv_data_tcp
Jun 12 20:50:50.716805 osafamfd [268:271:src/mds/mds_dt_trans.c:0755] >> mdtm_process_poll_recv_data_tcp
Jun 12 20:50:50.716849 osafamfd [268:271:src/mbc/mbcsv_mds.c:0439] << mbcsv_mds_evt: Msg is not from same vdest, discarding
Jun 12 20:50:50.716857 osafamfd [268:271:src/mds/mds_dt_trans.c:0755] >> mdtm_process_poll_recv_data_tcp
Jun 12 20:50:50.716864 osafamfd [268:271:src/mds/mds_dt_trans.c:0755] >> mdtm_process_poll_recv_data_tcp
Jun 12 20:50:50.716871 osafamfd [268:271:src/log/agent/lga_mds.c:0674] >> lga_mds_svc_evt
Jun 12 20:50:50.716875 osafamfd [268:271:src/log/agent/lga_mds.c:0678] TR lga_mds_svc_evtNCSMDS_NO_ACTIVE
Jun 12 20:50:50.716879 osafamfd [268:271:src/log/agent/lga_mds.c:0683] TR NCSMDS_NO_ACTIVE
Jun 12 20:50:50.716881 osafamfd [268:271:src/log/agent/
~~~
[tickets] [opensaf:tickets] #2469 clm: Stop tracking api returns NOT_EXIST
Hi Minh,

Generally, for the ERR_TIMEOUT case it is recommended to finalize the handle, because if the API was called to create some resource, the user does not know whether the resource was created or not. Since the present case is about stopping the tracking, it may work.

Thanks
Praveen

---

** [tickets:#2469] clm: Stop tracking api returns NOT_EXIST**

**Status:** assigned
**Milestone:** 5.17.06
**Created:** Mon May 29, 2017 12:19 AM UTC by Minh Hon Chau
**Last Updated:** Wed Jun 14, 2017 02:05 AM UTC
**Owner:** Praveen

When performing a switchover, AMFD fails to stop the CLM track callback with error code 12 (NOT_EXIST).

**syslog:**

2017-05-26 10:19:02 SC-1 osafamfd[268]: NO Controller switch over initiated
2017-05-26 10:19:02 SC-1 osafamfd[268]: NO ROLE SWITCH Active --> Quiesced
2017-05-26 10:19:02 SC-1 osafimmnd[205]: NO Implementer (applier) connected: 40 (@OpenSafImmReplicatorB) <343, 2010f>
2017-05-26 10:19:02 SC-1 osafntfimcnd[626]: NO Started
2017-05-26 10:19:12 SC-1 osafamfd[268]: WA Failed to stop cluster tracking 5
2017-05-26 10:19:12 SC-1 osafimmnd[205]: NO Implementer disconnected 32 <27, 2010f> (safAmfService)
2017-05-26 10:19:12 SC-1 osafimmnd[205]: NO Implementer (applier) connected: 41 (@safAmfService2010f) <27, 2010f>
2017-05-26 10:19:12 SC-1 osafamfnd[283]: NO AVD NEW_ACTIVE, adest:1
2017-05-26 10:19:12 SC-1 osafimmnd[205]: NO Implementer disconnected 31 <0, 2020f> (@safAmfService2020f)
2017-05-26 10:19:12 SC-1 osafimmnd[205]: NO Implementer connected: 42 (safAmfService) <0, 2020f>
2017-05-26 10:19:12 SC-1 osafamfd[268]: NO Switching Quiesced --> StandBy
2017-05-26 10:19:13 SC-1 osafamfd[268]: ER Failed to stop cluster tracking 12
2017-05-26 10:19:13 SC-1 osafamfd[268]: ER Failed to stop cluster tracking after switch over
2017-05-26 10:19:13 SC-1 osafamfd[268]: NO Controller switch over done

**CLM trace:**

May 26 10:19:13.173369 osafclmd [240:240:src/clm/clmd/clms_evt.c:1347] >> proc_track_stop_msg
May 26 10:19:13.173374 osafclmd [240:240:src/clm/clmd/clms_util.c:0126] >> clms_node_get_by_id
May 26 10:19:13.173379 osafclmd [240:240:src/clm/clmd/clms_util.c:0137] TR Node found 131343
May 26 10:19:13.173383 osafclmd [240:240:src/clm/clmd/clms_util.c:0140] << clms_node_get_by_id
May 26 10:19:13.173388 osafclmd [240:240:src/clm/clmd/clms_evt.c:1350] TR Node id = 131343
May 26 10:19:13.173393 osafclmd [240:240:src/clm/clmd/clms_mds.c:1553] >> clms_mds_msg_send
May 26 10:19:13.173448 osafclmd [240:240:src/clm/clmd/clms_mds.c:1587] << clms_mds_msg_send
May 26 10:19:13.173457 osafclmd [240:240:src/clm/clmd/clms_mbcsv.c:0810] >> clms_send_async_update
May 26 10:19:13.173462 osafclmd [240:240:src/mbc/mbcsv_api.c:0798] >> mbcsv_process_snd_ckpt_request: Sending checkpoint data to all STANDBY peers, as per the send-type specified
May 26 10:19:13.173504 osafclmd [240:240:src/mbc/mbcsv_api.c:0830] TR svc_id:48, pwe_hdl:65552
May 26 10:19:13.173509 osafclmd [240:240:src/mbc/mbcsv_util.c:0363] >> mbcsv_send_ckpt_data_to_all_peers
May 26 10:19:13.173593 osafclmd [240:240:src/mbc/mbcsv_util.c:0411] TR dispatching FSM for NCSMBCSV_SEND_ASYNC_UPDATE
May 26 10:19:13.173599 osafclmd [240:240:src/mbc/mbcsv_act.c:0103] TR ASYNC update to be sent. role: 1, svc_id: 48, pwe_hdl: 65552
May 26 10:19:13.173604 osafclmd [240:240:src/mbc/mbcsv_util.c:0424] TR calling encode callback
May 26 10:19:13.173610 osafclmd [240:240:src/clm/clmd/clms_mbcsv.c:0740] >> mbcsv_callback
May 26 10:19:13.173615 osafclmd [240:240:src/clm/clmd/clms_mbcsv.c:0856] >> ckpt_encode_cbk_handler
May 26 10:19:13.173626 osafclmd [240:240:src/clm/clmd/clms_mbcsv.c:0867] TR cbk_arg->info.encode.io_msg_type type 1
May 26 10:19:13.173632 osafclmd [240:240:src/clm/clmd/clms_mbcsv.c:1307] >> ckpt_encode_async_update
May 26 10:19:13.173637 osafclmd [240:240:src/clm/clmd/clms_mbcsv.c:1324] TR data->header.type 3
May 26 10:19:13.173641 osafclmd [240:240:src/clm/clmd/clms_mbcsv.c:1362] TR Async update CLMS_CKPT_TRACK_START
May 26 10:19:13.173646 osafclmd [240:240:src/clm/clmd/clms_mbcsv.c:1701] >> enc_mbcsv_track_changes_msg
May 26 10:19:13.173650 osafclmd [240:240:src/clm/clmd/clms_mbcsv.c:1714] << enc_mbcsv_track_changes_msg
May 26 10:19:13.173654 osafclmd [240:240:src/clm/clmd/clms_mbcsv.c:1515] << ckpt_encode_async_update
May 26 10:19:13.173658 osafclmd [240:240:src/clm/clmd/clms_mbcsv.c:0910] << ckpt_encode_cbk_handler
May 26 10:19:13.173663 osafclmd [240:240:src/clm/clmd/clms_mbcsv.c:0780] << mbcsv_callback
May 26 10:19:13.173667 osafclmd [240:240:src/mbc/mbcsv_util.c:0469] TR send the encoded message to any other peer with same s/w version
May 26 10:19:13.173671 osafclmd [240:240:src/mbc/mbcsv_util.c:0472] TR dispatching FSM for NCSMBCSV_SEND_ASYNC_UPDATE
May 26 10:19:13.173675 osafclmd [240:240:src/mbc/mbcsv_act.c:0103] TR ASYNC update to be sent. role: 1, svc_id
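The syslog in ticket #2469 shows a first stop request failing with 5 (timeout) and a later one failing with 12 (NOT_EXIST). A minimal sketch of tolerating that sequence follows, assuming that a NOT_EXIST on the retry means the first request did reach the server. The enum is a local stand-in for the two SaAisErrorT values seen in the log, and `stop_tracking_tolerant` is a hypothetical helper, not AMFD code. For APIs that create resources, finalizing the handle (as Praveen's comment recommends) remains the safer reaction.

```cpp
#include <cassert>
#include <functional>

// Local stand-in for the SaAisErrorT values that appear in the syslog
// (5 and 12); a real program would take them from the SAF headers.
enum class AisErr { kOk = 1, kTimeout = 5, kNotExist = 12 };

// Hypothetical helper: calls `stop`, retrying once after a timeout.
// Returns true when tracking can be considered stopped.
bool stop_tracking_tolerant(const std::function<AisErr()>& stop) {
  AisErr rc = stop();
  if (rc == AisErr::kOk) return true;
  if (rc != AisErr::kTimeout) return false;
  // Assumption: the timed-out request may still have been processed, so
  // NOT_EXIST on the retry is treated as "already stopped".
  rc = stop();
  return rc == AisErr::kOk || rc == AisErr::kNotExist;
}
```

A NOT_EXIST that is not preceded by a timeout is still reported as a failure, so a genuinely missing track registration is not hidden.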
[tickets] [opensaf:tickets] #2498 amfd: incorrect saAmfSGNumPrefAssignedSUs
I had published a patch for this in ticket \#2269 (amf: saAmfSGNumPrefAssignedSUs is not honored in N-Way Active model). I will republish it after rebasing.

---

** [tickets:#2498] amfd: incorrect saAmfSGNumPrefAssignedSUs**

**Status:** accepted
**Milestone:** 5.17.06
**Created:** Fri Jun 16, 2017 04:42 AM UTC by Gary Lee
**Last Updated:** Fri Jun 16, 2017 04:44 AM UTC
**Owner:** Gary Lee

If saAmfSGNumPrefAssignedSUs is not set, AMFD should refer to saAmfSGNumPrefInserviceSUs. This currently works, except for the case where saAmfSGNumPrefInserviceSUs is changed after startup: saAmfSGNumPrefAssignedSUs does not get updated.

---

Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is subscribed to https://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change settings at https://sourceforge.net/p/opensaf/admin/tickets/options. Or, if this is a mailing list, you can unsubscribe from the mailing list.

--
Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Opensaf-tickets mailing list
Opensaf-tickets@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets
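The fallback described in ticket #2498 can be illustrated with a small sketch. It assumes the effective value is resolved on every read instead of being copied once at startup. `SgConfig` and its members are hypothetical names for illustration only; this is not the AMFD implementation or the published patch.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical, simplified stand-in for an SG's configuration.
struct SgConfig {
  uint32_t pref_inservice_sus = 0;            // saAmfSGNumPrefInserviceSUs
  std::optional<uint32_t> pref_assigned_sus;  // saAmfSGNumPrefAssignedSUs, empty if not set

  // Resolve the fallback at read time, so a later change to
  // pref_inservice_sus is reflected without any extra bookkeeping.
  uint32_t effective_pref_assigned_sus() const {
    return pref_assigned_sus.value_or(pref_inservice_sus);
  }
};
```

With this shape, changing `pref_inservice_sus` after startup changes the effective assigned-SU count as long as `pref_assigned_sus` is unset, which is the behaviour the ticket asks for.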
[tickets] [opensaf:tickets] #2494 amfd: AmfNodeAdminState is not updated to IMM while SC failover
- **status**: unassigned --> assigned
- **assigned_to**: Praveen
- **Blocker**: False --> True

---

** [tickets:#2494] amfd: AmfNodeAdminState is not updated to IMM while SC failover**

**Status:** assigned
**Milestone:** 5.17.06
**Created:** Tue Jun 13, 2017 08:33 AM UTC by Minh Hon Chau
**Last Updated:** Tue Jun 13, 2017 08:33 AM UTC
**Owner:** Praveen
[tickets] [opensaf:tickets] #2496 amf: amfd crashes while trying to free invalid memory.
- **status**: accepted --> review

---

** [tickets:#2496] amf: amfd crashes while trying to free invalid memory.**

**Status:** review
**Milestone:** 5.17.06
**Created:** Wed Jun 14, 2017 08:49 AM UTC by Praveen
**Last Updated:** Wed Jun 14, 2017 08:49 AM UTC
**Owner:** Praveen

Steps to reproduce:
1) Bring AMF demo up on one controller.
2) Issue a lock operation on the active SU.
3) While the component is still processing the quiesced assignment, run the command below:
immlist safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1
4) AMFD will crash while updating the runtime attributes of the SU in su_rt_attr_cb().

bt:
\#0 0x7fac6971fcc9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
\#1 0x7fac697230d8 in __GI_abort () at abort.c:89
\#2 0x7fac6975c394 in __libc_message (do_abort=do_abort@entry=1, fmt=fmt@entry=0x7fac6986ab28 "*** Error in `%s': %s: 0x%s ***\n") at ../sysdeps/posix/libc_fatal.c:175
\#3 0x7fac6976866e in malloc_printerr (ptr=, str=0x7fac6986acc8 "free(): invalid next size (fast)", action=1) at malloc.c:4996
\#4 _int_free (av=, p=, have_lock=0) at malloc.c:3840
\#5 0x7fac6b4c471a in su_rt_attr_cb (immOiHandle=, objectName=, attributeNames=) at src/amf/amfd/su.cc:1501
\#6 0x7fac6b4531f1 in rt_attr_update_cb (immoi_handle=94489411855, object_name=0x7fac640041b8, attribute_names=0x7fac6c104290) at src/amf/amfd/imm.cc:881
\#7 0x7fac6a99bc42 in imma_process_callback_info (cb=cb@entry=0x7fac6aba6320 , cl_node=0x7fac6c0cf250, callback=callback@entry=0x7fac64004190, immHandle=94489411855) at src/imm/agent/imma_proc.cc:3266
\#8 0x7fac6a99bf79 in imma_hdl_callbk_dispatch_all (cb=0x7fac6aba6320 , immHandle=94489411855) at src/imm/agent/imma_proc.cc:1812
\#9 0x7fac6a99301d in saImmOiDispatch (immOiHandle=94489411855, dispatchFlags=SA_DISPATCH_ALL) at src/imm/agent/imma_oi_api.cc:642
\#10 0x7fac6b412868 in main_loop () at src/amf/amfd/main.cc:717
\#11 main (argc=, argv=) at src/amf/amfd/main.cc:848
[tickets] [opensaf:tickets] #2496 amf: amfd crashes while trying to free invalid memory.
---

** [tickets:#2496] amf: amfd crashes while trying to free invalid memory.**

**Status:** accepted
**Milestone:** 5.17.06
**Created:** Wed Jun 14, 2017 08:49 AM UTC by Praveen
**Last Updated:** Wed Jun 14, 2017 08:49 AM UTC
**Owner:** Praveen
[tickets] [opensaf:tickets] #2493 amf: amfnd asserts while shutting down when active monitoring fails for NPI comp.
- Description has changed:

Diff:

    --- old
    +++ new
    @@ -1,4 +1,3 @@
    -
     steps to reproduce:
     1)Bring one controller up.
     2)Add attached configuration in the system.

- **status**: unassigned --> assigned
- **assigned_to**: Praveen

---

** [tickets:#2493] amf: amfnd asserts while shutting down when active monitoring fails for NPI comp.**

**Status:** assigned
**Milestone:** 5.17.06
**Created:** Tue Jun 13, 2017 07:11 AM UTC by Praveen
**Last Updated:** Tue Jun 13, 2017 07:11 AM UTC
**Owner:** Praveen
**Attachments:**

- [1945_npi.xml](https://sourceforge.net/p/opensaf/tickets/2493/attachment/1945_npi.xml) (12.0 kB; text/xml)
- [osafamfnd](https://sourceforge.net/p/opensaf/tickets/2493/attachment/osafamfnd) (6.1 MB; application/octet-stream)
- [syslog](https://sourceforge.net/p/opensaf/tickets/2493/attachment/syslog) (275.6 kB; application/octet-stream)

steps to reproduce:
1) Bring one controller up.
2) Add the attached configuration to the system.
3) Unlock-in and unlock su1.

The attached configuration uses the amfpm command to start active monitoring. If this command is wrongly configured by the user, AMF reports a fault on the component and AMFND restarts it. Because the active monitoring command fails every time, the component is faulted continuously.
As a last option, when OpenSAF is stopped on the node, AMFND asserts:

syslog:
Jun 13 12:27:03 SC-1 osafamfnd[30287]: NO Removed 'safSi=AmfDemo,safApp=AmfDemo1' from 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1'
Jun 13 12:27:03 SC-1 osafamfnd[30287]: NO Removed assignments from AMF components
Jun 13 12:27:03 SC-1 osafamfnd[30287]: NO 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' Component or SU restart probation timer expired
Jun 13 12:27:03 SC-1 osafamfnd[30287]: NO Terminating all AMF components
Jun 13 12:27:03 SC-1 osafamfnd[30287]: NO 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' Presence State RESTARTING => TERMINATING
Jun 13 12:27:03 SC-1 osafamfnd[30287]: src/amf/amfnd/susm.cc:1886: avnd_su_pres_st_chng_prc: Assertion 'si' failed.
Jun 13 12:27:03 SC-1 osafclmd[30264]: AL AMF Node Director is down, terminate this process

bt:
\#0 0x7f662fbe8cc9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
\#1 0x7f662fbec0d8 in __GI_abort () at abort.c:89
\#2 0x7f66306dedbe in __osafassert_fail (__file=, __line=, __func=, __assertion=) at src/base/sysf_def.c:286
\#3 0x7f66313fff3f in avnd_su_pres_st_chng_prc (final_st=SA_AMF_PRESENCE_TERMINATING, prv_st=SA_AMF_PRESENCE_RESTARTING, su=0x7f66324d33c0, cb=0x7f663161f240 <_avnd_cb>) at src/amf/amfnd/susm.cc:1886
\#4 avnd_su_pres_fsm_run (cb=cb@entry=0x7f663161f240 <_avnd_cb>, su=0x7f66324d33c0, comp=comp@entry=0x7f66324d46b0, ev=) at src/amf/amfnd/susm.cc:1610
\#5 0x7f66313caf58 in avnd_comp_clc_st_chng_prc (cb=cb@entry=0x7f663161f240 <_avnd_cb>, comp=comp@entry=0x7f66324d46b0, prv_st=prv_st@entry=SA_AMF_PRESENCE_RESTARTING, final_st=final_st@entry=SA_AMF_PRESENCE_TERMINATING) at src/amf/amfnd/clc.cc:1501
\#6 0x7f66313cf127 in avnd_comp_clc_fsm_run (cb=0x7f663161f240 <_avnd_cb>, comp=comp@entry=0x7f66324d46b0, ev=ev@entry=AVND_COMP_CLC_PRES_FSM_EV_CLEANUP) at src/amf/amfnd/clc.cc:892
\#7 0x7f66314067e8 in avnd_comp_cleanup_launch (comp=comp@entry=0x7f66324d46b0) at src/amf/amfnd/util.cc:178
\#8 0x7f6631405beb in avnd_last_step_clean (cb=cb@entry=0x7f663161f240 <_avnd_cb>) at src/amf/amfnd/term.cc:76
\#9 0x7f66313e13b9 in avnd_di_msg_ack_process (cb=cb@entry=0x7f663161f240 <_avnd_cb>, mid=) at src/amf/amfnd/di.cc:1264
\#10 0x7f66313e1484 in avnd_evt_avd_ack_evh (cb=0x7f663161f240 <_avnd_cb>, evt=0x7f6628001010) at src/amf/amfnd/di.cc:411
\#11 0x7f66313ec9df in avnd_evt_process (evt=0x7f6628001010) at src/amf/amfnd/main.cc:658
\#12 avnd_main_process () at src/amf/amfnd/main.cc:610
\#13 0x7f66313c261f in main (argc=2, argv=0x7ffc47fa34f8) at src/amf/amfnd/main.cc:203
[tickets] [opensaf:tickets] #2493 amf: amfnd asserts while shutting down when active monitoring fails for NPI comp.
---

** [tickets:#2493] amf: amfnd asserts while shutting down when active monitoring fails for NPI comp.**

**Status:** unassigned
**Milestone:** 5.17.06
**Created:** Tue Jun 13, 2017 07:11 AM UTC by Praveen
**Last Updated:** Tue Jun 13, 2017 07:11 AM UTC
**Owner:** nobody
[tickets] [opensaf:tickets] #2485 amfnd: missing susi response if component is restarted
- **status**: accepted --> unassigned
- **assigned_to**: Praveen --> nobody

---

** [tickets:#2485] amfnd: missing susi response if component is restarted**

**Status:** unassigned
**Milestone:** 5.17.06
**Created:** Wed Jun 07, 2017 12:57 AM UTC by Gary Lee
**Last Updated:** Fri Jun 09, 2017 09:04 AM UTC
**Owner:** nobody

An SI contains multiple CSIs. If a restart component admin operation arrives at amfnd before all CSIs are assigned, the SUSI response is not sent to AMFD.

This code in avnd_comp_csi_assign_done() appears to be the problem area.

    /* while restarting, we wont use assign all, so csi will not be null */
    if (csi && m_AVND_COMP_CSI_CURR_ASSIGN_STATE_IS_RESTARTING(csi)) {
      m_AVND_COMP_CSI_CURR_ASSIGN_STATE_SET(csi, AVND_COMP_CSI_ASSIGN_STATE_ASSIGNED);
      goto done;
    }

Perhaps we should not initiate a restart in avnd_evt_comp_admin_op_req(), if a component is still in AVND_COMP_CSI_ASSIGN_STATE_ASSIGNING state.
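The guard suggested at the end of the ticket #2485 description could look roughly like the sketch below. All types and the function `handle_restart_request` are hypothetical simplifications for illustration; they are not amfnd's real data structures, and answering with a "try again" result is an assumption borrowed from how AMFD treats admin operations on an unstable SG.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified stand-ins for amfnd's CSI and component records.
enum class CsiAssignState { kUnassigned, kAssigning, kAssigned, kRestarting };
enum class AdminOpResult { kAccepted, kTryAgain };

struct Component {
  std::vector<CsiAssignState> csis;
};

// Refuse a restart admin operation while any CSI is still being assigned,
// so that the pending SUSI response is not lost by the restart.
AdminOpResult handle_restart_request(const Component& comp) {
  for (CsiAssignState state : comp.csis) {
    if (state == CsiAssignState::kAssigning) return AdminOpResult::kTryAgain;
  }
  return AdminOpResult::kAccepted;
}
```

The check runs before any state is changed, so a rejected request leaves the ongoing assignment untouched and the caller can simply repeat the operation later.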
[tickets] [opensaf:tickets] #2485 amfnd: missing susi response if component is restarted
Attached is the configuration to reproduce the issue.

Steps to reproduce:
1) Bring up SC-1 with the attached imm.xml.
2) After cluster time expiry, AMF will start two NPI components as part of the assignment.
3) Introduce a delay between the instantiation scripts of the two NPI components.
4) Restart the already instantiated NPI component such that its restart completes after the instantiation of the second NPI component.

Attachments:

- [imm.xml](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/2f504cc3/62c0/attachment/imm.xml) (328.6 kB; text/xml)

---

** [tickets:#2485] amfnd: missing susi response if component is restarted**

**Status:** accepted
**Milestone:** 5.17.06
**Created:** Wed Jun 07, 2017 12:57 AM UTC by Gary Lee
**Last Updated:** Fri Jun 09, 2017 01:07 AM UTC
**Owner:** Praveen
[tickets] [opensaf:tickets] #2485 amfnd: missing susi response if component is restarted
What is the configuration to reproduce this issue? With 2 CSIs in an SI in the amf_demo app, I am not observing this issue. Attached are the amfd and amfnd traces after successful verification.

Attachments:

- [osafamfd](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/2f504cc3/aa56/attachment/osafamfd) (1.8 MB; application/octet-stream)
- [osafamfnd](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/2f504cc3/aa56/attachment/osafamfnd) (1.9 MB; application/octet-stream)

---

** [tickets:#2485] amfnd: missing susi response if component is restarted**

**Status:** accepted
**Milestone:** 5.17.06
**Created:** Wed Jun 07, 2017 12:57 AM UTC by Gary Lee
**Last Updated:** Thu Jun 08, 2017 12:19 AM UTC
**Owner:** Praveen
[tickets] [opensaf:tickets] #2485 amfnd: missing susi response if component is restarted
Currently, for most entities, AMFD returns TRY_AGAIN when an admin operation request arrives while the SG is unstable. This is also what AMFD does today for admin restart of an SU, and my proposed solution follows the same approach. However, there is a general enhancement ticket to allow admin operations when the SG is unstable: \#1873 amf: Avoid rejecting user requests due to internal "unstable" state.

The spec is not very clear in general for all cases. Only in one place (9.4.7 SA_AMF_ADMIN_RESTART, page 384), to prevent a restart admin operation from running in parallel with another admin operation on the same entity, it states that:

"The Availability Management Framework must not proceed with this operation if another administrative operation or an error recovery initiated by the Availability Management Framework is already engaged on the logical entity. In such case, the SA_AIS_ERR_TRY_AGAIN error value shall be returned to indicate that the action is feasible but not at this instant."

For the case of locking a standby SU while restarting a component in the active SU: for a restartable component this can be allowed, as the restart is local to AMFND. But for a non-restartable component the assignments need to be switched over, and AMFD will have to find standby SUs. That will increase the complexity for redundancy models like Nway and NplusM.

---

** [tickets:#2485] amfnd: missing susi response if component is restarted**

**Status:** accepted
**Milestone:** 5.17.06
**Created:** Wed Jun 07, 2017 12:57 AM UTC by Gary Lee
**Last Updated:** Wed Jun 07, 2017 08:37 AM UTC
**Owner:** Praveen
[tickets] [opensaf:tickets] #2485 amfnd: missing susi response if component is restarted
- **status**: assigned --> accepted
- **Comment**:

Since SG is not stable, AMFD should return TRY_AGAIN to IMM client. This check is missing in comp_admin_op_cb() in amfd/comp.cc.
I guess assignment in component is happening because of cluster startup timer expiry (not due to any other admin operation).

---

** [tickets:#2485] amfnd: missing susi response if component is restarted**

**Status:** accepted
**Milestone:** 5.17.06
**Created:** Wed Jun 07, 2017 12:57 AM UTC by Gary Lee
**Last Updated:** Wed Jun 07, 2017 08:16 AM UTC
**Owner:** Praveen
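The check described in the comment above (return TRY_AGAIN to the IMM client while the SG is not stable) might look roughly like the following. This is a sketch under assumptions, not the actual patch: `sg_fsm_state_t` and `comp_admin_restart_precheck()` are hypothetical names, and the return codes are local constants that mirror the SaAisErrorT values SA_AIS_OK (1) and SA_AIS_ERR_TRY_AGAIN (6).

```c
#include <assert.h>

/* Local constants mirroring SA_AIS_OK and SA_AIS_ERR_TRY_AGAIN. */
typedef enum { RC_OK = 1, RC_ERR_TRY_AGAIN = 6 } admin_rc_t;

/* Hypothetical, reduced SG FSM states: stable or busy realigning. */
typedef enum { SG_FSM_STABLE, SG_FSM_REALIGN } sg_fsm_state_t;

/* Pre-check for a component admin restart: refuse with TRY_AGAIN
 * while the owning SG is still processing assignments. */
admin_rc_t comp_admin_restart_precheck(sg_fsm_state_t sg_state) {
  assert(sg_state == SG_FSM_STABLE || sg_state == SG_FSM_REALIGN);
  if (sg_state != SG_FSM_STABLE) {
    return RC_ERR_TRY_AGAIN; /* feasible, but not at this instant */
  }
  return RC_OK;
}
```

Returning TRY_AGAIN rather than a hard error keeps the behaviour in line with the spec text quoted elsewhere in this thread: the operation is feasible, just not right now, so the client can retry.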
[tickets] [opensaf:tickets] #2485 amfnd: missing susi response if component is restarted
- **status**: unassigned --> assigned
- **assigned_to**: Praveen

---

** [tickets:#2485] amfnd: missing susi response if component is restarted**

**Status:** assigned
**Milestone:** 5.17.06
**Created:** Wed Jun 07, 2017 12:57 AM UTC by Gary Lee
**Last Updated:** Wed Jun 07, 2017 12:57 AM UTC
**Owner:** Praveen
[tickets] [opensaf:tickets] #2469 clm: Stop tracking api returns NOT_EXIST
It seems the TrackStop() request reached CLMS and CLMS executed it, but the client, AMFD, received ERR_TIMEOUT:

2017-05-26 10:19:12 SC-1 osafamfd[268]: WA Failed to stop cluster tracking 5

Now the same AMFD, after becoming standby, tries TrackStop() again. It will get ERR_NOT_EXIST because the tracking was already stopped.
AMFD should finalize the handle when it gets ERR_TIMEOUT.

---

** [tickets:#2469] clm: Stop tracking api returns NOT_EXIST**

**Status:** assigned
**Milestone:** 5.17.06
**Created:** Mon May 29, 2017 12:19 AM UTC by Minh Hon Chau
**Last Updated:** Mon May 29, 2017 09:21 AM UTC
**Owner:** Praveen

When performing switchover, AMFD fails to stop CLM track callback with error code 12 (NOT_EXIST)

**syslog: **
2017-05-26 10:19:02 SC-1 osafamfd[268]: NO Controller switch over initiated
2017-05-26 10:19:02 SC-1 osafamfd[268]: NO ROLE SWITCH Active --> Quiesced
2017-05-26 10:19:02 SC-1 osafimmnd[205]: NO Implementer (applier) connected: 40 (@OpenSafImmReplicatorB) <343, 2010f>
2017-05-26 10:19:02 SC-1 osafntfimcnd[626]: NO Started
2017-05-26 10:19:12 SC-1 osafamfd[268]: WA Failed to stop cluster tracking 5
2017-05-26 10:19:12 SC-1 osafimmnd[205]: NO Implementer disconnected 32 <27, 2010f> (safAmfService)
2017-05-26 10:19:12 SC-1 osafimmnd[205]: NO Implementer (applier) connected: 41 (@safAmfService2010f) <27, 2010f>
2017-05-26 10:19:12 SC-1 osafamfnd[283]: NO AVD NEW_ACTIVE, adest:1
2017-05-26 10:19:12 SC-1 osafimmnd[205]: NO Implementer disconnected 31 <0, 2020f> (@safAmfService2020f)
2017-05-26 10:19:12 SC-1 osafimmnd[205]: NO Implementer connected: 42 (safAmfService) <0, 2020f>
2017-05-26 10:19:12 SC-1 osafamfd[268]: NO Switching Quiesced --> StandBy
2017-05-26 10:19:13 SC-1 osafamfd[268]: ER Failed to stop cluster tracking 12
2017-05-26 10:19:13 SC-1 osafamfd[268]: ER Failed to stop cluster tracking after switch over
2017-05-26 10:19:13 SC-1 osafamfd[268]: NO Controller switch over done

**CLM trace: **
May 26 10:19:13.173369 osafclmd
[240:240:src/clm/clmd/clms_evt.c:1347] >> proc_track_stop_msg May 26 10:19:13.173374 osafclmd [240:240:src/clm/clmd/clms_util.c:0126] >> clms_node_get_by_id May 26 10:19:13.173379 osafclmd [240:240:src/clm/clmd/clms_util.c:0137] TR Node found 131343 May 26 10:19:13.173383 osafclmd [240:240:src/clm/clmd/clms_util.c:0140] << clms_node_get_by_id May 26 10:19:13.173388 osafclmd [240:240:src/clm/clmd/clms_evt.c:1350] TR Node id = 131343 May 26 10:19:13.173393 osafclmd [240:240:src/clm/clmd/clms_mds.c:1553] >> clms_mds_msg_send May 26 10:19:13.173448 osafclmd [240:240:src/clm/clmd/clms_mds.c:1587] << clms_mds_msg_send May 26 10:19:13.173457 osafclmd [240:240:src/clm/clmd/clms_mbcsv.c:0810] >> clms_send_async_update May 26 10:19:13.173462 osafclmd [240:240:src/mbc/mbcsv_api.c:0798] >> mbcsv_process_snd_ckpt_request: Sending checkpoint data to all STANDBY peers, as per the send-type specified May 26 10:19:13.173504 osafclmd [240:240:src/mbc/mbcsv_api.c:0830] TR svc_id:48, pwe_hdl:65552 May 26 10:19:13.173509 osafclmd [240:240:src/mbc/mbcsv_util.c:0363] >> mbcsv_send_ckpt_data_to_all_peers May 26 10:19:13.173593 osafclmd [240:240:src/mbc/mbcsv_util.c:0411] TR dispatching FSM for NCSMBCSV_SEND_ASYNC_UPDATE May 26 10:19:13.173599 osafclmd [240:240:src/mbc/mbcsv_act.c:0103] TR ASYNC update to be sent. 
role: 1, svc_id: 48, pwe_hdl: 65552 May 26 10:19:13.173604 osafclmd [240:240:src/mbc/mbcsv_util.c:0424] TR calling encode callback May 26 10:19:13.173610 osafclmd [240:240:src/clm/clmd/clms_mbcsv.c:0740] >> mbcsv_callback May 26 10:19:13.173615 osafclmd [240:240:src/clm/clmd/clms_mbcsv.c:0856] >> ckpt_encode_cbk_handler May 26 10:19:13.173626 osafclmd [240:240:src/clm/clmd/clms_mbcsv.c:0867] TR cbk_arg->info.encode.io_msg_type type 1 May 26 10:19:13.173632 osafclmd [240:240:src/clm/clmd/clms_mbcsv.c:1307] >> ckpt_encode_async_update May 26 10:19:13.173637 osafclmd [240:240:src/clm/clmd/clms_mbcsv.c:1324] TR data->header.type 3 May 26 10:19:13.173641 osafclmd [240:240:src/clm/clmd/clms_mbcsv.c:1362] TR Async update CLMS_CKPT_TRACK_START May 26 10:19:13.173646 osafclmd [240:240:src/clm/clmd/clms_mbcsv.c:1701] >> enc_mbcsv_track_changes_msg May 26 10:19:13.173650 osafclmd [240:240:src/clm/clmd/clms_mbcsv.c:1714] << enc_mbcsv_track_changes_msg May 26 10:19:13.173654 osafclmd [240:240:src/clm/clmd/clms_mbcsv.c:1515] << ckpt_encode_async_update May 26 10:19:13.173658 osafclmd [240:240:src/clm/clmd/clms_mbcsv.c:0910] << ckpt_encode_cbk_handler May 26 10:19:13.173663 osafclmd [240:240:src/clm/clmd/clms_mbcsv.c:0780] << mbcsv_callback May 26 10:19:13.173667 osafclmd [240:240:src/mbc/mbcsv_util.c:0469] TR send the encoded message to any other peer with same s/w version May 26 10:19:13.173671 osafclmd [240:240:src/mbc/mbcsv_util.c:0472] TR dispatching FSM for NCSMBCSV_SEND_ASYNC_UPDATE May 26
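
The handling proposed in the comment on this ticket (finalize the handle when TrackStop() returns ERR_TIMEOUT, since the server may already have processed the request) can be sketched as a small decision function. This is an assumption-laden illustration: `track_stop_action_t` and `after_track_stop()` are hypothetical names, and the constants are local copies of the numeric codes seen in the syslog (5 for ERR_TIMEOUT, 12 for ERR_NOT_EXIST) plus 1 for SA_AIS_OK.

```c
#include <assert.h>

/* Local copies of the SaAisErrorT values involved in this ticket. */
enum { RC_OK = 1, RC_ERR_TIMEOUT = 5, RC_ERR_NOT_EXIST = 12 };

/* Hypothetical follow-up actions for the caller of TrackStop(). */
typedef enum {
  TRACK_STOP_DONE,            /* tracking stopped, nothing more to do */
  TRACK_STOP_FINALIZE_HANDLE, /* outcome unknown: drop the handle */
  TRACK_STOP_FAILED           /* report the error as before */
} track_stop_action_t;

/* Decide what to do after a TrackStop() attempt returned rc. */
track_stop_action_t after_track_stop(int rc) {
  assert(rc > 0); /* AIS return codes are positive */
  switch (rc) {
    case RC_OK:
      return TRACK_STOP_DONE;
    case RC_ERR_TIMEOUT:
      /* The request may have been executed even though no reply
       * arrived, so a later retry could only yield NOT_EXIST. */
      return TRACK_STOP_FINALIZE_HANDLE;
    default:
      return TRACK_STOP_FAILED;
  }
}
```

With this shape, the second TrackStop() after the role change would not be attempted on the old handle, which is the path that produced error 12 in the syslog.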
[tickets] [opensaf:tickets] #2468 amf: amfd asserts while decrementing opensaf NoRed SI assignment counter during fail-over.
Hi,

In later releases the assert has been replaced with a warning. Without amfd traces, it is not possible to know why the counter was decremented before the assert. I will try to reproduce based on code analysis and will update further.

Thanks
Praveen

---

** [tickets:#2468] amf: amfd asserts while decrementing opensaf NoRed SI assignment counter during fail-over.**

**Status:** unassigned
**Milestone:** 5.17.06
**Created:** Thu May 25, 2017 08:46 AM UTC by Praveen
**Last Updated:** Thu Jun 01, 2017 02:41 PM UTC
**Owner:** nobody

This ticket is based on an issue reported on the users list in a mail dated 22-May-17, subject "[users] osafamfd coredump issue".

Here is the syslog when the issue occurred:
2017-05-01T07:52:57.714906-04:00 scm2 kernel: tipc: Resetting link <1.1.16:eth2-1.1.5:bond0>, peer not responding
2017-05-01T07:52:57.714935-04:00 scm2 kernel: tipc: Lost link <1.1.16:eth2-1.1.5:bond0> on network plane A
2017-05-01T07:52:57.714939-04:00 scm2 kernel: tipc: Lost contact with <1.1.5>
2017-05-01T07:52:57.716788-04:00 scm2 osafimmd[3009]: NO MDS event from svc_id 25 (change:4, dest:287038266327043)
2017-05-01T07:52:57.717304-04:00 scm2 osafclmd[4259]: NO Node 66831 went down. Not sending track callback for agents on that node
2017-05-01T07:52:57.719178-04:00 scm2 osafimmnd[3020]: NO Global discard node received for nodeId:1050f pid:15395
2017-05-01T07:52:57.719233-04:00 scm2 osafimmnd[3020]: NO Implementer disconnected 104 <0, 1050f(down)> (MsgQueueService66831)
2017-05-01T07:52:57.721345-04:00 scm2 osafamfd[4277]: NO Node 'PLD0105' left the cluster
2017-05-01T07:52:57.722778-04:00 scm2 log_demo[6160]: [0.I.Proc]: FYI state change notification from NTF, entity PLD0105 now has new state DISABLED (Oper state safAmfNode=PLD0105,safAmfCluster=myAmfCluster changed)
2017-05-01T07:52:57.732796-04:00 scm2 osafamfd[4277]: su.cc:2006: dec_curr_act_si: Assertion 'saAmfSUNumCurrActiveSIs > 0' failed.
2017-05-01T07:52:57.778777-04:00 scm2 kernel: tipc: Resetting link <1.1.16:eth2-1.1.6:bond0>, peer not responding
2017-05-01T07:52:57.778827-04:00 scm2 kernel: tipc: Lost link <1.1.16:eth2-1.1.6:bond0> on network plane A
2017-05-01T07:52:57.778833-04:00 scm2 kernel: tipc: Lost contact with <1.1.6>
2017-05-01T07:52:57.777979-04:00 scm2 osafimmd[3009]: NO MDS event from svc_id 25 (change:4, dest:288139774320643)
2017-05-01T07:52:57.717343-04:00 scm2 osafclmd[4259]: NO Node 66831 went down. Not sending track callback for agents on that node
2017-05-01T07:52:57.779373-04:00 scm2 osafclmd[4259]: NO Node 67087 went down. Not sending track callback for agents on that node
2017-05-01T07:52:57.780552-04:00 scm2 osafimmnd[3020]: NO Global discard node received for nodeId:1060f pid:17439
2017-05-01T07:52:57.780607-04:00 scm2 osafimmnd[3020]: NO Implementer disconnected 106 <0, 1060f(down)> (MsgQueueService67087)
2017-05-01T07:52:57.810785-04:00 scm2 osafamfnd[5281]: WA AMF director unexpectedly crashed
2017-05-01T07:52:57.810839-04:00 scm2 osafamfnd[5281]: Rebooting OpenSAF NodeId = 69647 EE Name = , Reason: local AVD down(Adest) or both AVD down(Vdest) received, OwnNodeId = 69647, SupervisionTime = 0
2017-05-01T07:52:57.810978-04:00 scm2 osafimmnd[3020]: NO Implementer locally disconnected. Marking it as doomed 105 <29, 1100f> (safAmfService)
2017-05-01T07:52:57.812582-04:00 scm2 osafimmnd[3020]: NO Implementer disconnected 105 <29, 1100f> (safAmfService)
2017-05-01T07:52:57.950567-04:00 scm2 opensaf_reboot: Rebooting local node; timeout=0
2017-05-01T07:52:58.084968-04:00 scm2 atwdog[28335]: rebooting (-f) local node
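
The comment above notes that later releases replaced the assert with a warning. A minimal sketch of that style of guarded decrement is shown here; `dec_active_si_count()` is a hypothetical helper and the warning text is illustrative only, so this should not be read as the actual code in su.cc.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Decrement an active-SI counter without aborting on underflow.
 * Returns true if the counter was decremented, false if it was
 * already zero (in which case only a warning is logged). */
bool dec_active_si_count(uint32_t *count) {
  assert(count != NULL);
  if (*count == 0) {
    /* Illustrative warning; the daemon would use its own logger. */
    fprintf(stderr, "WA active SI counter is already 0, not decrementing\n");
    return false;
  }
  --*count;
  return true;
}
```

The trade-off is that the director keeps running in the inconsistent case, at the cost of possibly hiding the earlier bookkeeping error that the traces would be needed to explain.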
[tickets] [opensaf:tickets] #2477 amfd: Cyclic reboot after SC absence period (in large cluster)
Hi Minh,

What are the steps to reproduce after applying the patch 2477_rep.diff?

Thanks,
Praveen

---

** [tickets:#2477] amfd: Cyclic reboot after SC absence period (in large cluster)**

**Status:** review
**Milestone:** 5.17.06
**Labels:** assignment failover during stop of both SC 2416
**Created:** Fri Jun 02, 2017 06:17 AM UTC by Minh Hon Chau
**Last Updated:** Fri Jun 02, 2017 09:25 AM UTC
**Owner:** Minh Hon Chau

The problem in this ticket occurs in the same scenario as reported in #2416.
After an SC absence period, amfd hits osafassert() and dumps core, and the problem happens repeatedly.

One of the patches for #2416 tried to call IMM sync as soon as possible. That works fine with a small cluster (5 nodes), but in a large cluster of about 75 nodes the changed IMM sync calls have almost no effect.

In #2416, a problem was seen, attributed to unreliable IMM sync calls, in which after the SC absence period amfd had 3 assignments for a 2N SG: 2 STANDBY SUSIs and 1 ACTIVE SUSI. It was fixed by the commit "amfd: Add iteration to failover all absent assignments [#2416]" (refer to: https://sourceforge.net/p/opensaf/tickets/2416/#f83b)

Another variant of the problem with unreliable IMM calls before both SCs go down is that amfd can end up with both SUs having ACTIVE assignments, which leads to the assert. So far this problem has only been seen in a large cluster.

Details of coredump:
~~~
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/lib64/opensaf/osafamfd'.
Program terminated with signal SIGABRT, Aborted.
#0 0x7f784279b0c7 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: zypper install opensaf-amf-director-debuginfo-5.2.0-469.0.6128a2d.sle12.x86_64
(gdb) bt full
#0 0x7f784279b0c7 in raise () from /lib64/libc.so.6
No symbol table info available.
#1 0x7f784279c478 in abort () from /lib64/libc.so.6
No symbol table info available.
#2 0x7f78435fdf4e in __osafassert_fail (__file=, __line=, __func=, __assertion=) at ../../opensaf/src/base/sysf_def.c:286 No locals. #3 0x7f78445671e8 in avd_sg_2n_act_susi (sg=, stby_susi=stby_susi@entry=0x7ffeef034998, cb=0x7f78447f2e80 <_control_block>) at ../../opensaf/src/amf/amfd/sg_2n_fsm.cc:596 susi = a_susi_2 = 0x7f7845e0d0c0 s_susi_1 = 0x7f7845e0d0c0 su_2 = t_ = {trace_leave_called = false, file_ = 0x0, function_ = 0x0} s_susi_2 = 0x7f7845e2a030 a_susi = 0x0 a_susi_1 = 0x7f7845e2a030 s_susi = 0x0 su_1 = 0x7f7845d69e60 #4 0x7f784456d5d6 in SG_2N::node_fail (this=0x7f7845d5f4f0, cb=0x7f78447f2e80 <_control_block>, su=0x7f7845d69e60) at ../../opensaf/src/amf/amfd/sg_2n_fsm.cc:3402 a_susi = s_susi = 0x7f7845d69a68 o_su = flag = __FUNCTION__ = "node_fail" su_ha_state = t_ = {trace_leave_called = false, file_ = 0x0, function_ = 0x0} #5 0x7f784455de1a in AVD_SG::failover_absent_assignment (this=0x7f7845d5f4f0) at ../../opensaf/src/amf/amfd/sg.cc:2307 t_ = {trace_leave_called = false, file_ = 0x0, function_ = 0x0} __FUNCTION__ = "failover_absent_assignment" failed_su = 0x7f7845d69e60 #6 0x7f7844514125 in avd_cluster_tmr_init_evh (cb=0x7f78447f2e80 <_control_block>, evt=) at ../../opensaf/src/amf/amfd/cluster.cc:103 i_sg = 0x7f7845d5f4f0 __for_range = @0x7f7845ca2a90: {db = {_M_t = { _M_impl = {<std::allocator<std::_Rb_tree_node<std::pair<std::basic_string<char, std::char_traits, std::allocator > const, AVD_SG*> > >> = {<__gnu_cxx::new_allocator<std::_Rb_tree_node<std::pair<std::basic_string<char, std::char_traits, std::allocator > const, AVD_SG*> > >> = {}, }, _M_key_compare = {<std::binary_function<std::basic_string<char, std::char_traits, std::allocator >, std::basic_string<char, std::char_traits, std::allocator >, bool>> = {}, }, _M_header = {_M_color = std::_S_red, _M_parent = 0x7f7845d515e0, _M_left = 0x7f7845d03ed0, _M_right = 0x7f7845d81580}, _M_node_count = 28 t_ = {trace_leave_called = false, file_ = 0x0, function_ = 0x0} __FUNCTION__ = 
"avd_cluster_tmr_init_evh" su = 0x0 node = #7 0x7f784453ca2c in process_event (cb_now=0x7f78447f2e80 <_control_block>, evt=0x7f78340013d0) at ../../opensaf/src/amf/amfd/main.cc:775 t_ = {trace_leave_called = false, file_ = 0x0, function_ = 0x0} __FUNCTION__ = "process_event" #8 0x7f78444f6abe in main_loop () at ../../opensaf/src/amf/amfd/main.cc:691 pollretval = evt = 0x7f78340013d0 polltmo = 0 term_fd = 24 cb = 0x7f78447f2e80 <_control_block> error = old
[tickets] [opensaf:tickets] #2475 amf: support for SC status change Callback, non SAF.
- **status**: assigned --> review

---

** [tickets:#2475] amf: support for SC status change Callback, non SAF.**

**Status:** review
**Milestone:** 5.17.08
**Created:** Thu Jun 01, 2017 10:19 AM UTC by Praveen
**Last Updated:** Thu Jun 01, 2017 10:19 AM UTC
**Owner:** Praveen

This enhancement adds two resources to AMFA that let an application learn about the Absence and Presence state of the SCs as they go down and come up.

Information about the resources:

* A callback that will be invoked by AMFA whenever an SC joins the cluster and whenever both SCs leave the cluster, if the SC Absence feature is enabled.
  Callback and its argument:

        void (*OsafAmfSCStatusChangeCallbackT)(OsafAmfSCStatusT state)

  where OsafAmfSCStatusT is defined as:

        typedef enum {
          OSAF_AMF_SC_PRESENT = 1,
          OSAF_AMF_SC_ABSENT = 2,
        } OsafAmfSCStatusT;

  This callback can be integrated with a standard AMF application component.

* An API to register/install the above callback function:

        void osafAmfInstallSCStatusChangeCallback(
            void (*OsafAmfSCStatusChangeCallbackT)(OsafAmfSCStatusT status));
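
To illustrate how an application might use the callback described above, here is a small self-contained sketch. The enum and callback signature follow the ticket text; everything prefixed `demo_` or `app_` is hypothetical. In particular, `demo_install_sc_status_cb()` and `demo_deliver_sc_status()` are local stand-ins that only imitate the install API and AMFA's invocation, so the snippet compiles without the OpenSAF headers.

```c
#include <assert.h>
#include <stddef.h>

/* Enum as given in the ticket text. */
typedef enum {
  OSAF_AMF_SC_PRESENT = 1,
  OSAF_AMF_SC_ABSENT = 2,
} OsafAmfSCStatusT;

/* Callback type with the signature given in the ticket. */
typedef void (*OsafAmfSCStatusChangeCallbackT)(OsafAmfSCStatusT state);

/* Hypothetical local stand-in for the proposed install API. */
static OsafAmfSCStatusChangeCallbackT installed_cb = NULL;

void demo_install_sc_status_cb(OsafAmfSCStatusChangeCallbackT cb) {
  assert(cb != NULL);
  installed_cb = cb;
}

/* Hypothetical: imitates AMFA delivering an SC status change. */
void demo_deliver_sc_status(OsafAmfSCStatusT state) {
  if (installed_cb != NULL) {
    installed_cb(state);
  }
}

/* Application side: remember the most recent SC status. */
static OsafAmfSCStatusT last_sc_status = OSAF_AMF_SC_PRESENT;

void app_sc_status_changed(OsafAmfSCStatusT state) {
  last_sc_status = state;
}

OsafAmfSCStatusT app_last_sc_status(void) {
  return last_sc_status;
}
```

An application would register `app_sc_status_changed` once at start-up and could then, for example, postpone requests that need a controller while the last reported status is OSAF_AMF_SC_ABSENT.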
[tickets] [opensaf:tickets] #2475 amf: support for SC status change Callback, non SAF.
---

** [tickets:#2475] amf: support for SC status change Callback, non SAF.**

**Status:** assigned
**Milestone:** 5.17.08
**Created:** Thu Jun 01, 2017 10:19 AM UTC by Praveen
**Last Updated:** Thu Jun 01, 2017 10:19 AM UTC
**Owner:** Praveen
[tickets] [opensaf:tickets] #2331 CLM: : saClmNodeCurrAddress and saClmNodeCurrAddressFamily are not exposed to IMM even that TPC mode is using
- **status**: accepted --> review
- **Blocker**: --> True

---

** [tickets:#2331] CLM: : saClmNodeCurrAddress and saClmNodeCurrAddressFamily are not exposed to IMM even that TPC mode is using**

**Status:** review
**Milestone:** 5.17.08
**Created:** Thu Mar 02, 2017 10:10 AM UTC by Tai Dinh
**Last Updated:** Thu Apr 20, 2017 04:08 AM UTC
**Owner:** Praveen

saClmNodeCurrAddress and saClmNodeCurrAddressFamily of a cluster node are not exposed to IMM even when TCP mode is configured.
Applications sometimes need this information.
[tickets] [opensaf:tickets] #2468 amf: amfd asserts while decrementing opensaf NoRed SI assignment counter during fail-over.
- **Version**: 5.2 --> 5.1
- **Comment**:

Observed in 5.1.

---

** [tickets:#2468] amf: amfd asserts while decrementing opensaf NoRed SI assignment counter during fail-over.**

**Status:** unassigned
**Milestone:** 5.17.06
**Created:** Thu May 25, 2017 08:46 AM UTC by Praveen
**Last Updated:** Thu May 25, 2017 11:19 AM UTC
**Owner:** nobody
[tickets] [opensaf:tickets] #2468 amf: amfd asserts while decrementing opensaf NoRed SI assignment counter during fail-over.
---

** [tickets:#2468] amf: amfd asserts while decrementing opensaf NoRed SI assignment counter during fail-over.**

**Status:** unassigned
**Milestone:** 5.17.06
**Created:** Thu May 25, 2017 08:46 AM UTC by Praveen
**Last Updated:** Thu May 25, 2017 08:46 AM UTC
**Owner:** nobody
[tickets] [opensaf:tickets] #2466 AMF: NodeGroup Admin UNLOCK timeout during cluster start up
Hi Gary,

The patch looks good. I have checked: ng_unlock() does check cb->init_state and does not assign SUs if the state is init_done.

Thanks,
Praveen

---

** [tickets:#2466] AMF: NodeGroup Admin UNLOCK timeout during cluster start up**

**Status:** unassigned
**Milestone:** 5.17.06
**Created:** Tue May 23, 2017 01:19 AM UTC by Minh Hon Chau
**Last Updated:** Tue May 23, 2017 06:35 AM UTC
**Owner:** nobody
[tickets] [opensaf:tickets] #2466 AMF: NodeGroup Admin UNLOCK timeout during cluster start up
Hi Minh,

I will go through it today.

Thanks,
Praveen

---

** [tickets:#2466] AMF: NodeGroup Admin UNLOCK timeout during cluster start up**

**Status:** unassigned
**Milestone:** 5.17.06
**Created:** Tue May 23, 2017 01:19 AM UTC by Minh Hon Chau
**Last Updated:** Tue May 23, 2017 05:13 AM UTC
**Owner:** nobody

While the cluster is coming up, if a nodegroup admin op UNLOCK is issued (by SMF in this case), the nodegroup admin op can time out, because the su_cnt_admin_oper of one of the PLs remains 1 forever.

Sequence in detail:

- A cluster has 4 nodes; start the cluster.
- When 3 nodes (SC1, SC2, PL3) have joined the cluster, admin unlock of the nodegroup is issued.

~~~
May 22 14:33:46.665539 osafamfd [11068:11068:../../opensaf/src/amf/amfd/ndfsm.cc:0526] NO Node 'SC-1' joined the cluster
May 22 14:33:48.115919 osafamfd [11068:11068:../../opensaf/src/amf/amfd/ndfsm.cc:0526] NO Node 'SC-2' joined the cluster
May 22 14:34:00.442633 osafamfd [11068:11068:../../opensaf/src/amf/amfd/ndfsm.cc:0526] NO Node 'PL-4' joined the cluster
~~~

The NoRed OpenSAF SU of PL4 gets assigned:

~~~
May 22 14:34:00.637324 osafamfd [11068:11068:../../opensaf/src/amf/amfd/sgproc.cc:1171] >> avd_su_si_assign_evh: id:30, node:2040f, act:2, 'safSu=19781416d5,safSg=NoRed,safApp=OpenSAF', 'safSi=NoRed3,safApp=OpenSAF', ha:1, err:1, single:0
~~~

Admin unlock of the nodegroup is issued:

~~~
May 22 14:34:02.989761 osafamfd [11068:11068:../../opensaf/src/amf/amfd/nodegroup.cc:1100] >> ng_admin_op_cb: 'safAmfNodeGroup=smfLockAdmNg2,safAmfCluster=myAmfCluster', inv:'115964117001', op:'1'
~~~

- When the NoRed OpenSAF SU of PL-3 becomes ENABLED, it starts assignment.

~~~
May 22 14:34:10.096324 osafamfd [11068:11068:../../opensaf/src/amf/amfd/sgproc.cc:0725] >> avd_su_oper_state_evh: id:29, node:2030f, 'safSu=PL-3,safSg=NoRed,safApp=OpenSAF' state:1
May 22 14:34:10.097537 osafamfd [11068:11068:../../opensaf/src/amf/amfd/sg_nored_fsm.cc:0305] >> su_insvc: 'safSu=PL-3,safSg=NoRed,safApp=OpenSAF', 0
May 22 14:34:10.097549 osafamfd [11068:11068:../../opensaf/src/amf/amfd/sgproc.cc:0111] >> avd_new_assgn_susi: 'safSu=PL-3,safSg=NoRed,safApp=OpenSAF' 'safSi=a6b0d555f4,safApp=OpenSAF' state=1
May 22 14:34:10.097552 osafamfd [11068:11068:../../opensaf/src/amf/amfd/siass.cc:0440] >> avd_susi_create: safSu=PL-3,safSg=NoRed,safApp=OpenSAF safSi=a6b0d555f4,safApp=OpenSAF state=1
~~~

The su_cnt_admin_oper of the NoRed OpenSAF SU is increased.

~~~
May 22 14:34:10.098839 osafamfd [11068:11068:../../opensaf/src/amf/amfd/util.cc:0978] << avd_snd_susi_msg
May 22 14:34:10.098841 osafamfd [11068:11068:../../opensaf/src/amf/amfd/sgproc.cc:0268] TR node:'safAmfNode=PL-3,safAmfCluster=myAmfCluster', su_cnt_admin_oper:1
~~~

- When the NoRed OpenSAF SU gets assigned

~~~
May 22 14:34:10.105283 osafamfd [11068:11068:../../opensaf/src/amf/amfd/sgproc.cc:1171] >> avd_su_si_assign_evh: id:30, node:2030f, act:2, 'safSu=PL-3,safSg=NoRed,safApp=OpenSAF', 'safSi=a6b0d555f4,safApp=OpenSAF', ha:1, err:1, single:0
~~~

but this su_cnt_admin_oper is not decreased:

~~~
May 22 14:34:10.108143 osafamfd [11068:11068:../../opensaf/src/amf/amfd/sg_nored_fsm.cc:] << susi_success
May 22 14:34:10.108148 osafamfd [11068:11068:../../opensaf/src/amf/amfd/sgproc.cc:1579] TR Node_state: 2 adest: 2010f203defc2 node not ready for assignments
May 22 14:34:10.108153 osafamfd [11068:11068:../../opensaf/src/amf/amfd/sgproc.cc:1579] TR Node_state: 2 adest: 2020fc2b319b5 node not ready for assignments
May 22 14:34:10.108157 osafamfd [11068:11068:../../opensaf/src/amf/amfd/ndfsm.cc:0621] >> avd_nd_ncs_su_assigned
May 22 14:34:10.108162 osafamfd [11068:11068:../../opensaf/src/amf/amfd/node.cc:0461] >> avd_node_state_set: 'safAmfNode=PL-3,safAmfCluster=myAmfCluster' NCS_INIT => PRESENT
~~~

At the end, su_cnt_admin_oper still remains 1.

When the application SU gets assigned, the counter is always decreased:

~~~
May 22 14:34:10.444624 osafamfd [11068:11068:../../opensaf/src/amf/amfd/sg_2n_fsm.cc:2648] << susi_success: rc:1
May 22 14:34:10.444629 osafamfd [11068:11068:../../opensaf/src/amf/amfd/sgproc.cc:1681] TR node:'safAmfNode=PL-3,safAmfCluster=myAmfCluster', su_cnt_admin_oper:2
May 22 14:34:10.444632 osafamfd [11068:11068:../../opensaf/src/amf/amfd/sgproc.cc:0358] >> process_su_si_response_for_ng: 'safSu=PL-3,safSg=2N,safApp=ERIC-sv.SVScsvStreamer'
May 22 14:34:10.444640 osafamfd [11068:11068:../../opensaf/src/amf/amfd/sgproc.cc:0457] << process_su_si_response_for_ng
~~~

There is a check in avd_su_si_assign_evh() that seems not to count the OpenSAF SU when decreasing the counter:

    ...
    /* else admin oper still not complete */
    } else if ((su->sg_of_su->sg_ncs_spec == false) &&
               ((su->su_on_node->admin_ng != nullptr) ||
                (su->sg_of_su->ng_using_saAmfSGAdminState == true))) {
      AVD_AMF_NG *ng = su->su_on_node->admin_ng;
      // Got resp
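The counter mismatch reported in this ticket can be illustrated with a small stand-alone model. This is only a sketch under assumptions, not the amfd code: `Node`, `Su`, `on_assign_sent`, `on_assign_response` and `remaining_after_startup` are hypothetical names, and only `su_cnt_admin_oper`, `sg_ncs_spec` and the shape of the guard are taken from the ticket text.

```cpp
#include <cassert>

// Hypothetical, simplified stand-ins for the amfd node and SU objects.
struct Node {
  int su_cnt_admin_oper = 0;  // SUs whose assignment response is still awaited
  bool admin_ng = false;      // true while a nodegroup admin op is pending
};

struct Su {
  Node *node;
  bool sg_ncs_spec;  // true for OpenSAF (middleware) SUs
};

// Assumed from the traces: the counter is increased for any SU that starts
// assignment while the nodegroup operation is pending.
void on_assign_sent(Su &su) {
  if (su.node->admin_ng) ++su.node->su_cnt_admin_oper;
}

// Mirrors the guard quoted in the ticket: middleware SUs are skipped on the
// response path, so their earlier increment is never undone.
void on_assign_response(Su &su, bool count_ncs_su) {
  if (!su.node->admin_ng) return;
  if (su.sg_ncs_spec && !count_ncs_su) return;  // the suspected leak
  assert(su.node->su_cnt_admin_oper > 0);
  --su.node->su_cnt_admin_oper;
}

// Counter value left after one middleware SU and one application SU have
// both completed their assignments.
int remaining_after_startup(bool count_ncs_su) {
  Node pl3;
  pl3.admin_ng = true;
  Su nored{&pl3, true};
  Su app{&pl3, false};
  on_assign_sent(nored);
  on_assign_sent(app);
  on_assign_response(nored, count_ncs_su);
  on_assign_response(app, count_ncs_su);
  return pl3.su_cnt_admin_oper;
}
```

In this model, `remaining_after_startup(false)` leaves the counter at 1, which matches the value the traces show at the end, while counting the middleware SU on the response path brings it back to 0. Whether the real fix should skip the increment or add the decrement is not stated in the ticket.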
[tickets] [opensaf:tickets] #2105 AMF : SG is unstable, if app responds during node link loss detection time period
Attached are the traces from when AMFD drops the message from Amfnd.

Attachments:

- [traces.zip](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/09f00ae8/1ae0/attachment/traces.zip) (195.0 kB; application/x-zip-compressed)

---

** [tickets:#2105] AMF : SG is unstable, if app responds during node link loss detection time period**

**Status:** review
**Milestone:** 5.17.06
**Created:** Sun Oct 09, 2016 07:12 AM UTC by Srikanth R
**Last Updated:** Mon May 15, 2017 07:38 AM UTC
**Owner:** Minh Hon Chau

Setup:

Changeset: 8190
5-node SLES setup with 2 controllers and 3 payloads (TIPC, headless enabled)
2n application deployed on 2 payloads.

Issue:

-> Perform an admin operation on an AMF entity.
-> Do not respond to the callback, and invoke the headless scenario.
-> On a VM with a TIPC setup, it takes 3 seconds to detect that a node is down.
-> If the application responds to a callback of the admin operation during this period, when the last controller is already down, the message does not reach any controller. Amfnd on the payload sends the "Assigned" message but does not store it. In this scenario, the SG moves to an unstable state.

Below is the snippet from syslog, where the application responded at 15:48:28 and the payloads detected at 15:48:31 that the last controller was down.

Oct 7 15:48:28 SYSTEST-PLD-1 osafamfnd[9976]: NO Assigned 'safSi=TestApp_SI1,safApp=TestApp_TwoN' ACTIVE to 'safSu=TestApp_SU1,safSg=TestApp_SG1,safApp=TestApp_TwoN'
Oct 7 15:48:31 SYSTEST-PLD-1 osafamfnd[9976]: WA AMF director unexpectedly crashed
Oct 7 15:48:31 SYSTEST-PLD-1 osafamfnd[9976]: NO Checking 'safSu=PL-3,safSg=NoRed,safApp=OpenSAF' for pending messages
Oct 7 15:48:31 SYSTEST-PLD-1 osafamfnd[9976]: NO Checking 'safSu=TestApp_SU1,safSg=TestApp_SG1,safApp=TestApp_TwoN' for pending messages
Oct 7 15:48:31 SYSTEST-PLD-1 osafimmnd[9957]: WA SC Absence IS allowed:900 IMMD service is DOWN
Oct 7 15:48:31 SYSTEST-PLD-1 osafimmnd[9957]: NO IMMD SERVICE IS DOWN, HYDRA IS CONFIGURED => UNREGISTERING IMMND form MDS

-> Below is the scenario in which the payload detected at 18:31:34 that there was no controller; here amfnd will call avnd_di_susi_resp_send after the controllers rejoin the cluster. The application responded at 18:31:41.

Oct 7 18:31:34 SYSTEST-PLD-1 osafimmnd[12448]: WA SC Absence IS allowed:900 IMMD service is DOWN
Oct 7 18:31:34 SYSTEST-PLD-1 osafimmnd[12448]: NO IMMD SERVICE IS DOWN, HYDRA IS CONFIGURED => UNREGISTERING IMMND form MDS
Oct 7 18:31:41 SYSTEST-PLD-1 osafamfnd[12467]: NO Assigned 'safSi=TestApp_SI4,safApp=TestApp_TwoN' ACTIVE to 'safSu=TestApp_SU1,safSg=TestApp_SG1,safApp=TestApp_TwoN'
Oct 7 18:31:41 SYSTEST-PLD-1 osafamfnd[12467]: NO avnd_di_susi_resp_send() deferred as AMF director is offline
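The difference between the two scenarios in ticket #2105 can be sketched with a toy model of a node director that sends responses over a link whose failure is detected late. Everything here is hypothetical (`NodeDirector`, `send_response`, the `retain` flag, `delivered_after_recovery`); in particular, retaining unacknowledged responses is an assumed remedy for illustration, not necessarily what the patch under review does.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model: a node director sending responses to a director.
struct NodeDirector {
  bool down_detected = false;         // the node director's view of the director
  std::vector<std::string> unacked;   // responses kept for a later resend
  std::vector<std::string> received;  // what the director has actually seen

  // link_alive models the real link. Inside the detection window it is
  // already false while down_detected is still false.
  void send_response(const std::string &msg, bool link_alive, bool retain) {
    if (down_detected) {  // deferred, as in the second syslog snippet
      unacked.push_back(msg);
      return;
    }
    if (link_alive) {
      received.push_back(msg);
      return;
    }
    if (retain) unacked.push_back(msg);  // assumed remedy: keep until acked
    // otherwise the response is silently lost (first syslog snippet)
  }

  void on_director_down() { down_detected = true; }

  void on_director_up() {
    down_detected = false;
    for (const auto &m : unacked) received.push_back(m);
    unacked.clear();
  }
};

// Responses seen by the director after it returns, when the application
// answered inside the detection window.
std::size_t delivered_after_recovery(bool retain) {
  NodeDirector nd;
  nd.send_response("Assigned SI1", /*link_alive=*/false, retain);
  nd.on_director_down();
  nd.on_director_up();
  return nd.received.size();
}
```

Without retention the response sent inside the detection window never reaches a director, which corresponds to the unstable SG described in the ticket; with retention it is delivered once a director is back.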
[tickets] [opensaf:tickets] #2394 clm: add clm tool commands for admin op and state check.
- **status**: review --> fixed
- **Blocker**: --> False
- **Comment**:

git default:

commit a7bb655d2e8b50bf22b168f7492eab9970a98849
Author: Praveen <praveen.malv...@oracle.com>
Date: Fri May 12 15:09:30 2017 +0530

clm: add tool commands clm-adm, clm-state, clm-find [#2394]

hg default:

changeset: 8798:cf45b604af4b
tag: tip
user: Praveen Malviya <praveen.malv...@oracle.com>
date: Tue May 16 10:40:40 2017 +0530
summary: clm: add tool commands clm-adm, clm-state, clm-find [#2394]

---

** [tickets:#2394] clm: add clm tool commands for admin op and state check.**

**Status:** fixed
**Milestone:** 5.17.08
**Created:** Thu Mar 23, 2017 06:17 AM UTC by Praveen
**Last Updated:** Fri May 12, 2017 09:56 AM UTC
**Owner:** Praveen

The intention is to add CLM tool commands:

- to perform an admin operation on a node or on the cluster, something like clm-adm <lock|shutdown|unlock|reset>
- to check the admin state and membership status of CLM nodes, like clm-state <membership|adminstate>
- to find the CLM cluster and nodes, like clm-find <memebers|non-member>
[tickets] [opensaf:tickets] #2443 amf: amf gets stuck after headless while processing node_up messages
Hi Long,

If amfd and amfnd traces are available from both controllers, please upload them here.

Thanks,
Praveen

---

** [tickets:#2443] amf: amf gets stuck after headless while processing node_up messages**

**Status:** review
**Milestone:** 5.17.06
**Created:** Thu Apr 27, 2017 11:59 AM UTC by Long H Buu Nguyen
**Last Updated:** Fri Apr 28, 2017 04:13 AM UTC
**Owner:** Long H Buu Nguyen

Description:

After headless, the SCs come up. If the active SC is rebooted during that time, while the other SC is still initialising, amfd on the other SC can get stuck processing node_up messages. As a result, opensafd fails to start.

Observation:

Infinite node_up in syslog:

2017-04-18 14:17:36 SC-1 osafamfd[478]: NO Received node_up from 2040f: msg_id 1
2017-04-18 14:17:37 SC-1 osafamfd[478]: NO Received node_up from 2020f: msg_id 1
2017-04-18 14:17:37 SC-1 osafamfd[478]: NO Received node_up from 2030f: msg_id 1
...

Steps to reproduce:

1) Start a cluster.
2) Turn off the SCs.
3) Turn on the SCs.
4) After one SC becomes ACTIVE, while amfnd on the other SC is initialising the NCS SU, restart the active SC.
5) Amfnd on the other SC receives NEW_ACTIVE and then gets stuck with node_up messages.

Investigation:

Assume that after headless, SC-1 becomes ACTIVE. Amfnd on SC-2 sends a node_up message to amfd-SC-1. amfnd-SC-2 will instantiate the NCS SUs on SC-2 as soon as amfd-SC-1 receives the node_up message. If SC-1 is rebooted at the time the NCS SUs on SC-2 are INSTANTIATED, amfnd-SC-2 receives NEW_ACTIVE because amfd-SC-2 is set to ACTIVE by RDE. amfnd-SC-2 sends a node_up message to amfd-SC-2. Later, amfnd-SC-2 continues to instantiate the NCS SUs on SC-2. However, these NCS SUs are already INSTANTIATED, so amfnd-SC-2 does not send an oper_state message to amfd-SC-2, because the NCS SU presence states do not change:

Apr 18 14:35:36.869223 osafamfnd [486:486:src/amf/amfnd/susm.cc:1563] >> avnd_su_pres_fsm_run: 'safSu=SC-1,safSg=2N,safApp=OpenSAF'
Apr 18 14:35:36.869240 osafamfnd [486:486:src/amf/amfnd/susm.cc:1570] T1 Entering SU presence state FSM: current state: 3, event: 1, su name:safSu=SC-1,safSg=2N,safApp=OpenSAF
Apr 18 14:35:36.869257 osafamfnd [486:486:src/amf/amfnd/susm.cc:1581] T1 Exited SU presence state FSM: New State = 3
Apr 18 14:35:36.869273 osafamfnd [486:486:src/amf/amfnd/susm.cc:1614] << avnd_su_pres_fsm_run: 1
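The investigation in ticket #2443 describes an edge-triggered report: a message is sent only when the presence state changes, so a director that becomes active after the change never hears about it. A minimal sketch of that behaviour follows, with hypothetical names throughout (`SuReporter`, `run_fsm`, `on_new_active`, `reports_after_failover`); the `resend_current` option is an assumed remedy for illustration, not the actual patch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Presence { Uninstantiated, Instantiating, Instantiated };

// Hypothetical model of a node director reporting SU state to a director.
struct SuReporter {
  Presence state = Presence::Uninstantiated;
  std::vector<Presence> sent;  // reports delivered to the current director

  // Edge-triggered: a report goes out only when the state actually changes.
  void run_fsm(Presence next) {
    if (next == state) return;  // no transition, so nothing is sent
    state = next;
    sent.push_back(state);
  }

  // A newly active director has no record of earlier reports.
  void on_new_active(bool resend_current) {
    sent.clear();
    if (resend_current) sent.push_back(state);  // assumed remedy
  }
};

// Number of reports the new director receives when instantiation is re-run
// after a failover.
std::size_t reports_after_failover(bool resend_current) {
  SuReporter su;
  su.run_fsm(Presence::Instantiated);  // reported to the old director
  assert(su.sent.size() == 1);
  su.on_new_active(resend_current);
  su.run_fsm(Presence::Instantiated);  // re-run: state is unchanged
  return su.sent.size();
}
```

With `resend_current` set to false the new director receives nothing, which is consistent with the FSM trace above where the state stays at 3 on entry and exit.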
[tickets] [opensaf:tickets] #2394 clm: add clm tool commands for admin op and state check.
Published V2 after incorporating comments.

---

** [tickets:#2394] clm: add clm tool commands for admin op and state check.**

**Status:** review
**Milestone:** 5.17.08
**Created:** Thu Mar 23, 2017 06:17 AM UTC by Praveen
**Last Updated:** Mon Apr 10, 2017 01:40 PM UTC
**Owner:** Praveen
[tickets] [opensaf:tickets] #2354 amf: support amf tool command to know AMF cluster/nodes status.
- **status**: review --> fixed
- **Blocker**: --> True
- **Comment**:

5.17.08:

commit aaf6c29d6e9dc59f44f37d720c043dd6c8dad4a4
Author: Praveen <praveen.malv...@oracle.com>
Date: Thu May 11 13:59:17 2017 +0530

amf: support amf tool command to know AMF cluster/nodes status [#2354]

default (hg):

changeset: 8792:db2ba23d2963
tag: tip
user: Praveen Malviya <praveen.malv...@oracle.com>
date: Thu May 11 14:30:23 2017 +0530
summary: amf: support amf tool command to know AMF cluster/nodes status [#2354]

---

** [tickets:#2354] amf: support amf tool command to know AMF cluster/nodes status.**

**Status:** fixed
**Milestone:** 5.17.08
**Created:** Wed Mar 08, 2017 07:28 AM UTC by Praveen
**Last Updated:** Fri Apr 14, 2017 11:57 AM UTC
**Owner:** Praveen

This discussion ticket is raised based on a user list query dated March 1st, 2017. The query says:

"We have enabled the new feature "SC Absence" of OpenSAF 5.x in our product, it works good so far. Now we need to make some actions when PLD go in/out "SC Absence" mode, we have to find a way in PLD to detect if it is being in "SC Absent" mode or not. So, does anyone knows how to make it by a utility/tool and C code(i.e. OpenSAF API) as well? "

I think we do not have any API that can be used to query OpenSAF for the SC absence state. MDS up and down events of the directors can be used to determine the SC absence state, as some agents and node directors already do, but this would add a lot of code to the application. Please update this ticket with a known or proposed solution.
[tickets] [opensaf:tickets] #2421 amfd: is_swbdl_delete_ok_for_node should also check for SG and Node admin state
- **status**: review --> fixed
- **Blocker**: --> True
- **Comment**:

develop:

commit 20970cf5e21496bfef532f62cdb860388660ef62
Author: Praveen <praveen.malv...@oracle.com>
Date: Mon Apr 24 11:24:18 2017 +0530

amfd: allow nodeswbundle deletion if anyone of Node, SU or SG is locked_in [#2421]

5.17.06:

commit 6c613964e1ce56ef5798b3d426d40aae1c5068ef
Author: Praveen <praveen.malv...@oracle.com>
Date: Mon Apr 24 11:24:18 2017 +0530

amfd: allow nodeswbundle deletion if anyone of Node, SU or SG is locked_in [#2421]

---

** [tickets:#2421] amfd: is_swbdl_delete_ok_for_node should also check for SG and Node admin state**

**Status:** fixed
**Milestone:** 5.17.06
**Created:** Tue Apr 11, 2017 09:33 AM UTC by Tai Dinh
**Last Updated:** Fri Apr 21, 2017 08:50 AM UTC
**Owner:** Praveen

When a NodeSwBundle object is deleted, AMF only checks whether the SU's admin state is LOCKED_INSTANTIATION. This means that the deletion is not allowed even when the SG or the Node is in the LOCKED_INSTANTIATION state, which implicitly means that the SU is UNINSTANTIATED.

This currently blocks the rollback of an SMF campaign in some situations.

The admin state of the SU's node and of the SU's SG should also be checked, and the deletion should be allowed if any one of these states is LOCKED_INSTANTIATION.

/Tai
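The check requested in ticket #2421 amounts to a three-way predicate. The following is a minimal sketch assuming simplified, hypothetical types (`AdminState`, `SuView`, `delete_ok_su_only`, `delete_ok_any_level`); it is not the body of is_swbdl_delete_ok_for_node(), only an illustration of the rule stated in the commit summary.

```cpp
// Hypothetical, reduced set of admin states for illustration.
enum class AdminState { Unlocked, Locked, LockedInstantiation, ShuttingDown };

// Hypothetical view of the admin states relevant to one SU.
struct SuView {
  AdminState su;    // admin state of the SU itself
  AdminState sg;    // admin state of the SG the SU belongs to
  AdminState node;  // admin state of the node hosting the SU
};

// Behaviour described as the old check: only the SU's own state counts.
bool delete_ok_su_only(const SuView &v) {
  return v.su == AdminState::LockedInstantiation;
}

// Behaviour described by the commit summary: deletion is allowed if any one
// of Node, SU or SG is in LOCKED_INSTANTIATION.
bool delete_ok_any_level(const SuView &v) {
  return v.su == AdminState::LockedInstantiation ||
         v.sg == AdminState::LockedInstantiation ||
         v.node == AdminState::LockedInstantiation;
}
```

For an SU that is itself unlocked but sits on a node in LOCKED_INSTANTIATION, the first predicate refuses the deletion and the second allows it, which is the rollback case the ticket describes.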
[tickets] [opensaf:tickets] #2381 clmd: clm admin operation returns BAD_OP after rebooting node
- **Blocker**: --> False
- **Milestone**: 5.17.08 --> 5.17.06
- **Comment**:

Pushed to the released branch with revision:

commit a79f101873dffd145aa70d9cb4eb3c99b8ffd4ca
Author: Praveen <praveen.malv...@oracle.com>
Date: Fri Apr 21 14:31:19 2017 +0530

clms: return TIME_OUT for unlock op if CLMS update to CLM agent fails [#2381]

---

** [tickets:#2381] clmd: clm admin operation returns BAD_OP after rebooting node**

**Status:** fixed
**Milestone:** 5.17.06
**Created:** Thu Mar 16, 2017 07:30 AM UTC by Ritu Raj
**Last Updated:** Fri Apr 21, 2017 11:21 AM UTC
**Owner:** Praveen
**Attachments:**

- [active_clmd.tgz](https://sourceforge.net/p/opensaf/tickets/2381/attachment/active_clmd.tgz) (1.3 MB; application/x-compressed-tar)
- [messages](https://sourceforge.net/p/opensaf/tickets/2381/attachment/messages) (1.9 MB; application/octet-stream)

###Environment details

OS : Suse 64bit
Changeset : 8701 ( 5.2.RC1)
4-node setup (2 controllers and 2 payloads)

###Summary

A clm admin operation returns SA_AIS_ERR_BAD_OPERATION after rebooting a node.

###Steps followed & Observed behaviour

1. Initially performed a clm_lock operation on a payload (PL-3) and immediately restarted the same payload (PL-3).

> init 6; exit

2. Later, performed a clm_unlock operation on PL-3. The unlock operation was reported as timed out, but the node still joined the cluster.

> SLES-SLOT1:~ # amf-adm unlock safNode=PL-3,safCluster=myClmCluster
> Mar 15 14:35:20 SLES-SLOT1 osafclmd[2763]: ER clms_imm_node_unlock failed
> Mar 15 14:35:20 SLES-SLOT1 osafamfd[2773]: NO Received node_up from 2030f: msg_id 1
> Mar 15 14:35:20 SLES-SLOT1 osafamfd[2773]: NO Node 'PL-3' joined the cluster
> Mar 15 14:35:20 SLES-SLOT1 osafimmnd[2733]: NO Implementer connected: 197 (MsgQueueService131855) <0, 2030f>
> error - command timed out (alarm)

3. After that, any clm_lock or clm_unlock operation returns 'SA_AIS_ERR_BAD_OPERATION'.

> SLES-SLOT1:~ # amf-adm lock safNode=PL-3,safCluster=myClmCluster
> error - saImmOmAdminOperationInvoke_2 admin-op RETURNED: SA_AIS_ERR_BAD_OPERATION (20)
>
> SLES-SLOT1:~ # amf-adm unlock safNode=PL-3,safCluster=myClmCluster
> error - saImmOmAdminOperationInvoke_2 admin-op RETURNED: SA_AIS_ERR_BAD_OPERATION (20)

Traces:

From the traces: node PL-3 joined the cluster

~~~
Mar 15 14:35:20.373997 osafclmd [2763:src/clm/clmd/clms_imm.c:0939] >> clms_imm_admin_op_callback: Admin callback for nodename:safNode=PL-3,safCluster=myClmCluster, opId:1
Mar 15 14:35:20.374002 osafclmd [2763:src/clm/clmd/clms_util.c:0038] >> clms_node_get_by_name: name input safNode=PL-3,safCluster=myClmCluster length 36
Mar 15 14:35:20.374006 osafclmd [2763:src/clm/clmd/clms_util.c:0046] TR nodename after patricia tree get safNode=PL-3,safCluster=myClmCluster
Mar 15 14:35:20.374009 osafclmd [2763:src/clm/clmd/clms_util.c:0049] << clms_node_get_by_name
Mar 15 14:35:20.374012 osafclmd [2763:src/clm/clmd/clms_imm.c:2223] >> clms_imm_node_unlock: Node name safNode=PL-3,safCluster=myClmCluster to unlock
Mar 15 14:35:20.374015 osafclmd [2763:src/clm/clmd/clms_imm.c:0579] >> clms_admin_state_update_rattr: Admin state 1 update for node safNode=PL-3,safCluster=myClmCluster
Mar 15 14:35:20.374018 osafclmd [2763:src/clm/clmd/clms_util.c:0038] >> clms_node_get_by_name: name input safNode=PL-3,safCluster=myClmCluster length 36
Mar 15 14:35:20.374021 osafclmd [2763:src/clm/clmd/clms_util.c:0046] TR nodename after patricia tree get safNode=PL-3,safCluster=myClmCluster
~~~

..
..

*but sending the track callback failed for SA_CLM_CHANGE_COMPLETED*

~~~
Mar 15 14:35:20.380860 osafclmd [2763:src/clm/clmd/clms_imm.c:1439] TR callback msg send to clma failed
Mar 15 14:35:20.380869 osafclmd [2763:src/clm/clmd/clms_imm.c:1447] << clms_prep_and_send_track
Mar 15 14:35:20.380872 osafclmd [2763:src/clm/clmd/clms_imm.c:1220] TR Sending track callback failed for SA_CLM_CHANGE_COMPLETED
Mar 15 14:35:20.380875 osafclmd [2763:src/clm/clmd/clms_imm.c:1380] >> clms_prep_and_send_track
~~~

-- and the admin operation performed later failed with 'Another Admin operation already in progress'

~~~
Mar 15 14:51:21.878688 osafclmd [2763:src/clm/clmd/clms_imm.c:0939] >> clms_imm_admin_op_callback: Admin callback for nodename:safNode=PL-3,safCluster=myClmCluster, opId:2
Mar 15 14:51:21.878700 osafclmd [2763:src/clm/clmd/clms_util.c:0038] >> clms_node_get_by_name: name input safNode=PL-3,safCluster=myClmCluster length 36
Mar 15 14:51:21.878712 osafclmd [2763:src/clm/clmd/clms_util.c:0046] TR nodename after patricia tree get safNode=PL-3,safCluster=myClmCluster
Mar 15 14:51:21.878720 osafclmd [2763:src/clm/clmd/clms_util.c:0049] << clms_node_get_by_name
Mar 15 14:51:21.878726 osafclmd [2763:src/clm/clmd/clms_imm.c:0982] TR Another Admin operation already in progress: 4
~~~

Notes:

1. Syslog of Active controller attached
2. os
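The traces in ticket #2381 suggest a common failure pattern: an "operation in progress" marker that is set on entry and not cleared on an error path, so every later request is rejected. The sketch below models only that pattern. `ClmNode`, `admin_unlock`, `send_ok` and `clear_on_failure` are hypothetical, and whether the committed fix clears such a marker is an assumption; the commit summary only states that TIME_OUT is returned when the update to the CLM agent fails.

```cpp
// Hypothetical result codes, loosely named after the errors in the ticket.
enum class Result { Ok, Timeout, BadOperation };

// Hypothetical stand-in for the per-node state kept by the CLM server.
struct ClmNode {
  bool admin_op_in_progress = false;
};

// send_ok simulates whether the track callback reached the CLM agent.
// clear_on_failure selects between the leaky and the assumed-fixed path.
Result admin_unlock(ClmNode &node, bool send_ok, bool clear_on_failure) {
  if (node.admin_op_in_progress) return Result::BadOperation;
  node.admin_op_in_progress = true;
  if (!send_ok) {
    if (clear_on_failure) node.admin_op_in_progress = false;
    return Result::Timeout;  // caller sees a timed-out operation
  }
  node.admin_op_in_progress = false;
  return Result::Ok;
}
```

In the leaky variant a failed send is followed by BadOperation on the next request, matching the sequence of "command timed out" and then SA_AIS_ERR_BAD_OPERATION reported above.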
[tickets] [opensaf:tickets] #2381 clmd: clm admin operation returns BAD_OP after rebooting node
- **status**: review --> fixed
- **Comment**:

commit 66970f59421f9d4338ee6d13134afca9082c1e91
Author: Praveen <praveen.malv...@oracle.com>
Date: Fri Apr 21 14:31:19 2017 +0530

clms: return TIME_OUT for unlock op if CLMS update to CLM agent fails [#2381]

changeset: 8775:10bbd3156a40
tag: tip
user: Praveen Malviya <praveen.malv...@oracle.com>
date: Fri Apr 21 14:45:15 2017 +0530
summary: clms: return TIME_OUT for unlock op if CLMS update to CLM agent fails [#2381]

---

** [tickets:#2381] clmd: clm admin operation returns BAD_OP after rebooting node**

**Status:** fixed
**Milestone:** 5.17.08
**Created:** Thu Mar 16, 2017 07:30 AM UTC by Ritu Raj
**Last Updated:** Mon Apr 10, 2017 01:40 PM UTC
**Owner:** Praveen
[tickets] [opensaf:tickets] #2421 amfd: is_swbdl_delete_ok_for_node should also check for SG and Node admin state
- **status**: assigned --> accepted

---

**[tickets:#2421] amfd: is_swbdl_delete_ok_for_node should also check for SG and Node admin state**

**Status:** accepted
**Milestone:** 5.17.06
**Created:** Tue Apr 11, 2017 09:33 AM UTC by Tai Dinh
**Last Updated:** Thu Apr 20, 2017 04:07 AM UTC
**Owner:** Praveen

When a NodeSwBundle object is deleted, AMF only checks whether the SU's admin state is LOCKED_INSTANTIATION. As a result, the deletion is rejected even when the SG or the Node is in the LOCKED_INSTANTIATION state, which implicitly means that the SU is UNINSTANTIATED. This currently blocks the rollback of an SMF campaign in some situations.

The admin state of the SU's node and of the SU's SG should also be checked, and the deletion should be allowed if either of them is LOCKED_INSTANTIATION.

/Tai

---

Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is subscribed to https://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change settings at https://sourceforge.net/p/opensaf/admin/tickets/options. Or, if this is a mailing list, you can unsubscribe from the mailing list.

Opensaf-tickets mailing list
Opensaf-tickets@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets
[tickets] [opensaf:tickets] #2331 CLM: : saClmNodeCurrAddress and saClmNodeCurrAddressFamily are not exposed to IMM even that TPC mode is using
- **status**: unassigned --> accepted
- **assigned_to**: Praveen
- **Milestone**: future --> 5.17.08

---

**[tickets:#2331] CLM: : saClmNodeCurrAddress and saClmNodeCurrAddressFamily are not exposed to IMM even that TPC mode is using**

**Status:** accepted
**Milestone:** 5.17.08
**Created:** Thu Mar 02, 2017 10:10 AM UTC by Tai Dinh
**Last Updated:** Thu Mar 23, 2017 07:11 AM UTC
**Owner:** Praveen

saClmNodeCurrAddress and saClmNodeCurrAddressFamily of a cluster node are not exposed to IMM even when TCP mode is configured. Applications sometimes need this information.
[tickets] [opensaf:tickets] #2421 amfd: is_swbdl_delete_ok_for_node should also check for SG and Node admin state
- **status**: unassigned --> assigned
- **assigned_to**: Praveen
- **Milestone**: 5.17.08 --> 5.17.06

---

**[tickets:#2421] amfd: is_swbdl_delete_ok_for_node should also check for SG and Node admin state**

**Status:** assigned
**Milestone:** 5.17.06
**Created:** Tue Apr 11, 2017 09:33 AM UTC by Tai Dinh
**Last Updated:** Tue Apr 11, 2017 09:33 AM UTC
**Owner:** Praveen

When a NodeSwBundle object is deleted, AMF only checks whether the SU's admin state is LOCKED_INSTANTIATION. As a result, the deletion is rejected even when the SG or the Node is in the LOCKED_INSTANTIATION state, which implicitly means that the SU is UNINSTANTIATED. This currently blocks the rollback of an SMF campaign in some situations.

The admin state of the SU's node and of the SU's SG should also be checked, and the deletion should be allowed if either of them is LOCKED_INSTANTIATION.

/Tai
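The rule proposed in ticket #2421 can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual amfd code: the `AdminState` enum, the `SuView` struct, and the functions `su_is_locked_in` and `swbdl_delete_ok` are hypothetical names invented for this example. The sketch treats an SU as safe for NodeSwBundle deletion when the SU itself, its SG, or its hosting node is in LOCKED_INSTANTIATION.

```cpp
#include <vector>

// Hypothetical stand-in for the AMF admin states (illustration only).
enum class AdminState { Unlocked, Locked, LockedInstantiation, ShuttingDown };

// Hypothetical flattened view of one SU: its own admin state plus the
// admin states of the SG it belongs to and the node that hosts it.
struct SuView {
  AdminState su_admin;
  AdminState sg_admin;
  AdminState node_admin;
};

// Per the ticket, the SU is treated as UNINSTANTIATED if the SU, its SG,
// or its node is in LOCKED_INSTANTIATION.
inline bool su_is_locked_in(const SuView& su) {
  return su.su_admin == AdminState::LockedInstantiation ||
         su.sg_admin == AdminState::LockedInstantiation ||
         su.node_admin == AdminState::LockedInstantiation;
}

// Deletion is allowed only if every SU that uses the bundle passes the check.
inline bool swbdl_delete_ok(const std::vector<SuView>& sus_using_bundle) {
  for (const SuView& su : sus_using_bundle) {
    if (!su_is_locked_in(su)) return false;
  }
  return true;
}
```

With only the SU-level comparison, an unlocked SU in a LOCKED_INSTANTIATION SG would block the deletion; the two extra comparisons are what the ticket asks for.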
[tickets] [opensaf:tickets] #2354 amf: support amf tool command to know AMF cluster/nodes status.
- **status**: accepted --> review
- **Milestone**: future --> 5.17.08

---

**[tickets:#2354] amf: support amf tool command to know AMF cluster/nodes status.**

**Status:** review
**Milestone:** 5.17.08
**Created:** Wed Mar 08, 2017 07:28 AM UTC by Praveen
**Last Updated:** Tue Apr 11, 2017 06:20 AM UTC
**Owner:** Praveen

This discussion ticket is raised based on a user list query dated March 1st, 2017. The query says:

"We have enabled the new feature "SC Absence" of OpenSAF 5.x in our product, it works good so far. Now we need to make some actions when PLD go in/out "SC Absence" mode, we have to find a way in PLD to detect if it is being in "SC Absent" mode or not. So, does anyone knows how to make it by a utility/tool and C code(i.e. OpenSAF API) as well?"

I think we do not have any API that can be used to query OpenSAF for the SC absence state. MDS up and down events of the directors can be used to determine the SC absence state, as some agents and node directors already do, but this would add a lot of code to the application.

Please update this ticket with a known or proposed solution.
[tickets] [opensaf:tickets] #2354 amf: support amf tool command to know AMF cluster/nodes status.
- **summary**: osaf: How to detect if payload is being in "SC Absence" mode. --> amf: support amf tool command to know AMF cluster/nodes status.
- **status**: unassigned --> accepted
- **assigned_to**: Praveen
- **Type**: discussion --> enhancement
- **Component**: osaf --> amf
- **Part**: - --> tools
- **Priority**: minor --> major
- **Comment**:

I think both a tool command and a callback are needed.

With a tool command, a user can check the status of the nodes in the cluster at any time. Since both CLM and AMF have a notion of nodes and cluster, a user may want to know the status of the CLM or the AMF cluster. This status is more than a mere listing of nodes, which the currently supported utilities already provide. The command should also take the OpenSAF status into account. For example, during SC absence amf-state siass also lists the SISUs for the controllers, so a user cannot tell from its output whether the controllers are up.

As for the callback, a user cannot run a tool command continuously to check whether the controllers exist. Calling some SAF API from an application on the payload and inferring from its return status whether the host payload is in SC Absence mode is not a proper solution either, because an API return code can have multiple interpretations. So there should also be a callback that informs the application that its host payload has entered SC Absence mode or has returned to SC Presence mode. The application will subscribe to this callback.

I will send out a patch for the AMF cluster status and will look into the possibility of a callback in either CLM or AMF.

---

**[tickets:#2354] amf: support amf tool command to know AMF cluster/nodes status.**

**Status:** accepted
**Milestone:** future
**Created:** Wed Mar 08, 2017 07:28 AM UTC by Praveen
**Last Updated:** Wed Mar 08, 2017 07:28 AM UTC
**Owner:** Praveen

This discussion ticket is raised based on a user list query dated March 1st, 2017. The query says:

"We have enabled the new feature "SC Absence" of OpenSAF 5.x in our product, it works good so far. Now we need to make some actions when PLD go in/out "SC Absence" mode, we have to find a way in PLD to detect if it is being in "SC Absent" mode or not. So, does anyone knows how to make it by a utility/tool and C code(i.e. OpenSAF API) as well?"

I think we do not have any API that can be used to query OpenSAF for the SC absence state. MDS up and down events of the directors can be used to determine the SC absence state, as some agents and node directors already do, but this would add a lot of code to the application.

Please update this ticket with a known or proposed solution.
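The idea mentioned in ticket #2354, deriving the SC absence state from up and down events of the directors and notifying the application on each transition, might look roughly like the sketch below. It is an assumption-laden illustration: `ScAbsenceTracker` and its methods are hypothetical, no real MDS or AMF API is called, and the caller is assumed to feed the up/down events itself.

```cpp
#include <functional>
#include <set>
#include <string>
#include <utility>

// Hypothetical helper: tracks which directors are currently visible and
// reports transitions between "SC absent" and "SC present".
class ScAbsenceTracker {
 public:
  // Called with true when the last director disappears and with false
  // when the first director (re)appears.
  using Callback = std::function<void(bool sc_absent)>;

  explicit ScAbsenceTracker(Callback cb) : cb_(std::move(cb)) {}

  // Feed an "up" event for the director identified by `id`.
  void director_up(const std::string& id) {
    const bool was_absent = directors_.empty();
    directors_.insert(id);
    if (was_absent && cb_) cb_(false);
  }

  // Feed a "down" event; down events for unknown directors are ignored.
  void director_down(const std::string& id) {
    if (directors_.erase(id) == 0) return;
    if (directors_.empty() && cb_) cb_(true);
  }

  // No visible director is interpreted as SC absence.
  bool sc_absent() const { return directors_.empty(); }

 private:
  std::set<std::string> directors_;
  Callback cb_;
};
```

The callback fires only on transitions, which matches the point made in the comment that an application should not have to poll a tool command or interpret API return codes.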
[tickets] [opensaf:tickets] #2417 amf: support for si-swap in N+M model when Standbys are in different SUs.
---

**[tickets:#2417] amf: support for si-swap in N+M model when Standbys are in different SUs.**

**Status:** accepted
**Milestone:** next
**Created:** Mon Apr 10, 2017 06:17 AM UTC by Praveen
**Last Updated:** Mon Apr 10, 2017 06:17 AM UTC
**Owner:** Praveen

This is a continuation of ticket #2259.

This new ticket will consider the more general case: "When SIs in designated SUs have their standby distributed in different SUs,"

Will update with an example configuration.
[tickets] [opensaf:tickets] #2381 clmd: clm admin operation returns BAD_OP after rebooting node
- **status**: accepted --> review --- ** [tickets:#2381] clmd: clm admin operation returns BAD_OP after rebooting node** **Status:** review **Milestone:** next **Created:** Thu Mar 16, 2017 07:30 AM UTC by Ritu Raj **Last Updated:** Mon Apr 03, 2017 06:05 AM UTC **Owner:** Praveen **Attachments:** - [active_clmd.tgz](https://sourceforge.net/p/opensaf/tickets/2381/attachment/active_clmd.tgz) (1.3 MB; application/x-compressed-tar) - [messages](https://sourceforge.net/p/opensaf/tickets/2381/attachment/messages) (1.9 MB; application/octet-stream) ###Environment details OS : Suse 64bit Changeset : 8701 ( 5.2.RC1) 4 nodes setup(2 controller and 2 payload) ###Summary clm admin operation returns SA_AIS_ERR_BAD_OPERATION after rebooting node ###Steps followed & Observed behaviour 1. Initially performed clm_lock operation on Payload (PL-3) and immediately restarted the same payload(PL-3) > init 6; exit 2. Later, performed clm_unlock operation on PL-3, and got message unlock operation got timed out but still node joined the cluster > SLES-SLOT1:~ # amf-adm unlock safNode=PL-3,safCluster=myClmCluster > Mar 15 14:35:20 SLES-SLOT1 osafclmd[2763]: ER clms_imm_node_unlock failed > Mar 15 14:35:20 SLES-SLOT1 osafamfd[2773]: NO Received node_up from 2030f: > msg_id 1 > Mar 15 14:35:20 SLES-SLOT1 osafamfd[2773]: NO Node 'PL-3' joined the cluster > Mar 15 14:35:20 SLES-SLOT1 osafimmnd[2733]: NO Implementer connected: 197 > (MsgQueueService131855) <0, 2030f> > error - command timed out (alarm) 3. 
After, that if clm_lock or unlock opeartion performed it returns 'SA_AIS_ERR_BAD_OPERATION' SLES-SLOT1:~ # amf-adm lock safNode=PL-3,safCluster=myClmCluster error - saImmOmAdminOperationInvoke_2 admin-op RETURNED: SA_AIS_ERR_BAD_OPERATION (20) > > SLES-SLOT1:~ # amf-adm unlock safNode=PL-3,safCluster=myClmCluster > error - saImmOmAdminOperationInvoke_2 admin-op RETURNED: > SA_AIS_ERR_BAD_OPERATION (20) Traces: >From the traces: Node PL-3 joined the cluster ~~~ Mar 15 14:35:20.373997 osafclmd [2763:src/clm/clmd/clms_imm.c:0939] >> clms_imm_admin_op_callback: Admin callback for nodename:safNode=PL-3,safCluster=myClmCluster, opId:1 Mar 15 14:35:20.374002 osafclmd [2763:src/clm/clmd/clms_util.c:0038] >> clms_node_get_by_name: name input safNode=PL-3,safCluster=myClmCluster length 36 Mar 15 14:35:20.374006 osafclmd [2763:src/clm/clmd/clms_util.c:0046] TR nodename after patricia tree get safNode=PL-3,safCluster=myClmCluster Mar 15 14:35:20.374009 osafclmd [2763:src/clm/clmd/clms_util.c:0049] << clms_node_get_by_name Mar 15 14:35:20.374012 osafclmd [2763:src/clm/clmd/clms_imm.c:2223] >> clms_imm_node_unlock: Node name safNode=PL-3,safCluster=myClmCluster to unlock Mar 15 14:35:20.374015 osafclmd [2763:src/clm/clmd/clms_imm.c:0579] >> clms_admin_state_update_rattr: Admin state 1 update for node safNode=PL-3,safCluster=myClmCluster Mar 15 14:35:20.374018 osafclmd [2763:src/clm/clmd/clms_util.c:0038] >> clms_node_get_by_name: name input safNode=PL-3,safCluster=myClmCluster length 36 Mar 15 14:35:20.374021 osafclmd [2763:src/clm/clmd/clms_util.c:0046] TR nodename after patricia tree get safNode=PL-3,safCluster=myClmCluster ~~~ .. .. 
*but Sending track callback failed for SA_CLM_CHANGE_COMPLETED* ~~~ Mar 15 14:35:20.380860 osafclmd [2763:src/clm/clmd/clms_imm.c:1439] TR callback msg send to clma failed Mar 15 14:35:20.380869 osafclmd [2763:src/clm/clmd/clms_imm.c:1447] << clms_prep_and_send_track Mar 15 14:35:20.380872 osafclmd [2763:src/clm/clmd/clms_imm.c:1220] TR Sending track callback failed for SA_CLM_CHANGE_COMPLETED Mar 15 14:35:20.380875 osafclmd [2763:src/clm/clmd/clms_imm.c:1380] >> clms_prep_and_send_track ~~~ -- and later performed admin operation got failed as 'Another Admin operation already in progress' ~~~ Mar 15 14:51:21.878688 osafclmd [2763:src/clm/clmd/clms_imm.c:0939] >> clms_imm_admin_op_callback: Admin callback for nodename:safNode=PL-3,safCluster=myClmCluster, opId:2 Mar 15 14:51:21.878700 osafclmd [2763:src/clm/clmd/clms_util.c:0038] >> clms_node_get_by_name: name input safNode=PL-3,safCluster=myClmCluster length 36 Mar 15 14:51:21.878712 osafclmd [2763:src/clm/clmd/clms_util.c:0046] TR nodename after patricia tree get safNode=PL-3,safCluster=myClmCluster Mar 15 14:51:21.878720 osafclmd [2763:src/clm/clmd/clms_util.c:0049] << clms_node_get_by_name Mar 15 14:51:21.878726 osafclmd [2763:src/clm/clmd/clms_imm.c:0982] TR Another Admin operation already in progress: 4 ~~~ Notes: 1. Syslog of Active controller attached 2. osafclmd of Active controller attached --- Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is subscribed to https://sourceforge.net/p/opensaf/tickets/ To unsubscribe from further messages, a project admin can change settings at https://so
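The failure pattern in ticket #2381 can be illustrated with a small sketch. The analysis posted on this ticket states that the admin operation parameters are not reset when the track callback cannot be sent, and the traces show the next request being rejected with "Another Admin operation already in progress". The code below models only that pattern; `Result`, `NodeAdminCtx`, and `admin_unlock` are hypothetical names, and the `reset_on_failure` switch is an assumption used to contrast the two behaviors, not the actual clmd fix.

```cpp
// Hypothetical result codes as seen by the caller of the admin operation.
enum class Result { Ok, BadOperation, Timeout };

// Hypothetical per-node state: a marker for an admin operation in progress.
struct NodeAdminCtx {
  bool op_in_progress = false;
};

// send_ok models whether the track callback reached the client.
// reset_on_failure == false models the behavior described in the analysis:
// the marker stays set after a failed send.
inline Result admin_unlock(NodeAdminCtx& node, bool send_ok, bool reset_on_failure) {
  if (node.op_in_progress) {
    // A stale marker makes every later request look like a concurrent one.
    return Result::BadOperation;
  }
  node.op_in_progress = true;
  if (!send_ok) {
    if (reset_on_failure) node.op_in_progress = false;
    // The caller gets no successful reply and observes a timeout.
    return Result::Timeout;
  }
  node.op_in_progress = false;
  return Result::Ok;
}
```

Without the reset, a single failed send leaves the node rejecting every later lock or unlock request, which matches the sequence reported in the ticket: one timed-out unlock followed by SA_AIS_ERR_BAD_OPERATION.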
[tickets] [opensaf:tickets] #2392 amf: PR doc updates for 5.2 release.
- **status**: review --> fixed
- **Comment**:

changeset: 213:388a98e1ce37
tag: tip
user: Praveen Malviya <praveen.malv...@oracle.com>
date: Mon Apr 03 10:24:13 2017 +0530
summary: amf: AMF PR doc updates for #1190, #2259, #2144, #2065 and #2233 [#2392].

---

**[tickets:#2392] amf: PR doc updates for 5.2 release.**

**Status:** fixed
**Milestone:** 5.2.RC2
**Created:** Thu Mar 23, 2017 05:36 AM UTC by Praveen
**Last Updated:** Tue Mar 28, 2017 09:53 AM UTC
**Owner:** Praveen

Updates to be done for:
- Enhancements: #1190, #2259, #2144, #2252
- Defect: #2233
[tickets] [opensaf:tickets] #2381 clmd: clm admin operation returns BAD_OP after rebooting node
- **status**: assigned --> accepted
- **Milestone**: 5.0.2 --> next
- **Comment**:

Analysis:
There seems to be a problem with at least one CLM client that resides on PL-3. CLM is unable to send any message to this client.

1) CLMD creates this client as standby:

Mar 15 13:47:47.080883 osafclmd [2763:src/clm/clmd/clms_evt.c:0140] TR client_id: 63 lookup failed
Mar 15 13:47:47.080886 osafclmd [2763:src/clm/clmd/clms_evt.c:0250] >> clms_client_new: MDS dest 2030feebec01a
Mar 15 13:47:47.080888 osafclmd [2763:src/clm/clmd/clms_evt.c:0277] << clms_client_new: client_id 63

2) When the user performs an admin operation, CLMD tries to send the track callback for the completed step to this client, but MDS returns failure:

Mar 15 14:32:10.759655 osafclmd [2763:src/clm/clmd/clms_util.c:1095] TR Client ID 63 ,track_flags=3
Mar 15 14:32:10.759658 osafclmd [2763:src/clm/clmd/clms_imm.c:1380] >> clms_prep_and_send_track
Mar 15 14:32:10.759661 osafclmd [2763:src/clm/clmd/clms_util.c:0352] >> clms_nodedb_lookup
Mar 15 14:32:10.759664 osafclmd [2763:src/clm/clmd/clms_util.c:0354] TR patricia tree size 4
Mar 15 14:32:10.759667 osafclmd [2763:src/clm/clmd/clms_util.c:0149] TR Node found 131343
Mar 15 14:32:10.759670 osafclmd [2763:src/clm/clmd/clms_util.c:0149] TR Node found 131599
Mar 15 14:32:10.759673 osafclmd [2763:src/clm/clmd/clms_util.c:0149] TR Node found 131855
Mar 15 14:32:10.759676 osafclmd [2763:src/clm/clmd/clms_util.c:0149] TR Node found 132111
Mar 15 14:32:10.759687 osafclmd [2763:src/clm/clmd/clms_util.c:0375] TR num_nd_changes 4
Mar 15 14:32:10.759689 osafclmd [2763:src/clm/clmd/clms_util.c:0376] << clms_nodedb_lookup
Mar 15 14:32:10.759693 osafclmd [2763:src/clm/clmd/clms_mds.c:1494] >> clms_mds_msg_send
Mar 15 14:32:10.759728 osafclmd [2763:src/clm/clmd/clms_mds.c:1525] IN mds send returned: 2
Mar 15 14:32:10.759732 osafclmd [2763:src/clm/clmd/clms_mds.c:1527] << clms_mds_msg_send
Mar 15 14:32:10.759735 osafclmd [2763:src/clm/clmd/clms_imm.c:1439] TR callback msg send to clma failed

3) Before the admin operation on PL-3, this node was restarted. There is no evidence in the clmd traces of this client going down.

4) When the unlock operation was performed, CLMD again could not send the membership status to this client and did not reply to IMM. The admin operation parameters were also not reset. Since the admin operation parameters are not reset, further admin operations are not allowed and time out.

---

**[tickets:#2381] clmd: clm admin operation returns BAD_OP after rebooting node**

**Status:** accepted
**Milestone:** next
**Created:** Thu Mar 16, 2017 07:30 AM UTC by Ritu Raj
**Last Updated:** Thu Mar 16, 2017 09:01 AM UTC
**Owner:** Praveen
**Attachments:**

- [active_clmd.tgz](https://sourceforge.net/p/opensaf/tickets/2381/attachment/active_clmd.tgz) (1.3 MB; application/x-compressed-tar)
- [messages](https://sourceforge.net/p/opensaf/tickets/2381/attachment/messages) (1.9 MB; application/octet-stream)

###Environment details
OS : Suse 64bit
Changeset : 8701 ( 5.2.RC1)
4 nodes setup(2 controller and 2 payload)

###Summary
clm admin operation returns SA_AIS_ERR_BAD_OPERATION after rebooting node

###Steps followed & Observed behaviour
1. Initially performed clm_lock operation on Payload (PL-3) and immediately restarted the same payload (PL-3)
> init 6; exit

2. Later, performed clm_unlock operation on PL-3. The unlock operation was reported as timed out, but the node still joined the cluster
> SLES-SLOT1:~ # amf-adm unlock safNode=PL-3,safCluster=myClmCluster
> Mar 15 14:35:20 SLES-SLOT1 osafclmd[2763]: ER clms_imm_node_unlock failed
> Mar 15 14:35:20 SLES-SLOT1 osafamfd[2773]: NO Received node_up from 2030f: msg_id 1
> Mar 15 14:35:20 SLES-SLOT1 osafamfd[2773]: NO Node 'PL-3' joined the cluster
> Mar 15 14:35:20 SLES-SLOT1 osafimmnd[2733]: NO Implementer connected: 197 (MsgQueueService131855) <0, 2030f>
> error - command timed out (alarm)

3.
After, that if clm_lock or unlock opeartion performed it returns 'SA_AIS_ERR_BAD_OPERATION' SLES-SLOT1:~ # amf-adm lock safNode=PL-3,safCluster=myClmCluster error - saImmOmAdminOperationInvoke_2 admin-op RETURNED: SA_AIS_ERR_BAD_OPERATION (20) > > SLES-SLOT1:~ # amf-adm unlock safNode=PL-3,safCluster=myClmCluster > error - saImmOmAdminOperationInvoke_2 admin-op RETURNED: > SA_AIS_ERR_BAD_OPERATION (20) Traces: >From the traces: Node PL-3 joined the cluster ~~~ Mar 15 14:35:20.373997 osafclmd [2763:src/clm/clmd/clms_imm.c:0939] >> clms_imm_admin_op_callback: Admin callback for nodename:safNode=PL-3,safCluster=myClmCluster, opId:1 Mar 15 14:35:20.374002 osafclmd [2763:src/clm/clmd/clms_util.c:0038] >> clms_node_get_by_name: name input safNode=PL-3,safCluster=myClmCluster length 36 Mar 15 14:35:20.374006 osafclmd [2763:src/clm/clmd/clms_util.c:0046] TR nodename after patricia tree get safNode=PL-3,safCluster=myClmCluster Mar 15 14:35:20.3740
[tickets] [opensaf:tickets] #2403 amf: cores were generated for amfd and amfnd on different controllers.
log file is not NFS mounted. But these cores were generated on different nodes: for amfd on the active controller and for amfnd on the standby controller. I had run some tests with AMFD and AMFND traces enabled on a four-node system. When this happened, the trace file was huge (in GBs). I will probably rerun these tests next week. Since it is not reproducible, we can hold this ticket until next week.

---

**[tickets:#2403] amf: cores were generated for amfd and amfnd on different controllers.**

**Status:** unassigned
**Milestone:** 5.2.RC2
**Created:** Thu Mar 30, 2017 08:39 AM UTC by Praveen
**Last Updated:** Fri Mar 31, 2017 06:09 AM UTC
**Owner:** nobody
**Attachments:**

- [amfd_bt_threads.log](https://sourceforge.net/p/opensaf/tickets/2403/attachment/amfd_bt_threads.log) (3.0 kB; application/octet-stream)
- [amfd_bt_threads_full.log](https://sourceforge.net/p/opensaf/tickets/2403/attachment/amfd_bt_threads_full.log) (22.1 kB; application/octet-stream)
- [amfnd_bt_threads.log](https://sourceforge.net/p/opensaf/tickets/2403/attachment/amfnd_bt_threads.log) (2.9 kB; application/octet-stream)
- [amfnd_bt_threads_full.log](https://sourceforge.net/p/opensaf/tickets/2403/attachment/amfnd_bt_threads_full.log) (23.0 kB; application/octet-stream)

Observed AMFD and AMFND crashes when calling TRACE() API.

amfd:
\#0 0x7f44834db70d in write () from /lib64/libpthread.so.0
\#1 0x7f4483eb9af9 in output_ () from /usr/local/lib/libopensaf_core.so.0
\#2 0x7f4484dbc714 in Trace::trace(char const*, char const*, unsigned int, unsigned int, char const*, ...) () at ./src/base/logtrace.h:166
\#3 0x7f4484e7b3cb in avd_stop_tmr(cl_cb_tag*, avd_tmr_tag*) () at src/amf/amfd/timer.cc:113
\#4 0x7f4484e03556 in avd_tmr_snd_hb_evh(cl_cb_tag*, AVD_EVT*) () at src/amf/amfd/ndfsm.cc:1066
\#5 0x7f4484e005b4 in process_event(cl_cb_tag*, AVD_EVT*) () at src/amf/amfd/main.cc:792
\#6 0x7f4484db9a1e in main () at src/amf/amfd/main.cc:693

amfnd:
(gdb) bt
\#0 0x7fdd0f3f270d in write () from /lib64/libpthread.so.0
\#1 0x7fdd0fb57af9 in output_ () from /usr/local/lib/libopensaf_core.so.0
\#2 0x7fdd10844a64 in Trace::trace(char const*, char const*, unsigned int, unsigned int, char const*, ...) () at ./src/base/logtrace.h:166
\#3 0x7fdd1086e695 in avnd_main_process() () at src/amf/amfnd/main.cc:646
\#4 0x7fdd1084342f in main () at src/amf/amfnd/main.cc:207
[tickets] [opensaf:tickets] #2403 amf: cores were generated for amfd and amfnd on different controllers.
- **summary**: amf: amfd and amfnd crashes while calling TRACE() API. --> amf: cores were generated for amfd and amfnd on different controllers.
- Description has changed:

Diff:

--- old
+++ new
@@ -1,4 +1,3 @@
-
 Observed AMFD and AMFND crashes when calling TRACE() API.
 amfd:

- Attachments has changed:

Diff:

--- old
+++ new
@@ -0,0 +1,4 @@
+amfd_bt_threads.log (3.0 kB; application/octet-stream)
+amfd_bt_threads_full.log (22.1 kB; application/octet-stream)
+amfnd_bt_threads.log (2.9 kB; application/octet-stream)
+amfnd_bt_threads_full.log (23.0 kB; application/octet-stream)

- **Comment**:

I mentioned it in the comment but forgot to change the title. It seems AMFD got stuck and became unresponsive, so amfnd generated its core after the heartbeat timeout. Similarly, on the other controller amfnd got stuck and became unresponsive, so the watchdog generated its core. The full bt is attached as a file.

---

**[tickets:#2403] amf: cores were generated for amfd and amfnd on different controllers.**

**Status:** unassigned
**Milestone:** 5.2.RC2
**Created:** Thu Mar 30, 2017 08:39 AM UTC by Praveen
**Last Updated:** Thu Mar 30, 2017 01:23 PM UTC
**Owner:** nobody
**Attachments:**

- [amfd_bt_threads.log](https://sourceforge.net/p/opensaf/tickets/2403/attachment/amfd_bt_threads.log) (3.0 kB; application/octet-stream)
- [amfd_bt_threads_full.log](https://sourceforge.net/p/opensaf/tickets/2403/attachment/amfd_bt_threads_full.log) (22.1 kB; application/octet-stream)
- [amfnd_bt_threads.log](https://sourceforge.net/p/opensaf/tickets/2403/attachment/amfnd_bt_threads.log) (2.9 kB; application/octet-stream)
- [amfnd_bt_threads_full.log](https://sourceforge.net/p/opensaf/tickets/2403/attachment/amfnd_bt_threads_full.log) (23.0 kB; application/octet-stream)

Observed AMFD and AMFND crashes when calling TRACE() API.

amfd:
\#0 0x7f44834db70d in write () from /lib64/libpthread.so.0
\#1 0x7f4483eb9af9 in output_ () from /usr/local/lib/libopensaf_core.so.0
\#2 0x7f4484dbc714 in Trace::trace(char const*, char const*, unsigned int, unsigned int, char const*, ...) () at ./src/base/logtrace.h:166
\#3 0x7f4484e7b3cb in avd_stop_tmr(cl_cb_tag*, avd_tmr_tag*) () at src/amf/amfd/timer.cc:113
\#4 0x7f4484e03556 in avd_tmr_snd_hb_evh(cl_cb_tag*, AVD_EVT*) () at src/amf/amfd/ndfsm.cc:1066
\#5 0x7f4484e005b4 in process_event(cl_cb_tag*, AVD_EVT*) () at src/amf/amfd/main.cc:792
\#6 0x7f4484db9a1e in main () at src/amf/amfd/main.cc:693

amfnd:
(gdb) bt
\#0 0x7fdd0f3f270d in write () from /lib64/libpthread.so.0
\#1 0x7fdd0fb57af9 in output_ () from /usr/local/lib/libopensaf_core.so.0
\#2 0x7fdd10844a64 in Trace::trace(char const*, char const*, unsigned int, unsigned int, char const*, ...) () at ./src/base/logtrace.h:166
\#3 0x7fdd1086e695 in avnd_main_process() () at src/amf/amfnd/main.cc:646
\#4 0x7fdd1084342f in main () at src/amf/amfnd/main.cc:207
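The timing behind these cores can be checked with a little arithmetic. The syslog quoted elsewhere in this thread shows the watchdog's last healthcheck at 17:12:27, a timeout at 17:13:27, and SupervisionTime = 60. The sketch below reproduces only that comparison; `supervision_expired` and `hms` are hypothetical helpers written for this example and say nothing about how osafamfwd or amfnd actually implement supervision.

```cpp
#include <chrono>

using Seconds = std::chrono::seconds;

// Converts a wall-clock time of day into seconds since midnight.
constexpr Seconds hms(int h, int m, int s) {
  return Seconds(h * 3600 + m * 60 + s);
}

// True when no progress has been observed for at least the supervision
// time, for example because the supervised process is blocked in write().
inline bool supervision_expired(Seconds last_progress, Seconds now,
                                Seconds supervision) {
  return now - last_progress >= supervision;
}
```

With the values from the log, `supervision_expired(hms(17, 12, 27), hms(17, 13, 27), Seconds(60))` holds, which is consistent with a process that stopped responding right after its last healthcheck and stayed blocked for the whole supervision period.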
[tickets] [opensaf:tickets] #2403 amf: amfd and amfnd crashes while calling TRACE() API.
AMFD and AMFND traces around this time. In these two cases the cores were generated by AMFND and the watchdog respectively. In both cases a timer was stopped and tracing was done around that time.

For AMFD crash:

AMFD traces:
Mar 28 17:12:43.075338 osafamfd [6614:6614:src/mbc/mbcsv_act.c:0412] << ncs_mbscv_rcv_decode
Mar 28 17:12:43.075345 osafamfd [6614:6614:src/mbc/mbcsv_util.c:0929] >> mbcsv_send_msg: event type: 12
Mar 28 17:12:43.075352 osafamfd [6614:6614:src/mbc/mbcsv_util.c:0954] TR NCS_MBCSV_MSG_SYNC_SEND_RSP event
Mar 28 17:12:43.075399 osafamfd [6614:6614:src/mbc/mbcsv_mds.c:0185] >> mbcsv_mds_send_msg: sending to vdest:1
Mar 28 17:12:43.075407 osafamfd [6614:6614:src/mbc/mbcsv_mds.c:0218] TR send type MDS_SENDTYPE_RRSP:
Mar 28 17:12:43.075576 osafamfd [6614:6614:src/mbc/mbcsv_mds.c:0244] << mbcsv_mds_send_msg: success
Mar 28 17:12:43.075599 osafamfd [6614:6614:src/mbc/mbcsv_util.c:0999] << mbcsv_send_msg
Mar 28 17:12:43.075606 osafamfd [6614:6614:src/mbc/mbcsv_act.c:0452] << ncs_mbcsv_rcv_async_update
Mar 28 17:12:43.075615 osafamfd [6614:6614:src/mbc/mbcsv_pr_evts.c:0222] << mbcsv_process_events
Mar 28 17:12:43.075625 osafamfd [6614:6614:src/mbc/mbcsv_pr_evts.c:0278] << mbcsv_hdl_dispatch_all
Mar 28 17:12:43.075633 osafamfd [6614:6614:src/mbc/mbcsv_api.c:0435] << mbcsv_process_dispatch_request: retval: 1
Mar 28 17:12:52.871798 osafamfd [6614:6616:src/mbc/mbcsv_tmr.c:0250] TR Timer expired.
my role:2, svc_id:10, pwe_hdl:65537, peer_anchor:565213973364764, tmr type:NCS_MBCSV_TMR_SEND_WARM_SYNC Mar 28 17:13:43.342772 osafamfd [6614:6616:src/amf/amfd/timer.cc:0154] >> avd_tmr_exp Mar 28 17:13:43.342824 osafamfd [6614:6616:src/amf/amfd/timer.cc:0175] << avd_tmr_exp Mar 28 17:13:43.348644 osafamfd [6614:6614:src/amf/amfd/main.cc:0774] >> process_event: evt->rcv_evt 14 Mar 28 17:13:43.348673 osafamfd [6614:6614:src/amf/amfd/ndfsm.cc:1058] >> avd_tmr_snd_hb_evh: seq_id=1212 Mar 28 17:13:43.349443 osafamfd [6614:6614:src/amf/amfd/timer.cc:0113] >> avd_stop_tmr: 0 messages: Mar 28 17:13:43 PM_SC-1 osafamfnd[6629]: ER AMF director heart beat timeout, generating core for amfd Mar 28 17:13:43 PM_SC-1 kernel: [17482.341638] ata1.00: device reported invalid CHS sector 0 Mar 28 17:13:43 PM_SC-1 osaffmd[6546]: NO AMFND down on: 2020f Mar 28 17:13:43 PM_SC-1 kernel: [17482.341647] ata1: EH complete Mar 28 17:13:44 PM_SC-1 osafamfnd[6629]: Rebooting OpenSAF NodeId = 131343 EE Name = , Reason: AMF director heart beat timeout, OwnNodeId = 131343, SupervisionTime = 60 Mar 28 17:13:44 PM_SC-1 osaffmd[6546]: NO FM down on: 2020f For amfnd: amfnd: Mar 28 17:12:27.530364 osafamfnd [6502:6502:src/amf/amfnd/cbq.cc:0242] >> avnd_evt_ava_resp_evh Mar 28 17:12:27.530373 osafamfnd [6502:6502:src/amf/amfnd/proxy.cc:0509] TR safComp=AMFWDOG,safSu=SC-2,safSg=NoRed,safApp=OpenSAF: Type=15 Mar 28 17:12:27.530382 osafamfnd [6502:6502:src/amf/amfnd/proxy.cc:0612] >> avnd_int_ext_comp_val: safComp=AMFWDOG,safSu=SC-2,safSg=NoRed,safApp=OpenSAF Mar 28 17:12:27.530390 osafamfnd [6502:6502:src/amf/amfnd/proxy.cc:] << avnd_int_ext_comp_val Mar 28 17:12:27.530403 osafamfnd [6502:6502:src/amf/amfnd/tmr.cc:0126] TR callback response timer stopped Mar 28 17:12:27.530412 osafamfnd [6502:6502:src/amf/amfnd/cbq.cc:0543] << avnd_evt_ava_resp_evh Mar 28 17:12:27.530419 osafamfnd [6502:6502:src/amf/amfnd/main.cc:0669] TR Evt Type:33 success Mar 28 17:12:27.530427 osafamfnd 
[6502:6502:src/amf/amfnd/main.cc:0674] << avnd_evt_process Mar 28 17:12:33.631398 osafamfnd [6502:6502:src/amf/amfnd/main.cc:0646] >> avnd_evt_process messages: Mar 28 17:13:27 PM_SC-2 osafamfwd[6518]: TIMEOUT receiving AMF health check request, generating core for amfnd Mar 28 17:13:35 PM_SC-2 kernel: [16964.152949] ata1.00: qc timeout (cmd 0xe7) Mar 28 17:13:35 PM_SC-2 kernel: [16964.152994] ata1.00: FLUSH failed Emask 0x4 Mar 28 17:13:35 PM_SC-2 kernel: [16964.153005] ata1: hard resetting link Mar 28 17:13:35 PM_SC-2 kernel: [16964.472367] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300) Mar 28 17:13:35 PM_SC-2 kernel: [16964.473051] ata1.00: configured for UDMA/133 Mar 28 17:13:35 PM_SC-2 kernel: [16964.473055] ata1.00: retrying FLUSH 0xe7 Emask 0x4 Mar 28 17:13:43 PM_SC-2 kernel: [16972.985932] ata1.00: device reported invalid CHS sector 0 Mar 28 17:13:44 PM_SC-2 osafamfwd[6518]: Last received healthcheck cnt=1208 at Tue Mar 28 17:12:27 2017 Mar 28 17:13:44 PM_SC-2 osafamfwd[6518]: Rebooting OpenSAF NodeId = 0 EE Name = No EE Mapped, Reason: AMF unexpectedly crashed, OwnNodeId = 131599, SupervisionTime = 60 Mar 28 17:13:44 PM_SC-2 osafclmd[6482]: AL AMF Node Director is down, terminate this process --- ** [tickets:#2403] amf: amfd and amfnd crashes while calling TRACE() API.** **Status:** unassigned **Milestone:** 5.2.RC2 **Created:** Thu Mar 30, 2017 08:39 AM UTC by Praveen **Last Updated:
[tickets] [opensaf:tickets] #2403 amf: amfd and amfnd crashes while calling TRACE() API.
---

** [tickets:#2403] amf: amfd and amfnd crashes while calling TRACE() API.**

**Status:** unassigned
**Milestone:** 5.2.RC2
**Created:** Thu Mar 30, 2017 08:39 AM UTC by Praveen
**Last Updated:** Thu Mar 30, 2017 08:39 AM UTC
**Owner:** nobody

Observed AMFD and AMFND crashes when calling TRACE() API.

amfd:
#0 0x7f44834db70d in write () from /lib64/libpthread.so.0
#1 0x7f4483eb9af9 in output_ () from /usr/local/lib/libopensaf_core.so.0
#2 0x7f4484dbc714 in Trace::trace(char const*, char const*, unsigned int, unsigned int, char const*, ...) () at ./src/base/logtrace.h:166
#3 0x7f4484e7b3cb in avd_stop_tmr(cl_cb_tag*, avd_tmr_tag*) () at src/amf/amfd/timer.cc:113
#4 0x7f4484e03556 in avd_tmr_snd_hb_evh(cl_cb_tag*, AVD_EVT*) () at src/amf/amfd/ndfsm.cc:1066
#5 0x7f4484e005b4 in process_event(cl_cb_tag*, AVD_EVT*) () at src/amf/amfd/main.cc:792
#6 0x7f4484db9a1e in main () at src/amf/amfd/main.cc:693

amfnd:
(gdb) bt
#0 0x7fdd0f3f270d in write () from /lib64/libpthread.so.0
#1 0x7fdd0fb57af9 in output_ () from /usr/local/lib/libopensaf_core.so.0
#2 0x7fdd10844a64 in Trace::trace(char const*, char const*, unsigned int, unsigned int, char const*, ...) () at ./src/base/logtrace.h:166
#3 0x7fdd1086e695 in avnd_main_process() () at src/amf/amfnd/main.cc:646
#4 0x7fdd1084342f in main () at src/amf/amfnd/main.cc:207

---

Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is subscribed to https://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change settings at https://sourceforge.net/p/opensaf/admin/tickets/options. Or, if this is a mailing list, you can unsubscribe from the mailing list.

--
Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Opensaf-tickets mailing list
Opensaf-tickets@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets
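The AMFD and AMFND traces quoted for #2403 go silent for tens of seconds before the heart beat and watchdog timeouts fire. A minimal sketch for locating such silences in a trace file follows. It assumes every trace line starts with a `Mon DD HH:MM:SS.micros` stamp as in the excerpts, and that month names parse under an English locale. The name `find_gaps`, the 10 second threshold and the `year` argument (the traces carry no year) are illustrative choices, not part of OpenSAF.

```python
import re
from datetime import datetime

# Leading timestamp of a trace line, e.g.
# "Mar 28 17:12:43.075633 osafamfd [6614:6614:src/mbc/mbcsv_api.c:0435] << ..."
STAMP = re.compile(r"^(\w{3} +\d{1,2} \d{2}:\d{2}:\d{2}\.\d{6}) ")


def find_gaps(lines, threshold_s=10.0, year=2017):
    """Return (previous_line, next_line, gap_seconds) for each pair of
    consecutive stamped lines that are more than threshold_s apart."""
    gaps = []
    prev_time = prev_line = None
    for line in lines:
        m = STAMP.match(line)
        if not m:
            continue  # skip lines without a trace timestamp
        t = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S.%f")
        if prev_time is not None:
            delta = (t - prev_time).total_seconds()
            if delta > threshold_s:
                gaps.append((prev_line, line, delta))
        prev_time, prev_line = t, line
    return gaps


# Three consecutive AMFD trace lines copied from the ticket comment.
SAMPLE = [
    "Mar 28 17:12:43.075633 osafamfd [6614:6614:src/mbc/mbcsv_api.c:0435] << mbcsv_process_dispatch_request: retval: 1",
    "Mar 28 17:12:52.871798 osafamfd [6614:6616:src/mbc/mbcsv_tmr.c:0250] TR Timer expired.",
    "Mar 28 17:13:43.342772 osafamfd [6614:6616:src/amf/amfd/timer.cc:0154] >> avd_tmr_exp",
]

if __name__ == "__main__":
    for before, after, seconds in find_gaps(SAMPLE):
        print(f"{seconds:.3f}s of silence before: {after}")
```

On the three sample lines this reports one gap, of roughly 50 seconds, between the warm sync timer expiry and `avd_tmr_exp`. The first step of about 9.8 seconds stays under the threshold.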
[tickets] [opensaf:tickets] #2100 Standby should not be rebooted, for SC absence configuration mismatch
- **Milestone**: 5.2.RC2 --> next

---

** [tickets:#2100] Standby should not be rebooted, for SC absence configuration mismatch**

**Status:** unassigned
**Milestone:** next
**Created:** Fri Oct 07, 2016 07:11 AM UTC by Srikanth R
**Last Updated:** Wed Mar 01, 2017 05:33 AM UTC
**Owner:** nobody

Changeset : 8190 5.1.GA

-> Initially brought up opensaf on SC-1 with the "SC ABSENCE" feature enabled in immd.conf.
-> On SC-2, the "SC ABSENCE" feature is not enabled in immd.conf. When opensafd is started on SC-2, the node is rebooted.

Oct 7 17:58:27 SLES-SLOT2 osafimmd[3615]: ER SC absence allowed in not the same as on active IMMD. Active: 900, Standby: 0. Exiting.
Oct 7 17:58:27 SLES-SLOT2 osafamfnd[3676]: NO 'safComp=IMMD,safSu=SC-2,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'nodeFailfast'
Oct 7 17:58:27 SLES-SLOT2 osafamfnd[3676]: ER safComp=IMMD,safSu=SC-2,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery is:nodeFailfast
Oct 7 17:58:27 SLES-SLOT2 osafamfnd[3676]: Rebooting OpenSAF NodeId = 131599 EE Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 131599, SupervisionTime = 60

Here the user had configured the two controllers inconsistently, because of which the standby rebooted. Opensafd is enabled in the runlevel as part of installation, so the standby will reboot continuously until opensafd is stopped on SC-1.

Suggested behavior : Opensafd should not start on the standby, instead of rebooting immediately. Also, cluster-level attributes like IMMSV_SC_ABSENCE_ALLOWED can be moved to imm.xml. Node-level attributes such as trace enabling can be retained in the configuration files.
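The suggested behavior for #2100 amounts to comparing the SC absence setting of the two controllers before the standby starts, and declining to start on a mismatch. A minimal sketch of that comparison follows. It assumes immd.conf sets the value with a shell-style `export IMMSV_SC_ABSENCE_ALLOWED=<seconds>` line and that an unset value counts as 0, as the "Active: 900, Standby: 0" log line suggests. The function names are hypothetical, and how the standby would obtain the active's configuration is deliberately left out.

```python
import re

# Shell-style assignment, optionally prefixed with "export".
# Commented-out lines ("#export ...") do not match.
_ASSIGN = re.compile(
    r"^\s*(?:export\s+)?IMMSV_SC_ABSENCE_ALLOWED\s*=\s*(\d+)\s*$"
)


def sc_absence_allowed(conf_text):
    """Return the configured SC absence value in seconds, 0 if unset."""
    value = 0
    for line in conf_text.splitlines():
        m = _ASSIGN.match(line)
        if m:
            value = int(m.group(1))  # the last assignment wins
    return value


def may_start_standby(active_conf, standby_conf):
    """Return (ok, message); ok is False when the two settings differ."""
    active = sc_absence_allowed(active_conf)
    standby = sc_absence_allowed(standby_conf)
    if active == standby:
        return True, "SC absence settings match"
    return False, (
        f"SC absence mismatch. Active: {active}, Standby: {standby}. "
        "Not starting."
    )


if __name__ == "__main__":
    ok, message = may_start_standby(
        "export IMMSV_SC_ABSENCE_ALLOWED=900\n",
        "#export IMMSV_SC_ABSENCE_ALLOWED=900\n",
    )
    print(ok, message)
```

A start script could use the boolean to skip starting opensafd instead of letting IMMD exit and trigger the node failfast shown in the log.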
[tickets] [opensaf:tickets] #2392 amf: PR doc updates for 5.2 release.
- **status**: accepted --> review

---

** [tickets:#2392] amf: PR doc updates for 5.2 release.**

**Status:** review
**Milestone:** 5.2.RC2
**Created:** Thu Mar 23, 2017 05:36 AM UTC by Praveen
**Last Updated:** Tue Mar 28, 2017 09:52 AM UTC
**Owner:** Praveen

Updates to be done for:
-Enhancements: #1190, #2259, #2144, #2252
-Defect: 2233
[tickets] [opensaf:tickets] #2392 amf: PR doc updates for 5.2 release.
AMF PR doc for review for #1190, #2259, #2144, #2065 and #2233.

Attachments:

- [OpenSAF_AMF_PR_5.2.odt](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/1d1e1df6/f63c/attachment/OpenSAF_AMF_PR_5.2.odt) (133.4 kB; application/vnd.oasis.opendocument.text)

---

** [tickets:#2392] amf: PR doc updates for 5.2 release.**

**Status:** accepted
**Milestone:** 5.2.RC2
**Created:** Thu Mar 23, 2017 05:36 AM UTC by Praveen
**Last Updated:** Thu Mar 23, 2017 05:36 AM UTC
**Owner:** Praveen

Updates to be done for:
-Enhancements: #1190, #2259, #2144, #2252
-Defect: 2233
[tickets] [opensaf:tickets] #2269 amf: saAmfSGNumPrefAssignedSUs is not honored in N-Way Active model.
- **status**: review --> assigned
- **Milestone**: 5.0.2 --> next

---

** [tickets:#2269] amf: saAmfSGNumPrefAssignedSUs is not honored in N-Way Active model.**

**Status:** assigned
**Milestone:** next
**Created:** Wed Jan 18, 2017 06:08 AM UTC by Praveen
**Last Updated:** Fri Mar 10, 2017 10:44 AM UTC
**Owner:** Praveen
**Attachments:**

- [AppConfig-nwayactive_3SUs_1SIs.xml](https://sourceforge.net/p/opensaf/tickets/2269/attachment/AppConfig-nwayactive_3SUs_1SIs.xml) (13.7 kB; text/xml)

AMF assigns more SUs than the configured value of saAmfSGNumPrefAssignedSUs in the N-Way Active model.
The issue can be reproduced by bringing up the attached configuration. In the application saAmfSGNumPrefAssignedSUs is set to 2:

immlist safSg=NWay_Active\,safApp=NWay_Active | grep -i prefass
saAmfSGNumPrefAssignedSUs SA_UINT32_T 2 (0x2)

But AMF is giving assignments to all three SUs:

safSISU=safSu=SU2\,safSg=NWay_Active\,safApp=NWay_Active,safSi=NWay_Active,safApp=NWay_Active
saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=SU1\,safSg=NWay_Active\,safApp=NWay_Active,safSi=NWay_Active,safApp=NWay_Active
saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=SU3\,safSg=NWay_Active\,safApp=NWay_Active,safSi=NWay_Active,safApp=NWay_Active
saAmfSISUHAState=ACTIVE(1)

Since this attribute is valid for the N-Way model also, the issue is applicable to the N-Way model as well.
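The check described in #2269 (count the SUs that hold assignments and compare against saAmfSGNumPrefAssignedSUs) can be scripted against a SISU listing like the one in the ticket. The sketch below is only a verification aid, not AMF code. It assumes the listing format shown in the ticket, with `\,` escaping the commas inside the embedded SU DN. The helper names are hypothetical.

```python
import re

# One SISU entry: escaped SU DN, then the SI DN, then the HA state.
SISU = re.compile(
    r"safSISU=((?:[^,\\\s]|\\.)+),(safSi=\S+)\s+saAmfSISUHAState=(\w+)\(\d+\)"
)


def assigned_sus(listing):
    """Map each SU DN to the list of (SI DN, HA state) pairs it holds."""
    result = {}
    for su, si, state in SISU.findall(listing):
        result.setdefault(su.replace("\\,", ","), []).append((si, state))
    return result


def exceeds_pref_assigned(listing, sg_dn, num_pref_assigned):
    """Return (exceeded, sorted SU DNs of sg_dn that hold assignments)."""
    sus = sorted(su for su in assigned_sus(listing) if su.endswith("," + sg_dn))
    return len(sus) > num_pref_assigned, sus


# Listing copied from the ticket; the limit 2 is the immlist value it quotes.
LISTING = r"""
safSISU=safSu=SU2\,safSg=NWay_Active\,safApp=NWay_Active,safSi=NWay_Active,safApp=NWay_Active
    saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=SU1\,safSg=NWay_Active\,safApp=NWay_Active,safSi=NWay_Active,safApp=NWay_Active
    saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=SU3\,safSg=NWay_Active\,safApp=NWay_Active,safSi=NWay_Active,safApp=NWay_Active
    saAmfSISUHAState=ACTIVE(1)
"""

if __name__ == "__main__":
    exceeded, sus = exceeds_pref_assigned(
        LISTING, "safSg=NWay_Active,safApp=NWay_Active", 2
    )
    print(exceeded, sus)
```

For the ticket's listing the check flags the SG, because three SUs hold assignments while the configured preference is 2.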
[tickets] [opensaf:tickets] #316 SI Assignments are not removed for a SU in Nway redundancy model
- **status**: review --> assigned
- **Milestone**: 5.0.2 --> next

---

** [tickets:#316] SI Assignments are not removed for a SU in Nway redundancy model**

**Status:** assigned
**Milestone:** next
**Created:** Fri May 24, 2013 08:39 AM UTC by Nagendra Kumar
**Last Updated:** Thu Jan 05, 2017 06:31 AM UTC
**Owner:** Praveen
**Attachments:**

- [logs.tar](https://sourceforge.net/p/opensaf/tickets/316/attachment/logs.tar) (2.5 MB; application/x-gzip-compressed)
- [osafamfd](https://sourceforge.net/p/opensaf/tickets/316/attachment/osafamfd) (228.2 kB; application/octet-stream)
- [osafamfnd](https://sourceforge.net/p/opensaf/tickets/316/attachment/osafamfnd) (122.8 kB; application/octet-stream)
- [pl_logs.tar](https://sourceforge.net/p/opensaf/tickets/316/attachment/pl_logs.tar) (1.3 MB; application/x-gzip-compressed)

Migrated from http://devel.opensaf.org/ticket/2987

changeset : 3855
Model : NWay
configuration : 1App,1SG,5SU with 3comps each, 5SIs with 3csi each. si-si deps configured as SI1<-SI2<-SI3<-SI4
SIrankedSus not configured.
Node mapping : SU1 on SC-1, SU2 on SC-2, SU3 on PL-3, SU4,SU5 on PL-4.

While running the campaign, smf performs lock,lock-in of the activation units i.e SUs. The SIs for SU3 are not removed though SU3 is in locked-state. Subsequent unlock-in,unlock of SU3 fails.

/var/log/messages of active ctrl- SC-1 shows
Feb 3 22:45:14 linux-xc76 osafamfd[20055]: WA SIs still assigned to this SU
Feb 3 22:45:16 linux-xc76 osafamfd[20055]: WA SIs still assigned to this SU
Feb 3 22:45:18 linux-xc76 osafamfd[20055]: WA SIs still assigned to this SU
Feb 3 22:45:20 linux-xc76 osafamfd[20055]: WA SIs still assigned to this SU
Feb 3 22:45:23 linux-xc76 osafamfd[20055]: WA SIs still assigned to this SU
Feb 3 22:45:23 linux-xc76 osafsmfd[20081]: ER Fail to invoke admin operation, too many SA_AIS_ERR_TRY_AGAIN, giving up. dn=[safSu=SU3,safSg=SGONE,safApp=NWAYAPP], opId=[3]
Feb 3 22:45:23 linux-xc76 osafsmfd[20081]: ER Failed to call admin operation 3 on safSu=SU3,safSg=SGONE,safApp=NWAYAPP
Feb 3 22:45:23 linux-xc76 osafsmfd[20081]: ER Failed to Terminate activation units in step=safSmfStep=0003
Feb 3 22:45:23 linux-xc76 osafsmfd[20081]: ER Step undoing failed
Feb 3 22:45:23 linux-xc76 osafsmfd[20081]: ER Step safSmfStep=0003 in procedure safSmfProc=amfClusterProc-1 failed, step result 5
Feb 3 22:45:23 linux-xc76 osafsmfd[20081]: NO CAMP: Procedure safSmfProc=amfClusterProc-1 returned FAILED

SU Assignments brief:
===
safSISU=safSu=SU1\,safSg=SGONE\,safApp=NWAYAPP,safSi=NWAYSI3,safApp=NWAYAPP
saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=SU1\,safSg=SGONE\,safApp=NWAYAPP,safSi=NWAYSI2,safApp=NWAYAPP
saAmfSISUHAState=STANDBY(2)
safSISU=safSu=SU3\,safSg=SGONE\,safApp=NWAYAPP,safSi=NWAYSI5,safApp=NWAYAPP
saAmfSISUHAState=QUIESCED(3)
safSISU=safSu=SU4\,safSg=SGONE\,safApp=NWAYAPP,safSi=NWAYSI5,safApp=NWAYAPP
saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=SU2\,safSg=SGONE\,safApp=NWAYAPP,safSi=NWAYSI1,safApp=NWAYAPP
saAmfSISUHAState=ACTIVE(1)

SU States:
==
safSu=SU3,safSg=SGONE,safApp=NWAYAPP
saAmfSUAdminState=LOCKED(2)
saAmfSUOperState=ENABLED(1)
saAmfSUPresenceState=INSTANTIATED(3)
saAmfSUReadinessState=OUT-OF-SERVICE(1)

changed 4 months ago by bertil

- owner changed from ingber to ravisekhar
- component changed from saf/smfsv to saf/avsv

I believe this is an AMF problem. SMF only uses the AMF admin ops (lock, unlock etc).
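The osafsmfd message "too many SA_AIS_ERR_TRY_AGAIN, giving up" in #316 describes a bounded retry of an admin operation. A minimal sketch of that pattern follows. It is not SMF's implementation: the function name is hypothetical, and the five tries and two second delay are chosen only to resemble the spacing of the warnings in the log. The numeric values of `SA_AIS_OK` and `SA_AIS_ERR_TRY_AGAIN` are those of the SA Forum AIS error enumeration.

```python
import time

SA_AIS_OK = 1
SA_AIS_ERR_TRY_AGAIN = 6


def invoke_with_retry(operation, max_tries=5, delay_s=2.0, sleep=time.sleep):
    """Call operation() until it returns something other than
    SA_AIS_ERR_TRY_AGAIN, or until max_tries calls have been made.

    Returns (last_return_code, number_of_calls). The sleep function is
    injectable so the loop can be exercised without real delays.
    """
    rc = SA_AIS_ERR_TRY_AGAIN
    calls = 0
    while calls < max_tries:
        rc = operation()
        calls += 1
        if rc != SA_AIS_ERR_TRY_AGAIN:
            break
        if calls < max_tries:
            sleep(delay_s)  # wait before the next attempt
    return rc, calls


if __name__ == "__main__":
    # An operation that never succeeds, like the lock of SU3 in the ticket.
    rc, calls = invoke_with_retry(lambda: SA_AIS_ERR_TRY_AGAIN, sleep=lambda s: None)
    if rc == SA_AIS_ERR_TRY_AGAIN:
        print(f"giving up after {calls} calls")
```

In the ticket the retries cannot succeed, because AMF keeps answering TRY_AGAIN while SIs remain assigned to the locked SU. The retry bound only turns the stuck state into a visible campaign failure.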
[tickets] [opensaf:tickets] #2372 amf: CLM lock of two more nodes returns REPAIR_PENDING for first node.
- **status**: review --> fixed
- **Comment**:

changeset: 8727:9a1452dcd190
branch: opensaf-5.0.x
parent: 8721:b2e2a9162664
user: Praveen Malviya <praveen.malv...@oracle.com>
date: Tue Mar 28 12:19:02 2017 +0530
summary: amf: fix track callback when multiple CLM nodes leaves membership[#2372].

changeset: 8728:bdd9cdb1ced9
branch: opensaf-5.1.x
parent: 8722:9c295151f262
user: Praveen Malviya <praveen.malv...@oracle.com>
date: Tue Mar 28 12:19:40 2017 +0530
summary: amf: fix track callback when multiple CLM nodes leaves membership[#2372].

changeset: 8729:a8fa805d5765
tag: tip
parent: 8726:cad103f14b48
user: Praveen Malviya <praveen.malv...@oracle.com>
date: Tue Mar 28 12:20:28 2017 +0530
summary: amf: fix track callback when multiple CLM nodes leaves membership[#2372].

[staging:9a1452]
[staging:bdd9cd]
[staging:a8fa80]

---

** [tickets:#2372] amf: CLM lock of two more nodes returns REPAIR_PENDING for first node.**

**Status:** fixed
**Milestone:** 5.0.2
**Created:** Tue Mar 14, 2017 09:29 AM UTC by Praveen
**Last Updated:** Mon Mar 27, 2017 09:59 AM UTC
**Owner:** Praveen
**Attachments:**

- [osafamfd](https://sourceforge.net/p/opensaf/tickets/2372/attachment/osafamfd) (3.4 MB; application/octet-stream)
- [osafclmd](https://sourceforge.net/p/opensaf/tickets/2372/attachment/osafclmd) (860.9 kB; application/octet-stream)

Steps to reproduce:
1) Bring a 4-node cluster up.
2) Deploy the AMF demo on PL-3 and PL-4.
3) LOCK amfd nodes PL-3 and PL-4.
4) Make arrangements so that termination of amf_demo on PL-3 takes more time compared to PL-4.
5) From one terminal issue CLM lock of PL-3 first and in no time issue CLM lock of PL-4.

CLM and AMF traces are attached.

Analysis:
When AMFD gets the CLM track callback for PL-3, it starts terminating amf_demo on PL-3. While termination of amf_demo is still going on, AMF gets another track callback with PL-4 as the root cause entity. However, this callback contains information of PL-3 also. AMFD starts terminating amf_demo on PL-4 but at the same time it responds for PL-3 with the invocation id of the PL-4 callback. CLM assumes that the PL-4 change_started step is completed and sends the completion callback for PL-4. In this callback, AMF clears the internal flags which monitor the graceful removal of nodes. Since AMF never responded to the PL-3 callback, the callback timer expires in CLMD and it sends the completed callback to AMF. AMF thinks this is a case of node failover and tries to fail over PL-3.

Note: In all these stages, CLM sends the track callback with information of all the nodes. The parameters AMF registers with are: SA_TRACK_CURRENT|SA_TRACK_CHANGES_ONLY|SA_TRACK_VALIDATE_STEP|SA_TRACK_START_STEP.
I am still evaluating whether the issue is in CLM or AMF. Since AMF registers for **|SA_TRACK_CHANGES_ONLY|**, should CLM give information of all the nodes in all subsequent callbacks? Also AMF should respond to the callback when it has completed termination of comps.
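The analysis in #2372 comes down to bookkeeping: each start-step callback carries its own invocation id, and the response for a node must use the id from that node's callback, sent only once that node's components have terminated. A minimal sketch of such bookkeeping follows, keyed on the root cause entity rather than on every node listed in the callback. The class and method names are hypothetical, and this is not the change made in the commits listed for the ticket.

```python
class PendingClmResponses:
    """Remember which invocation id belongs to which node's callback."""

    def __init__(self):
        self._pending = {}  # node name -> invocation id awaiting a response

    def on_start_step(self, invocation, root_cause_node):
        """Record a start-step callback. Only the root cause entity is
        keyed, even if the callback also lists other nodes."""
        self._pending[root_cause_node] = invocation

    def on_node_terminated(self, node):
        """Return the invocation id to respond with now that the node's
        components are terminated, or None if nothing is pending."""
        return self._pending.pop(node, None)

    def outstanding(self):
        """Return a copy of the node -> invocation map still awaiting a response."""
        return dict(self._pending)


if __name__ == "__main__":
    tracker = PendingClmResponses()
    tracker.on_start_step(101, "PL-3")  # lock of PL-3 arrives first
    tracker.on_start_step(102, "PL-4")  # lock of PL-4, callback also lists PL-3
    print(tracker.on_node_terminated("PL-4"))  # PL-4 finishes first
    print(tracker.outstanding())
    print(tracker.on_node_terminated("PL-3"))
```

With this bookkeeping the early completion of PL-4 consumes only PL-4's invocation id, so the PL-3 callback stays outstanding until amf_demo on PL-3 has actually terminated.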
[tickets] [opensaf:tickets] #2371 AMF: NPM app went into unstable state while expanding cluster
- **status**: assigned --> duplicate
- **Milestone**: 5.2.RC2 --> future

---

** [tickets:#2371] AMF: NPM app went into unstable state while expanding cluster**

**Status:** duplicate
**Milestone:** future
**Created:** Tue Mar 14, 2017 08:03 AM UTC by Chani Srivastava
**Last Updated:** Fri Mar 17, 2017 08:29 AM UTC
**Owner:** Praveen
**Attachments:**

- [messages](https://sourceforge.net/p/opensaf/tickets/2371/attachment/messages) (86.9 kB; application/octet-stream)
- [osafamfd](https://sourceforge.net/p/opensaf/tickets/2371/attachment/osafamfd) (13.0 MB; application/octet-stream)

Environment details
OS : Suse 64bit
Changeset : 8603( 5.2.MO-1)
4 node cluster without PBE

Summary - Application went into unstable state and campaign execution could not complete while expanding the cluster using campaign

Steps:
1. Brought up an NPM application with 5 SUs
2. Using campaign add a 3rd payload PL-5 to the cluster

App went into bad state
Mar 17 04:38:13 NewSC1 osafamfd[2562]: NO 'safSg=SGONE,safApp=NPMAPP' is in unstable/transition state
Mar 17 04:38:15 NewSC1 osafamfd[2562]: NO 'safSg=SGONE,safApp=NPMAPP' is in unstable/transition state
Mar 17 04:38:17 NewSC1 osafamfd[2562]: NO 'safSg=SGONE,safApp=NPMAPP' is in unstable/transition state
Mar 17 04:38:19 NewSC1 osafamfd[2562]: NO 'safSg=SGONE,safApp=NPMAPP' is in unstable/transition state
Mar 17 04:38:21 NewSC1 osafamfd[2562]: NO 'safSg=SGONE,safApp=NPMAPP' is in unstable/transition state
Mar 17 04:38:23 NewSC1 osafamfd[2562]: NO 'safSg=SGONE,safApp=NPMAPP' is in unstable/transition state
Mar 17 04:38:25 NewSC1 osafamfd[2562]: NO 'safSg=SGONE,safApp=NPMAPP' is in unstable/transition state
Mar 17 04:38:27 NewSC1 osafamfd[2562]: NO 'safSg=SGONE,safApp=NPMAPP' is in unstable/transition state
Mar 17 04:38:29 NewSC1 osafamfd[2562]: NO 'safSg=SGONE,safApp=NPMAPP' is in unstable/transition state
[tickets] [opensaf:tickets] #2320 clm: standby clmd crashes due to missing node information
- **status**: assigned --> duplicate
- **Part**: - --> d
- **Milestone**: 5.2.RC2 --> 5.0.2
- **Comment**:

Problem is fixed in #2325. Please raise a new ticket with traces and logs if still observed.

---

** [tickets:#2320] clm: standby clmd crashes due to missing node information**

**Status:** duplicate
**Milestone:** 5.0.2
**Created:** Wed Feb 22, 2017 12:55 PM UTC by Zoran Milinkovic
**Last Updated:** Mon Mar 27, 2017 10:59 AM UTC
**Owner:** Praveen

The standby CLMD service crashed due to missing PL-3 information.

syslog from SC-2:
~~~
Feb 13 00:43:31 SC-2-2 osafamfd[5082]: NO Cold sync complete!
Feb 13 00:43:31 SC-2-2 osafimmd[5009]: NO SBY: Ruling epoch noted as:5
Feb 13 00:43:31 SC-2-2 osafimmd[5009]: NO IMMND coord at 2010f
Feb 13 00:43:31 SC-2-2 osafimmd[5009]: NO SBY: SaImmRepositoryInitModeT changed and noted as 'SA_IMM_KEEP_REPOSITORY'
Feb 13 00:43:31 SC-2-2 osafimmnd[5024]: NO NODE STATE-> IMM_NODE_R_AVAILABLE
Feb 13 00:43:31 SC-2-2 osafimmnd[5024]: NO NODE STATE-> IMM_NODE_FULLY_AVAILABLE 19082
Feb 13 00:43:31 SC-2-2 osafimmnd[5024]: NO Epoch set to 5 in ImmModel
Feb 13 00:43:31 SC-2-2 osafimmd[5009]: NO SBY: New Epoch for IMMND process at node 2020f old epoch: 4 new epoch:5
Feb 13 00:43:31 SC-2-2 osafimmd[5009]: NO SBY: New Epoch for IMMND process at node 2010f old epoch: 4 new epoch:5
Feb 13 00:43:31 SC-2-2 osafimmd[5009]: NO IMMND coord at 2010f
Feb 13 00:43:31 SC-2-2 osafimmd[5009]: NO SBY: New Epoch for IMMND process at node 2030f old epoch: 0 new epoch:5
Feb 13 00:43:31 SC-2-2 osafclmd[5066]: ER Node is NULL,problem with the database.
Feb 13 00:43:31 SC-2-2 osafclmd[5066]: ../../opensaf/src/clm/clmd/clms_mbcsv.c:468: ckpt_proc_node_rec: Assertion '0' failed.
Feb 13 00:43:32 SC-2-2 osafamfnd[5096]: NO 'safComp=CLM,safSu=SC-2,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'nodeFailfast'
Feb 13 00:43:32 SC-2-2 osafamfnd[5096]: ER safComp=CLM,safSu=SC-2,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery is:nodeFailfast
Feb 13 00:43:32 SC-2-2 osafamfnd[5096]: Rebooting OpenSAF NodeId = 131599 EE Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 131599, SupervisionTime = 60
Feb 13 00:43:32 SC-2-2 opensaf_reboot: Rebooting local node; timeout=60
~~~

Coredump:
~~~
[New LWP 5066]
[New LWP 5069]
[New LWP 5068]
[New LWP 5070]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/lib64/opensaf/osafclmd'.
Program terminated with signal SIGABRT, Aborted.
#0 0x7fbc7be880c7 in raise () from /lib64/libc.so.6

### BT ###
#0 0x7fbc7be880c7 in raise () from /lib64/libc.so.6
#1 0x7fbc7be89478 in abort () from /lib64/libc.so.6
#2 0x7fbc7c85202e in __osafassert_fail (__file=__file@entry=0x7fbc7e1b7d50 "../../opensaf/src/clm/clmd/clms_mbcsv.c", __line=__line@entry=468, __func=__func@entry=0x7fbc7e1b8820 <__FUNCTION__.12739> "ckpt_proc_node_rec", __assertion=__assertion@entry=0x7fbc7e1b78ea "0") at ../../opensaf/src/base/sysf_def.c:281
#3 0x7fbc7e1aa016 in ckpt_proc_node_rec (cb=, data=0x7fbc7f218a50) at ../../opensaf/src/clm/clmd/clms_mbcsv.c:468
#4 0x7fbc7e1ae044 in ckpt_decode_async_update (cbk_arg=, cb=0x7fbc7e3be100 <_clms_cb>) at ../../opensaf/src/clm/clmd/clms_mbcsv.c:2310
#5 ckpt_decode_cbk_handler (cbk_arg=0x7fff27b6b1a0) at ../../opensaf/src/clm/clmd/clms_mbcsv.c:1997
#6 mbcsv_callback (arg=0x7fff27b6b1a0) at ../../opensaf/src/clm/clmd/clms_mbcsv.c:719
#7 0x7fbc7c856f76 in ncs_mbscv_rcv_decode (peer=peer@entry=0x7fbc7f217a60, evt=evt@entry=0x7fbc740036e0) at ../../opensaf/src/mbc/mbcsv_act.c:393
#8 0x7fbc7c857146 in ncs_mbcsv_rcv_async_update (peer=0x7fbc7f217a60, evt=0x7fbc740036e0) at ../../opensaf/src/mbc/mbcsv_act.c:440
#9 0x7fbc7c85dd30 in mbcsv_process_events (rcvd_evt=0x7fbc740036e0, mbcsv_hdl=mbcsv_hdl@entry=4293918753) at ../../opensaf/src/mbc/mbcsv_pr_evts.c:168
#10 0x7fbc7c85de9b in mbcsv_hdl_dispatch_all (mbcsv_hdl=4293918753, mbx=mbx@entry=4283432961) at ../../opensaf/src/mbc/mbcsv_pr_evts.c:272
#11 0x7fbc7c8586c2 in mbcsv_process_dispatch_request (arg=0x7fff27b6b310) at ../../opensaf/src/mbc/mbcsv_api.c:423
#12 0x7fbc7e1aa7be in clms_mbcsv_dispatch (mbcsv_hdl=) at ../../opensaf/src/clm/clmd/clms_mbcsv.c:687
#13 0x7fbc7e19e4e4 in main (argc=, argv=) at ../../opensaf/src/clm/clmd/clms_main.c:535

### BT FULL ###
#0 0x7fbc7be880c7 in raise () from /lib64/libc.so.6
No symbol table info available.
#1 0x7fbc7be89478 in abort () from /lib64/libc.so.6
No symbol table info available.
#2 0x7fbc7c85202e in __osafassert_fail (__file=__file@entry=0x7fbc7e1b7d50 "../../opensaf/src/clm/clmd/clms_mbcsv.c", __line=__line@entry=468, __func=__func@entry=0x7fbc7e1b8820 <__FUNCTION__.12739> "ckpt_proc_node_rec", __assertion=__assertion@entry=0x7fbc7e1b78ea "0"
~~~
[tickets] [opensaf:tickets] #2320 clm: standby clmd crashes due to missing node information
- **status**: unassigned --> assigned
- **assigned_to**: Praveen

---

** [tickets:#2320] clm: standby clmd crashes due to missing node information**

**Status:** assigned
**Milestone:** 5.2.RC2
**Created:** Wed Feb 22, 2017 12:55 PM UTC by Zoran Milinkovic
**Last Updated:** Fri Feb 24, 2017 10:06 AM UTC
**Owner:** Praveen
[tickets] [opensaf:tickets] #2372 amf: CLM lock of two more nodes returns REPAIR_PENDING for first node.
- **summary**: amf/clm: CLM lock of two more nodes returns REPAIR_PENDING for first node. --> amf: CLM lock of two more nodes returns REPAIR_PENDING for first node.

---

** [tickets:#2372] amf: CLM lock of two more nodes returns REPAIR_PENDING for first node.**

**Status:** review
**Milestone:** 5.0.2
**Created:** Tue Mar 14, 2017 09:29 AM UTC by Praveen
**Last Updated:** Mon Mar 27, 2017 09:59 AM UTC
**Owner:** Praveen
**Attachments:**

- [osafamfd](https://sourceforge.net/p/opensaf/tickets/2372/attachment/osafamfd) (3.4 MB; application/octet-stream)
- [osafclmd](https://sourceforge.net/p/opensaf/tickets/2372/attachment/osafclmd) (860.9 kB; application/octet-stream)

Steps to reproduce:
1) Bring a 4-node cluster up.
2) Deploy the AMF demo on PL-3 and PL-4.
3) LOCK amfd nodes PL-3 and PL-4.
4) Arrange for termination of amf_demo on PL-3 to take more time than on PL-4.
5) From one terminal, issue a CLM lock of PL-3 first and, immediately afterwards, a CLM lock of PL-4.

CLM and AMF traces are attached.

Analysis:
When AMFD gets the CLM track callback for PL-3, it starts terminating amf_demo on PL-3. While termination of amf_demo is still going on, AMF gets another track callback with rootCauseEntity as PL-4. However, this callback also contains information about PL-3. AMFD starts terminating amf_demo on PL-4, but at the same time it responds for PL-3 with the invocation id of the PL-4 callback. CLM assumes that the PL-4 change_started step has completed and sends the completion callback for PL-4. In this callback, AMF clears the internal flags which monitor the graceful removal of nodes. Since AMF never responded to the PL-3 callback, the callback timer expires in CLMD and it sends the completed callback to AMF. AMF treats this as a node failover and tries to fail over PL-3.

Note: In all these stages, CLM sends the track callback with information about all the nodes. The flags AMF registers with are: SA_TRACK_CURRENT|SA_TRACK_CHANGES_ONLY|SA_TRACK_VALIDATE_STEP|SA_TRACK_START_STEP.
I am still evaluating whether the issue is in CLM or AMF. Since AMF registers for **|SA_TRACK_CHANGES_ONLY|**, should CLM give information about all the nodes in all subsequent callbacks? Also, AMF should respond to a callback only when it has completed termination of comps.

---

Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is subscribed to https://sourceforge.net/p/opensaf/tickets/ To unsubscribe from further messages, a project admin can change settings at https://sourceforge.net/p/opensaf/admin/tickets/options. Or, if this is a mailing list, you can unsubscribe from the mailing list.-- Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot___ Opensaf-tickets mailing list Opensaf-tickets@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/opensaf-tickets
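The mismatch described in the analysis of #2372 (responding for PL-3 with the invocation id of the PL-4 callback) can be illustrated with a small bookkeeping sketch. This is not the AMFD code or the eventual patch: `PendingTrackStart`, `on_start_callback` and `on_termination_done` are hypothetical names, the invocation ids are made up, and Python is used only for brevity. The idea is to remember, per root-cause node, the invocation of the callback that started its removal, and to hand that invocation back only when termination on that node has finished.

```python
from dataclasses import dataclass, field


@dataclass
class PendingTrackStart:
    """Hypothetical bookkeeping for START-step callbacks awaiting a response."""

    # node name -> invocation id of the callback whose root cause was that node
    by_node: dict = field(default_factory=dict)

    def on_start_callback(self, invocation, root_cause_node):
        # Record only the root-cause node. Other nodes listed in the same
        # notification keep the invocation they were first recorded with.
        self.by_node.setdefault(root_cause_node, invocation)

    def on_termination_done(self, node):
        # Return the invocation to respond with, or None if nothing is pending.
        return self.by_node.pop(node, None)


tracker = PendingTrackStart()
tracker.on_start_callback(101, "PL-3")  # CLM lock of PL-3
tracker.on_start_callback(102, "PL-4")  # CLM lock of PL-4, PL-3 still terminating
first = tracker.on_termination_done("PL-4")   # PL-4 finishes first
second = tracker.on_termination_done("PL-3")  # PL-3 finishes later
```

With this bookkeeping each node is answered with its own invocation, regardless of the order in which the terminations complete.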
[tickets] [opensaf:tickets] #2372 amf/clm: CLM lock of two more nodes returns REPAIR_PENDING for first node.
- **status**: accepted --> review

---

** [tickets:#2372] amf/clm: CLM lock of two more nodes returns REPAIR_PENDING for first node.**

**Status:** review
**Milestone:** 5.0.2
**Created:** Tue Mar 14, 2017 09:29 AM UTC by Praveen
**Last Updated:** Thu Mar 23, 2017 09:16 AM UTC
**Owner:** Praveen
[tickets] [opensaf:tickets] #2372 amf/clm: CLM lock of two more nodes returns REPAIR_PENDING for first node.
Hi,

I think there is no problem from the CLM perspective. I have checked that in both of the cases above, initialViewNumber is passed correctly at all stages, and an application always distinguishes based on the passed initialViewNumber. So the fix is needed in AMF. I will send out a patch.

Thanks,
Praveen

---

** [tickets:#2372] amf/clm: CLM lock of two more nodes returns REPAIR_PENDING for first node.**

**Status:** accepted
**Milestone:** 5.0.2
**Created:** Tue Mar 14, 2017 09:29 AM UTC by Praveen
**Last Updated:** Thu Mar 16, 2017 07:08 AM UTC
**Owner:** Praveen
[tickets] [opensaf:tickets] #2331 CLM: : saClmNodeCurrAddress and saClmNodeCurrAddressFamily are not exposed to IMM even that TPC mode is using
- **assigned_to**: Praveen --> nobody

---

** [tickets:#2331] CLM: : saClmNodeCurrAddress and saClmNodeCurrAddressFamily are not exposed to IMM even that TPC mode is using**

**Status:** unassigned
**Milestone:** next
**Created:** Thu Mar 02, 2017 10:10 AM UTC by Tai Dinh
**Last Updated:** Thu Mar 23, 2017 05:19 AM UTC
**Owner:** nobody

saClmNodeCurrAddress and saClmNodeCurrAddressFamily of a cluster node are not exposed to IMM even when TCP mode is configured. This kind of information is sometimes needed by applications.
[tickets] [opensaf:tickets] #2268 amf: assignment from higher ranked SU is removed in N-Way Active model.
- **status**: review --> fixed
- **Milestone**: 5.2.RC2 --> 5.0.2
- **Comment**:

changeset: 8718:8d305dff2257
branch: opensaf-5.0.x
parent: 8715:dae6b6197639
user: Praveen Malviya <praveen.malv...@oracle.com>
date: Thu Mar 23 12:34:04 2017 +0530
summary: amfd: remove assignments from lower ranked SU while adjusting SI assignments [#2268]

changeset: 8719:263af6bf5c65
branch: opensaf-5.1.x
parent: 8716:8d149783d95a
user: Praveen Malviya <praveen.malv...@oracle.com>
date: Thu Mar 23 12:35:03 2017 +0530
summary: amfd: remove assignments from lower ranked SU while adjusting SI assignments [#2268]

changeset: 8720:057a8a4b1a99
tag: tip
parent: 8717:6cffd8965ae4
user: Praveen Malviya <praveen.malv...@oracle.com>
date: Thu Mar 23 12:36:49 2017 +0530
summary: amfd: remove assignments from lower ranked SU while adjusting SI assignments [#2268]

[staging:8d305d] [staging:263af6] [staging:057a8a]

---

** [tickets:#2268] amf: assignment from higher ranked SU is removed in N-Way Active model.**

**Status:** fixed
**Milestone:** 5.0.2
**Created:** Wed Jan 18, 2017 05:41 AM UTC by Praveen
**Last Updated:** Fri Mar 17, 2017 09:24 AM UTC
**Owner:** Praveen
**Attachments:**

- [AppConfig-nwayactive_3SUs_1SIs.xml](https://sourceforge.net/p/opensaf/tickets/2268/attachment/AppConfig-nwayactive_3SUs_1SIs.xml) (13.7 kB; text/xml)

When saAmfSIPrefActiveAssignments is reduced, AMFD removes the assignment from a higher ranked SU when no SI ranked SU is configured.

Steps to reproduce:
1) Bring the attached application up on one controller.
2) The only SI is assigned to three SUs. The three SUs have different SU ranks. Pref active assignments for the SI is 3.
3) Reduce pref active assignments for the SI by running the following command:
immcfg -a saAmfSIPrefActiveAssignments=2 safSi=NWay_Active,safApp=NWay_Active
4) Since pref active assignments is reduced by 1, AMFD sends quiesced and removal of assignment to SU2.
5) SU2 has rank 2. The assignment should instead be removed from SU3, which has rank 3.

Assignments before reducing pref active assignments:

safSISU=safSu=SU1\,safSg=NWay_Active\,safApp=NWay_Active,safSi=NWay_Active,safApp=NWay_Active
saAmfSISUHAState=ACTIVE(1)
saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)

safSISU=safSu=SU2\,safSg=NWay_Active\,safApp=NWay_Active,safSi=NWay_Active,safApp=NWay_Active
saAmfSISUHAState=ACTIVE(1)
saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)

safSISU=safSu=SU3\,safSg=NWay_Active\,safApp=NWay_Active,safSi=NWay_Active,safApp=NWay_Active
saAmfSISUHAState=ACTIVE(1)
saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)

Assignments after reducing pref active assignments:

safSISU=safSu=SU1\,safSg=NWay_Active\,safApp=NWay_Active,safSi=NWay_Active,safApp=NWay_Active
saAmfSISUHAState=ACTIVE(1)
saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)

safSISU=safSu=SU3\,safSg=NWay_Active\,safApp=NWay_Active,safSi=NWay_Active,safApp=NWay_Active
saAmfSISUHAState=ACTIVE(1)
saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
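The expected behaviour in #2268 (drop the assignment from the lowest ranked SU first) can be sketched as follows. This is an illustration only, not the AMFD fix: `sus_to_unassign` is a hypothetical helper, and it assumes, as in the ticket, that a smaller rank number means a higher rank and that no SI ranked SU is configured.

```python
def sus_to_unassign(assigned, pref_active):
    """Pick the SUs that should lose their active assignment.

    assigned:    list of (su_name, su_rank) currently holding the SI;
                 a smaller su_rank means a higher ranked SU.
    pref_active: new value of saAmfSIPrefActiveAssignments.
    """
    excess = len(assigned) - pref_active
    if excess <= 0:
        return []
    # Lowest ranked SUs (largest rank number) are removed first.
    lowest_first = sorted(assigned, key=lambda su: su[1], reverse=True)
    return [name for name, _rank in lowest_first[:excess]]


current = [("SU1", 1), ("SU2", 2), ("SU3", 3)]
removed = sus_to_unassign(current, pref_active=2)
```

For the configuration in the ticket this selects SU3, whereas the reported behaviour removed the assignment from SU2.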
[tickets] [opensaf:tickets] #2394 clm: add clm tool commands for admin op and state check.
---

** [tickets:#2394] clm: add clm tool commands for admin op and state check.**

**Status:** accepted
**Milestone:** next
**Created:** Thu Mar 23, 2017 06:17 AM UTC by Praveen
**Last Updated:** Thu Mar 23, 2017 06:17 AM UTC
**Owner:** Praveen

The intention is to add CLM tool commands:

- to perform an admin operation on a node or on the cluster, something like clm-adm <lock|shutdown|unlock|reset>
- to check the admin state and membership status of CLM nodes, like clm-state <membership|adminstate>
- to find the CLM cluster and nodes, like clm-find <members|non-member>
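To make the proposed `clm-adm` syntax concrete, here is a minimal command-line parsing sketch. It is purely hypothetical: the ticket only proposes the commands, so the positional `target` argument, the function names and the use of Python are assumptions, and no admin operation is actually invoked.

```python
import argparse

# Operations listed in the ticket proposal for clm-adm.
OPERATIONS = ["lock", "shutdown", "unlock", "reset"]


def build_parser():
    parser = argparse.ArgumentParser(prog="clm-adm")
    parser.add_argument("operation", choices=OPERATIONS)
    # Assumed: the target is given as the DN of a CLM node or of the cluster.
    parser.add_argument("target", help="DN of the CLM node or cluster")
    return parser


def parse(argv):
    """Return (operation, target) for a clm-adm style argument list."""
    args = build_parser().parse_args(argv)
    return args.operation, args.target


op, dn = parse(["lock", "safNode=PL-3,safCluster=myClmCluster"])
```

An unknown operation makes `argparse` exit with a usage message, which matches the fixed set of operations in the proposal.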
[tickets] [opensaf:tickets] #2387 amf: choose CLM unlocked spare controller for standby role in failover situation
- **status**: review --> fixed
- **Comment**:

https://sourceforge.net/p/opensaf/mailman/message/35738800/

changeset: 8712:a3ba6212ecf6
branch: opensaf-5.1.x
parent: 8707:4e47c66382f3
user: Praveen Malviya <praveen.malv...@oracle.com>
date: Thu Mar 23 11:34:31 2017 +0530
summary: amfd: choose CLM unlocked spare controller for standby role in failover situation[#2387]

changeset: 8713:3a718e40acec
branch: opensaf-5.0.x
parent: 8708:9073359c83b4
user: Praveen Malviya <praveen.malv...@oracle.com>
date: Thu Mar 23 11:35:07 2017 +0530
summary: amfd: choose CLM unlocked spare controller for standby role in failover situation[#2387]

changeset: 8714:ffb6233abe8b
tag: tip
parent: 8711:262d1f2132ca
user: Praveen Malviya <praveen.malv...@oracle.com>
date: Thu Mar 23 11:36:00 2017 +0530
summary: amfd: choose CLM unlocked spare controller for standby role in failover situation[#2387]

---

** [tickets:#2387] amf: choose CLM unlocked spare controller for standby role in failover situation**

**Status:** fixed
**Milestone:** 5.0.2
**Created:** Fri Mar 17, 2017 12:13 PM UTC by Ritu Raj
**Last Updated:** Tue Mar 21, 2017 09:52 AM UTC
**Owner:** Praveen
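The rule named in the #2387 commit summary, choosing a CLM unlocked spare controller for the standby role, can be sketched as a simple filter. This is not the AMFD change itself: `pick_standby` and the dictionary keys `name` and `clm_locked` are hypothetical, chosen only to show the selection logic.

```python
def pick_standby(spare_controllers):
    """Return the name of the first spare controller that may become standby.

    spare_controllers: list of dicts with the assumed keys
      'name'       node name, for example 'SC-2'
      'clm_locked' True when the node is in CLM locked state
    """
    for node in spare_controllers:
        if not node["clm_locked"]:
            return node["name"]
    # No eligible spare: leave the standby role unassigned for now.
    return None


# Right after SC-1 fails: the only spare, SC-2, is CLM locked.
during_failover = pick_standby([{"name": "SC-2", "clm_locked": True}])

# After SC-1 recovers and rejoins as a spare controller.
after_recovery = pick_standby([
    {"name": "SC-2", "clm_locked": True},
    {"name": "SC-1", "clm_locked": False},
])
```

In the scenario from the ticket this leaves the standby role open while SC-2 is the only candidate and gives it to SC-1 once that node has recovered, which is the expected outcome stated by the reporter.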
[tickets] [opensaf:tickets] #2331 CLM: : saClmNodeCurrAddress and saClmNodeCurrAddressFamily are not exposed to IMM even that TPC mode is using
- **status**: assigned --> unassigned
- **Type**: defect --> enhancement
- **Milestone**: 5.0.2 --> next
- **Comment**:

As per CLM PR doc, section "3.2.2 Compliance Report", saClmNodeCurrAddressFamily and saClmNodeCurrAddress are not supported. So converting this ticket into enhancement. I may plan it for next release.

---

** [tickets:#2331] CLM: : saClmNodeCurrAddress and saClmNodeCurrAddressFamily are not exposed to IMM even that TPC mode is using**

**Status:** unassigned
**Milestone:** next
**Created:** Thu Mar 02, 2017 10:10 AM UTC by Tai Dinh
**Last Updated:** Tue Mar 07, 2017 11:54 AM UTC
**Owner:** Praveen
[tickets] [opensaf:tickets] #2387 clm_locked spare controller got standby role after failover
- **status**: unassigned --> assigned
- **assigned_to**: Praveen
- **Part**: - --> d
- **Milestone**: 5.2.RC2 --> 5.0.2

---

** [tickets:#2387] clm_locked spare controller got standby role after failover**

**Status:** assigned
**Milestone:** 5.0.2
**Created:** Fri Mar 17, 2017 12:13 PM UTC by Ritu Raj
**Last Updated:** Fri Mar 17, 2017 12:13 PM UTC
**Owner:** Praveen
**Attachments:**

- [SC-1.tar.bz2](https://sourceforge.net/p/opensaf/tickets/2387/attachment/SC-1.tar.bz2) (873.4 kB; application/x-bzip)
- [SC-2.tar.bz2](https://sourceforge.net/p/opensaf/tickets/2387/attachment/SC-2.tar.bz2) (762.0 kB; application/x-bzip)
- [SC-3.tar.bz2](https://sourceforge.net/p/opensaf/tickets/2387/attachment/SC-3.tar.bz2) (724.5 kB; application/x-bzip)

###Environment details
OS : Suse 64bit
Changeset : 8701 ( 5.2.RC1)
6 nodes setup(3 controller and 3 payload, with SC_ABSENCE enabled)

###Summary
clm_locked spare controller got standby role after failover

###Steps followed & Observed behaviour
1. Initially SC-1 had the ACTIVE role, SC-2 the QUIESCED role and SC-3 the STANDBY role.
2. Performed a clm_lock operation on the SC-2 (QUIESCED) controller.
3. After that, performed a failover on the active controller (SC-1) by killing one director.
4. Observed that SC-3 got the Active role while SC-2 got the Standby role, which is not expected as node SC-2 is in clm_locked state.
5. Later, SC-1 joined as the QUIESCED controller (after recovery from failover).

**Expected**: The clm_locked node should not get the standby role as it is in locked state, and SC-1 should join as Standby after recovery from failover.
Syslog: Mar 17 17:56:59 suseR2-S2 osafimmnd[21809]: NO Implementer (applier) connected: 28 (@safSmf_applier1) <0, 2030f> Mar 17 17:56:59 suseR2-S2 osafamfnd[21859]: NO Assigning 'safSi=SC-2N,safApp=OpenSAF' STANDBY to 'safSu=SC-2,safSg=2N,safApp=OpenSAF' Mar 17 17:56:59 suseR2-S2 osafrded[21779]: NO RDE role set to STANDBY Mar 17 17:56:59 suseR2-S2 osafrded[21779]: NO Peer up on node 0x2030f Mar 17 17:56:59 suseR2-S2 osafrded[21779]: NO Got peer info request from node 0x2030f with role ACTIVE Mar 17 17:56:59 suseR2-S2 osafrded[21779]: NO Got peer info response from node 0x2030f with role ACTIVE Mar 17 17:56:59 suseR2-S2 osafimmd[21798]: NO MDS event from svc_id 24 (change:3, dest:13) Mar 17 17:56:59 suseR2-S2 osafimmd[21798]: NO MDS event from svc_id 24 (change:5, dest:13) Mar 17 17:56:59 suseR2-S2 osafimmd[21798]: NO MDS event from svc_id 24 (change:5, dest:13) Mar 17 17:56:59 suseR2-S2 osafimmd[21798]: NO MDS event from svc_id 25 (change:3, dest:566317113647120) Mar 17 17:56:59 suseR2-S2 osafimmd[21798]: NO MDS event from svc_id 25 (change:3, dest:565213543063568) Mar 17 17:56:59 suseR2-S2 osafimmd[21798]: IN AMF HA STANDBY request Mar 17 17:56:59 suseR2-S2 osafimmd[21798]: IN Added IMMND node with dest 566317113647120 Mar 17 17:56:59 suseR2-S2 osafimmd[21798]: IN Added IMMND node with dest 565213543063568 Mar 17 17:56:59 suseR2-S2 osafsmfd[21878]: WA saClmClusterNodeGet failed, rc=SA_AIS_ERR_UNAVAILABLE (31) Mar 17 17:56:59 suseR2-S2 osafsmfd[21878]: WA proc_mds_info: SMFND UP failed Mar 17 17:56:59 suseR2-S2 osafsmfd[21878]: WA saClmClusterNodeGet failed, rc=SA_AIS_ERR_UNAVAILABLE (31) Mar 17 17:56:59 suseR2-S2 osafsmfd[21878]: WA proc_mds_info: SMFND UP failed From Traces: SC-2 left the cluster as clm lock operation performed and later SC-1 left the cluster as one failover performed: ~~~ SC-2::: Mar 17 17:54:24.123134 osafamfnd [6773:src/amf/amfnd/clm.cc:0196] >> clm_track_cb: '0' '4' '1' Mar 17 17:54:24.123142 osafamfnd [6773:src/amf/amfnd/clm.cc:0217] TR 
Node has left the cluster 'safNode=SC-2,safCluster=myClmCluster', avnd_cb->first_time_up 0,notifItem->clusterNode.nodeId 131599, avnd_cb->node_info.nodeId 131343 - - SC-1::: Mar 17 17:57:03.514477 osafamfnd [9266:src/amf/amfnd/clm.cc:0196] >> clm_track_cb: '0' '4' '1' Mar 17 17:57:03.514484 osafamfnd [9266:src/amf/amfnd/clm.cc:0217] TR Node has left the cluster 'safNode=SC-1,safCluster=myClmCluster', avnd_cb->first_time_up 0,notifItem->clusterNode.nodeId 131343, avnd_cb->node_info.nodeId 131855 ~~~ after failover SC-2 got standby role and SC-3 Active : ~~~ SC::2 Mar 17 17:56:59.941081 osafamfnd [21859:src/amf/amfnd/susm.cc:1043] NO Assigned 'safSi=SC-2N,safApp=OpenSAF' STANDBY to 'safSu=SC-2,safSg=2N,safApp=OpenSAF' Mar 17 17:56:59.941089 osafamfnd [21859:src/amf/amfnd/err.cc:1639] >> is_no_assignment_due_to_escalations Mar 17 17:56:59.941097 osafamfnd [21859:src/amf/amfnd/err.cc:1651] << is_no_assignment_due_to_escalations: false Mar 17 17:56:59.941104 osafamfnd [21859:src/amf/amfnd/di.cc:0829] >> avnd_di_susi_resp_send: Sending Resp su=safSu=SC-2,safSg=2N,safApp=OpenSAF, si=safSi=SC-2N,safApp=OpenSAF, curr_state=2, prv_state=0 Mar 17 17:56:59.941112 osafamfnd [21859:src/amf/amfnd/di.cc:0839] TR curr_assign_state '3 SC:::3 Mar 17 17:57:03.656105 osafamfnd [9266:src/amf/a
[tickets] [opensaf:tickets] #2268 amf: assignment from higher ranked SU is removed in N-Way Active model.
- **status**: assigned --> review
- **Milestone**: 5.0.2 --> 5.2.RC2

---

** [tickets:#2268] amf: assignment from higher ranked SU is removed in N-Way Active model.**

**Status:** review
**Milestone:** 5.2.RC2
**Created:** Wed Jan 18, 2017 05:41 AM UTC by Praveen
**Last Updated:** Wed Mar 08, 2017 06:24 AM UTC
**Owner:** Praveen
[tickets] [opensaf:tickets] #2371 AMF: NPM app went into unstable state while expanding cluster
SI dep is configured among the SIs assigned in same SU. This ticket must be duplicate of \#92 AVSv: In NPM per SI level role failover needs to be implemented when SI-SI dependency within SU is configured. Analysis: 1) NG was locked and AMF sent quiesced assignment to the SU : Mar 17 4:37:51.067937 osafamfd [2562:src/amf/amfd/nodegroup.cc:1072] >> ng_admin_op_cb: 'safAmfNodeGroup=smfLockAdmNg13,safAmfCluster=myAmfCluster', inv:'936302870542', op:'2' Mar 17 4:37:51.070545 osafamfd [2562:src/amf/amfd/sg_npm_fsm.cc:4454] >> ng_admin: 'safSu=SU1,safSg=SGONE,safApp=NPMAPP', sg_fsm_state:0 Mar 17 4:37:51.070553 osafamfd [2562:src/amf/amfd/sgproc.cc:2319] >> avd_sg_su_si_mod_snd: 'safSu=SU1,safSg=SGONE,safApp=NPMAPP', state 3 Mar 17 4:37:51.070560 osafamfd 2)When response for quiesced state comes, AMFD tries to failover the SU and could not failover it as both sponsor and dependent are in same SU , so it sends deletion of assignment to the SU : Mar 17 4:37:51.229455 osafamfd [2562:src/amf/amfd/sgproc.cc:1104] >> avd_su_si_assign_evh: id:101, node:2010f, act:5, 'safSu=SU1,safSg=SGONE,safApp=NPMAPP', '', ha:3, err:1, single:0 Mar 17 4:37:51.230494 osafamfd [2562:src/amf/amfd/sg_npm_fsm.cc:0162] >> avd_sg_npm_su_chk_snd Mar 17 4:37:51.230507 osafamfd [2562:src/amf/amfd/si_dep.cc:1730] >> avd_sidep_is_su_failover_possible: SU:'safSu=SU1,safSg=SGONE,safApp=NPMAPP' node_state:2 Mar 17 4:37:51.230515 osafamfd [2562:src/amf/amfd/si_dep.cc:1734] TR :susi:safSi=NPMSI2,safApp=NPMAPP si_dep_state:3 state:3 fsm:3 Mar 17 4:37:51.230522 osafamfd [2562:src/amf/amfd/si_dep.cc:1573] >> avd_sidep_is_si_failover_possible: SI: 'safSi=NPMSI2,safApp=NPMAPP', SU safSu=SU1,safSg=SGONE,safApp=NPMAPP Mar 17 4:37:51.230530 osafamfd [2562:src/amf/amfd/si_dep.cc:1712] << avd_sidep_is_si_failover_possible: return value: 0 Mar 17 4:37:51.230536 osafamfd [2562:src/amf/amfd/si_dep.cc:1745] TR Role failover is deferred as sponsors role failover is under going Mar 17 4:37:51.230543 osafamfd 
[2562:src/amf/amfd/si_dep.cc:0205] TR 'safSi=NPMSI2,safApp=NPMAPP' si_dep_state ASSIGNED => FAILOVER_UNDER_PROGRESS Mar 17 4:37:51.230588 osafamfd [2562:src/amf/amfd/chkop.cc:0229] TR Async update Mar 17 4:37:51.230757 osafamfd [2562:src/amf/amfd/si_dep.cc:1752] << avd_sidep_is_su_failover_possible: return value: 0 Mar 17 4:37:51.230764 osafamfd [2562:src/amf/amfd/sg_npm_fsm.cc:0169] TR role modification cannot be done now as Sponsor SI's are not yet assigned Mar 17 4:37:51.230771 osafamfd [2562:src/amf/amfd/sg_npm_fsm.cc:0208] << avd_sg_npm_su_chk_snd: return value :2 Mar 17 4:37:51.230778 osafamfd [2562:src/amf/amfd/sgproc.cc:2434] >> avd_sg_su_si_del_snd: 'safSu=SU1,safSg=SGONE,safApp=NPMAPP' Mar 17 4:37:51.230795 osafamfd [2562:src/amf/amfd/su.cc:2462] >> any_susi_fsm_in: SU:'safSu=SU1,safSg=SGONE,safApp=NPMAPP', check_fsm:1 Mar 17 4:37:51.230803 osafamfd [2562:src/amf/amfd/su.cc:2467] TR SUSI:'safSu=SU1,safSg=SGONE,safApp=NPMAPP,safSi=NPMSI1,safApp=NPMAPP', fsm:'3' Mar 17 4:37:51.230809 osafamfd [2562:src/amf/amfd/su.cc:2467] TR SUSI:'safSu=SU1,safSg=SGONE,safApp=NPMAPP,safSi=NPMSI2,safApp=NPMAPP', fsm:'3' 3)After deletion of assignment, AMF again tries to failover the assignments but fails for the same reason as above. --- ** [tickets:#2371] AMF: NPM app went into unstable state while expanding cluster** **Status:** assigned **Milestone:** 5.2.RC2 **Created:** Tue Mar 14, 2017 08:03 AM UTC by Chani Srivastava **Last Updated:** Wed Mar 15, 2017 05:37 AM UTC **Owner:** Praveen **Attachments:** - [messages](https://sourceforge.net/p/opensaf/tickets/2371/attachment/messages) (86.9 kB; application/octet-stream) - [osafamfd](https://sourceforge.net/p/opensaf/tickets/2371/attachment/osafamfd) (13.0 MB; application/octet-stream) Environment details OS : Suse 64bit Changeset : 8603( 5.2.MO-1) 4 node cluster without PBE Summary - Application went into unstable state and campaign execution could not complete while expanding the cluster using campaign Steps: 1. 
Brought up an NPM application with 5 SUs 2. Using campaign add a 3rd payload PL-5 to the cluster App went into bad state Mar 17 04:38:13 NewSC1 osafamfd[2562]: NO 'safSg=SGONE,safApp=NPMAPP' is in unstable/transition state Mar 17 04:38:15 NewSC1 osafamfd[2562]: NO 'safSg=SGONE,safApp=NPMAPP' is in unstable/transition state Mar 17 04:38:17 NewSC1 osafamfd[2562]: NO 'safSg=SGONE,safApp=NPMAPP' is in unstable/transition state Mar 17 04:38:19 NewSC1 osafamfd[2562]: NO 'safSg=SGONE,safApp=NPMAPP' is in unstable/transition state Mar 17 04:38:21 NewSC1 osafamfd[2562]: NO 'safSg=SGONE,safApp=NPMAPP' is in unstable/transition state Mar 17 04:38:23 NewSC1 osafamfd[2562]: NO 'safSg=SGONE,safApp=NPMAPP' is in unstable/transition state Mar 17 04:38:25 NewSC1 osafamfd[2562]: NO 'safSg=SGONE,safApp=NPMAPP' is in unstable/transition state Mar 17 04:38:27 NewSC1 osafamfd
[tickets] [opensaf:tickets] #2369 Java: consolidated clm java issue
- **status**: unassigned --> assigned
- **assigned_to**: Praveen
- **Part**: - --> lib
- **Milestone**: 5.2.RC2 --> 5.0.2

---

** [tickets:#2369] Java: consolidated clm java issue**

**Status:** assigned
**Milestone:** 5.0.2
**Created:** Mon Mar 13, 2017 10:46 AM UTC by Ritu Raj
**Last Updated:** Mon Mar 13, 2017 10:46 AM UTC
**Owner:** Praveen

###Environment details
OS : Suse 64bit
Changeset : 8634 ( 5.2.FC)
4 nodes setup(2 controller and 2 payload)

###Summary
consolidated clm java issue

###Steps followed & Observed behaviour
JAVA_CLM issues:

(A).
1. Call CLM Initialize.
2. Call DispatchBlocking in one thread.
3. Invoke Finalize in the main thread. Observed that the dispatch thread failed to exit.
4. The thread should not keep waiting once the handle is finalized; the dispatch thread that was created should exit.

(B).
1. Call CLM Initialize, then Finalize.
2. Call getClusterMembershipManager with the already finalized handle. It should raise a bad handle exception, but a proper handle is returned.

(C).
1. Initialize with version ['B', 1, 0] (minor version less than the supported minor version).
2. It returns "Incompatible version parameter" instead of the expected SA_AIS_OK.

(D).
1. Initialize with version ['B', 1, 8] (minor version greater than the supported minor version).
2. It returns "Incompatible version parameter" instead of the expected SA_AIS_OK.
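Cases (C) and (D) of #2369 expect the minor version to play no part in the compatibility decision. A minimal sketch of such a check is shown below. It is not the Java binding code: `is_compatible` is a hypothetical helper, the supported version `('B', 1, 1)` is an assumed value chosen so that both ticket cases apply, and the rule (same release code, requested major version not above the supported one, minor version ignored) is an assumption based on the expected results stated in the ticket.

```python
# Assumed supported version, for illustration only.
SUPPORTED = ("B", 1, 1)


def is_compatible(requested, supported=SUPPORTED):
    """Return True when Initialize should succeed for the requested version.

    requested, supported: (release_code, major, minor) tuples.
    The minor version is deliberately ignored.
    """
    release, major, _minor = requested
    return release == supported[0] and major <= supported[1]


case_c = is_compatible(("B", 1, 0))  # minor below the supported minor
case_d = is_compatible(("B", 1, 8))  # minor above the supported minor
```

Under these assumptions both cases are accepted, which corresponds to the SA_AIS_OK result the reporter expects, while a different release code or a higher major version is still rejected.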
[tickets] [opensaf:tickets] #2381 clmd: clm admin operation returns BAD_OP after rebooting node
- **status**: unassigned --> assigned
- **assigned_to**: Praveen
- **Part**: - --> d
- **Milestone**: 5.2.RC2 --> 5.0.2

---

** [tickets:#2381] clmd: clm admin operation returns BAD_OP after rebooting node**

**Status:** assigned
**Milestone:** 5.0.2
**Created:** Thu Mar 16, 2017 07:30 AM UTC by Ritu Raj
**Last Updated:** Thu Mar 16, 2017 08:33 AM UTC
**Owner:** Praveen
**Attachments:**

- [active_clmd.tgz](https://sourceforge.net/p/opensaf/tickets/2381/attachment/active_clmd.tgz) (1.3 MB; application/x-compressed-tar)
- [messages](https://sourceforge.net/p/opensaf/tickets/2381/attachment/messages) (1.9 MB; application/octet-stream)

###Environment details
OS : Suse 64bit
Changeset : 8701 (5.2.RC1)
4-node setup (2 controllers and 2 payloads)

###Summary
A CLM admin operation returns SA_AIS_ERR_BAD_OPERATION after rebooting the node.

###Steps followed & Observed behaviour
1. Initially performed a clm_lock operation on payload PL-3 and immediately restarted the same payload:

> init 6; exit

2. Later, performed a clm_unlock operation on PL-3. The unlock operation timed out, but the node still joined the cluster:

> SLES-SLOT1:~ # amf-adm unlock safNode=PL-3,safCluster=myClmCluster
> Mar 15 14:35:20 SLES-SLOT1 osafclmd[2763]: ER clms_imm_node_unlock failed
> Mar 15 14:35:20 SLES-SLOT1 osafamfd[2773]: NO Received node_up from 2030f: msg_id 1
> Mar 15 14:35:20 SLES-SLOT1 osafamfd[2773]: NO Node 'PL-3' joined the cluster
> Mar 15 14:35:20 SLES-SLOT1 osafimmnd[2733]: NO Implementer connected: 197 (MsgQueueService131855) <0, 2030f>
> error - command timed out (alarm)

3. After that, any clm_lock or clm_unlock operation on the node returns 'SA_AIS_ERR_BAD_OPERATION':

> SLES-SLOT1:~ # amf-adm lock safNode=PL-3,safCluster=myClmCluster
> error - saImmOmAdminOperationInvoke_2 admin-op RETURNED: SA_AIS_ERR_BAD_OPERATION (20)
>
> SLES-SLOT1:~ # amf-adm unlock safNode=PL-3,safCluster=myClmCluster
> error - saImmOmAdminOperationInvoke_2 admin-op RETURNED: SA_AIS_ERR_BAD_OPERATION (20)

Traces:

From the traces: Node PL-3 joined the cluster

~~~
Mar 15 14:35:20.373997 osafclmd [2763:src/clm/clmd/clms_imm.c:0939] >> clms_imm_admin_op_callback: Admin callback for nodename:safNode=PL-3,safCluster=myClmCluster, opId:1
Mar 15 14:35:20.374002 osafclmd [2763:src/clm/clmd/clms_util.c:0038] >> clms_node_get_by_name: name input safNode=PL-3,safCluster=myClmCluster length 36
Mar 15 14:35:20.374006 osafclmd [2763:src/clm/clmd/clms_util.c:0046] TR nodename after patricia tree get safNode=PL-3,safCluster=myClmCluster
Mar 15 14:35:20.374009 osafclmd [2763:src/clm/clmd/clms_util.c:0049] << clms_node_get_by_name
Mar 15 14:35:20.374012 osafclmd [2763:src/clm/clmd/clms_imm.c:2223] >> clms_imm_node_unlock: Node name safNode=PL-3,safCluster=myClmCluster to unlock
Mar 15 14:35:20.374015 osafclmd [2763:src/clm/clmd/clms_imm.c:0579] >> clms_admin_state_update_rattr: Admin state 1 update for node safNode=PL-3,safCluster=myClmCluster
Mar 15 14:35:20.374018 osafclmd [2763:src/clm/clmd/clms_util.c:0038] >> clms_node_get_by_name: name input safNode=PL-3,safCluster=myClmCluster length 36
Mar 15 14:35:20.374021 osafclmd [2763:src/clm/clmd/clms_util.c:0046] TR nodename after patricia tree get safNode=PL-3,safCluster=myClmCluster
~~~

..
..

*but sending the track callback failed for SA_CLM_CHANGE_COMPLETED*

~~~
Mar 15 14:35:20.380860 osafclmd [2763:src/clm/clmd/clms_imm.c:1439] TR callback msg send to clma failed
Mar 15 14:35:20.380869 osafclmd [2763:src/clm/clmd/clms_imm.c:1447] << clms_prep_and_send_track
Mar 15 14:35:20.380872 osafclmd [2763:src/clm/clmd/clms_imm.c:1220] TR Sending track callback failed for SA_CLM_CHANGE_COMPLETED
Mar 15 14:35:20.380875 osafclmd [2763:src/clm/clmd/clms_imm.c:1380] >> clms_prep_and_send_track
~~~

and an admin operation performed later failed with 'Another Admin operation already in progress':

~~~
Mar 15 14:51:21.878688 osafclmd [2763:src/clm/clmd/clms_imm.c:0939] >> clms_imm_admin_op_callback: Admin callback for nodename:safNode=PL-3,safCluster=myClmCluster, opId:2
Mar 15 14:51:21.878700 osafclmd [2763:src/clm/clmd/clms_util.c:0038] >> clms_node_get_by_name: name input safNode=PL-3,safCluster=myClmCluster length 36
Mar 15 14:51:21.878712 osafclmd [2763:src/clm/clmd/clms_util.c:0046] TR nodename after patricia tree get safNode=PL-3,safCluster=myClmCluster
Mar 15 14:51:21.878720 osafclmd [2763:src/clm/clmd/clms_util.c:0049] << clms_node_get_by_name
Mar 15 14:51:21.878726 osafclmd [2763:src/clm/clmd/clms_imm.c:0982] TR Another Admin operation already in progress: 4
~~~

Notes:
1. Syslog of Active controller attached
2. osafclmd of Active controller attached
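The traces above show opId 1 ending with a failed track callback send and opId 2 being rejected sixteen minutes later as "already in progress". One possible reading, not confirmed in the ticket, is that an in-progress marker set for the first operation was never cleared. The sketch below only illustrates that symptom in isolation; `Node`, `run_admin_op`, `send_track_callback` and `clear_on_failure` are hypothetical names and do not correspond to clmd code.

```python
class AdminOpInProgress(Exception):
    """Raised when a node already has an admin operation pending."""


class Node:
    def __init__(self, name):
        self.name = name
        self.admin_op = None  # hypothetical "operation in progress" marker


def run_admin_op(node, op_name, send_track_callback, clear_on_failure=True):
    """Run one admin operation on `node` (illustrative model only).

    send_track_callback is a hypothetical callable returning True when the
    callback message was delivered and False when the send failed.
    """
    if node.admin_op is not None:
        # Mirrors the "Another Admin operation already in progress" rejection.
        raise AdminOpInProgress(node.admin_op)
    node.admin_op = op_name
    delivered = send_track_callback(node)
    if delivered or clear_on_failure:
        # Clearing the marker on the failure path keeps the node usable.
        node.admin_op = None
    return delivered
```

With `clear_on_failure=False` a failed send leaves the marker set, and every later operation on the same node is rejected, which matches the behaviour reported in step 3.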
[tickets] [opensaf:tickets] #2381 clmd: clm admin operation returns BAD_OP after rebooting node
- **status**: unassigned --> assigned
- **assigned_to**: Praveen
- **Part**: - --> d
- **Milestone**: 5.2.RC2 --> 5.0.2

---

** [tickets:#2381] clmd: clm admin operation returns BAD_OP after rebooting node**

**Status:** assigned
**Milestone:** 5.0.2
**Created:** Thu Mar 16, 2017 07:30 AM UTC by Ritu Raj
**Last Updated:** Thu Mar 16, 2017 07:30 AM UTC
**Owner:** Praveen
**Attachments:**

- [active_clmd.tgz](https://sourceforge.net/p/opensaf/tickets/2381/attachment/active_clmd.tgz) (1.3 MB; application/x-compressed-tar)
- [messages](https://sourceforge.net/p/opensaf/tickets/2381/attachment/messages) (1.9 MB; application/octet-stream)

###Environment details
OS : Suse 64bit
Changeset : 8701 (5.2.RC1)
4-node setup (2 controllers and 2 payloads)

###Summary
A CLM admin operation returns SA_AIS_ERR_BAD_OPERATION after rebooting the node.

###Steps followed & Observed behaviour
1. Initially performed a clm_lock operation on payload PL-3 and immediately restarted the same payload:

> init 6; exit

2. Later, performed a clm_unlock operation on PL-3. The unlock operation timed out, but the node still joined the cluster:

> SLES-SLOT1:~ # amf-adm unlock safNode=PL-3,safCluster=myClmCluster
> Mar 15 14:35:20 SLES-SLOT1 osafclmd[2763]: ER clms_imm_node_unlock failed
> Mar 15 14:35:20 SLES-SLOT1 osafamfd[2773]: NO Received node_up from 2030f: msg_id 1
> Mar 15 14:35:20 SLES-SLOT1 osafamfd[2773]: NO Node 'PL-3' joined the cluster
> Mar 15 14:35:20 SLES-SLOT1 osafimmnd[2733]: NO Implementer connected: 197 (MsgQueueService131855) <0, 2030f>
> error - command timed out (alarm)

3. After that, clm_lock and clm_unlock operations return 'SA_AIS_ERR_NOT_SUPPORTED' and 'SA_AIS_ERR_BAD_OPERATION':

> SLES-SLOT1:~ # amf-adm lock safNode=SC-1,safCluster=myClmCluster
> Mar 15 14:50:47 SLES-SLOT1 osafclmd[2763]: NO Lock on active node not allowed
> Mar 15 14:50:47 SLES-SLOT1 osafclmd[2763]: NO clms_imm_node_lock failed
> error - saImmOmAdminOperationInvoke_2 admin-op RETURNED: SA_AIS_ERR_NOT_SUPPORTED (19)
>
> SLES-SLOT1:~ # amf-adm unlock safNode=PL-3,safCluster=myClmCluster
> error - saImmOmAdminOperationInvoke_2 admin-op RETURNED: SA_AIS_ERR_BAD_OPERATION (20)

Traces:

From the traces: Node PL-3 joined the cluster

~~~
Mar 15 14:35:20.373997 osafclmd [2763:src/clm/clmd/clms_imm.c:0939] >> clms_imm_admin_op_callback: Admin callback for nodename:safNode=PL-3,safCluster=myClmCluster, opId:1
Mar 15 14:35:20.374002 osafclmd [2763:src/clm/clmd/clms_util.c:0038] >> clms_node_get_by_name: name input safNode=PL-3,safCluster=myClmCluster length 36
Mar 15 14:35:20.374006 osafclmd [2763:src/clm/clmd/clms_util.c:0046] TR nodename after patricia tree get safNode=PL-3,safCluster=myClmCluster
Mar 15 14:35:20.374009 osafclmd [2763:src/clm/clmd/clms_util.c:0049] << clms_node_get_by_name
Mar 15 14:35:20.374012 osafclmd [2763:src/clm/clmd/clms_imm.c:2223] >> clms_imm_node_unlock: Node name safNode=PL-3,safCluster=myClmCluster to unlock
Mar 15 14:35:20.374015 osafclmd [2763:src/clm/clmd/clms_imm.c:0579] >> clms_admin_state_update_rattr: Admin state 1 update for node safNode=PL-3,safCluster=myClmCluster
Mar 15 14:35:20.374018 osafclmd [2763:src/clm/clmd/clms_util.c:0038] >> clms_node_get_by_name: name input safNode=PL-3,safCluster=myClmCluster length 36
Mar 15 14:35:20.374021 osafclmd [2763:src/clm/clmd/clms_util.c:0046] TR nodename after patricia tree get safNode=PL-3,safCluster=myClmCluster
~~~

..
..

*but sending the track callback failed for SA_CLM_CHANGE_COMPLETED*

~~~
Mar 15 14:35:20.380860 osafclmd [2763:src/clm/clmd/clms_imm.c:1439] TR callback msg send to clma failed
Mar 15 14:35:20.380869 osafclmd [2763:src/clm/clmd/clms_imm.c:1447] << clms_prep_and_send_track
Mar 15 14:35:20.380872 osafclmd [2763:src/clm/clmd/clms_imm.c:1220] TR Sending track callback failed for SA_CLM_CHANGE_COMPLETED
Mar 15 14:35:20.380875 osafclmd [2763:src/clm/clmd/clms_imm.c:1380] >> clms_prep_and_send_track
~~~

and an admin operation performed later failed with 'Another Admin operation already in progress':

~~~
Mar 15 14:51:21.878688 osafclmd [2763:src/clm/clmd/clms_imm.c:0939] >> clms_imm_admin_op_callback: Admin callback for nodename:safNode=PL-3,safCluster=myClmCluster, opId:2
Mar 15 14:51:21.878700 osafclmd [2763:src/clm/clmd/clms_util.c:0038] >> clms_node_get_by_name: name input safNode=PL-3,safCluster=myClmCluster length 36
Mar 15 14:51:21.878712 osafclmd [2763:src/clm/clmd/clms_util.c:0046] TR nodename after patricia tree get safNode=PL-3,safCluster=myClmCluster
Mar 15 14:51:21.878720 osafclmd [2763:src/clm/clmd/clms_util.c:0049] << clms_node_get_by_name
Mar 15 14:51:21.878726 osafclmd [2763:src/clm/clmd/clms_imm.c:0982] TR Another Admin operation already in progress: 4
~~~

Notes:
1. Syslog of Active controller attached
2. osafclmd of Activ
[tickets] [opensaf:tickets] #2372 amf/clm: CLM lock of two more nodes returns REPAIR_PENDING for first node.
Hi Srikanth,

Thanks for the information. I have analyzed the situation. The two issues are the same (in one case AMF application components are running on the locked payloads). The message " NO Pending Response sent for CLM track callback::OK '7'" appears because AMF responds twice for the same invocation id. In the case described in the ticket this message is not observed; the applications installed on the locked nodes make the difference. CLMS properly maintains the invocation id for all clients per callback.

To understand the problem I considered a different case. Suppose one payload node, PL-4, is locked and an application has not yet responded to the track callbacks, and another payload, PL-3, is stopped (OpenSAF stop). The application is hosted on PL-5 and its track flags are the same as AMFD's: (SA_TRACK_CURRENT | SA_TRACK_CHANGES_ONLY | SA_TRACK_VALIDATE_STEP | SA_TRACK_START_STEP).

In this case, when PL-4 is locked, both AMF and the application get the track callback for CHANGE_START. AMF responds to the callback but the application does not. Now PL-3 is stopped. CLM delivers the track callback for the COMPLETED step, but it contains numberOfItems=2, covering both payloads PL-3 and PL-4. The application receives the same callback. The application never responds to the PL-4 callback, so the node lock timer expires at CLMD and it again sends the completed callback to both AMFD and the application.

Since both AMFD and the application have registered for SA_TRACK_CHANGES_ONLY, I really doubt CLM should send the callback for both PL-3 and PL-4. In the ticket description I pointed out this problem for the CHANGE_START case.

In the CLM spec, section 3.5.2 SaClmClusterTrackCallbackT_4, page 51:

The value of the numberOfItems attribute in the structure to which the notificationBuffer parameter points might be greater than the value of the numberOfMembers parameter if either the SA_TRACK_CHANGES flag or the SA_TRACK_CHANGES_ONLY flags is set, and one or more member nodes have left the cluster membership. In this case, the structure to which the notificationBuffer parameter points might contain information about the current members of the cluster and also about nodes that have recently left the cluster membership.

I am going through the ticket list and the spec for more information regarding this.

Thanks,
Praveen

Attachments:

- [node_lock_and_stop.tgz](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/6b54e875/538b/attachment/node_lock_and_stop.tgz) (382.7 kB; application/x-compressed)
- [two_nodes_lock.tgz](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/6b54e875/538b/attachment/two_nodes_lock.tgz) (335.0 kB; application/x-compressed)

---

** [tickets:#2372] amf/clm: CLM lock of two more nodes returns REPAIR_PENDING for first node.**

**Status:** accepted
**Milestone:** 5.0.2
**Created:** Tue Mar 14, 2017 09:29 AM UTC by Praveen
**Last Updated:** Wed Mar 15, 2017 06:27 AM UTC
**Owner:** Praveen
**Attachments:**

- [osafamfd](https://sourceforge.net/p/opensaf/tickets/2372/attachment/osafamfd) (3.4 MB; application/octet-stream)
- [osafclmd](https://sourceforge.net/p/opensaf/tickets/2372/attachment/osafclmd) (860.9 kB; application/octet-stream)

Steps to reproduce:
1) Bring a 4-node cluster up.
2) Deploy AMF demo on PL-3 and PL-4.
3) LOCK amfd nodes PL-3 and PL-4.
4) Make arrangements so that termination of amf_demo on PL-3 takes more time compared to PL-4.
5) From one terminal, issue a CLM lock of PL-3 first and, immediately afterwards, a CLM lock of PL-4.

CLM and AMF traces are attached.

Analysis:
When AMFD gets the CLM track callback for PL-3, it starts terminating amf_demo on PL-3. While termination of amf_demo is still going on, AMF gets another track callback with rootCauseEntity set to PL-4. However, the callback also contains information about PL-3. AMFD starts terminating amf_demo on PL-4, but at the same time it responds for PL-3 with the invocation id of the PL-4 callback. CLM assumes that the change_start step for PL-4 has completed and sends the completion callback for PL-4. In this callback, AMF clears the internal flags which monitor the graceful removal of nodes. Since AMF never responded to the PL-3 callback, the callback timer expires in CLMD and it sends the completed callback to AMF. AMF treats this as a node failover case and tries to fail over PL-3.

Note: In all these stages, CLM sends the track callback with information about all the nodes. AMF registers with the track flags SA_TRACK_CURRENT|SA_TRACK_CHANGES_ONLY|SA_TRACK_VALIDATE_STEP|SA_TRACK_START_STEP.

I am still evaluating whether the issue is in CLM or AMF. Since AMF registers for **|SA_TRACK_CHANGES_ONLY|**, should CLM give information about all the nodes in all subsequent callbacks?
Also, AMF should respond to a callback only when it has completed termination of the components.
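Whether CLM should include unrelated nodes under SA_TRACK_CHANGES_ONLY is left open above. Independently of that answer, a consumer could defensively act only on the entry that matches the root cause entity of the callback. The sketch below shows that idea under stated assumptions: `NotificationItem` is a simplified, hypothetical stand-in for one entry of the notification buffer, and it assumes the root cause entity names the affected node, which need not hold for every kind of change.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NotificationItem:
    """Hypothetical, simplified view of one notification buffer entry."""
    node_name: str  # e.g. "safNode=PL-3,safCluster=myClmCluster"
    change: str     # simplified stand-in for the per-node change indicator


def items_for_root_cause(items, root_cause_entity):
    """Return only the entries that describe the node named as root cause."""
    return [item for item in items if item.node_name == root_cause_entity]
```

For the scenario in the ticket, a callback carrying both PL-3 and PL-4 with PL-4 as root cause would then yield a single entry, so PL-3 is not processed a second time.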
[tickets] [opensaf:tickets] #2371 AMF: NPM app went into unstable state while expanding cluster
- **status**: unassigned --> assigned
- **assigned_to**: Praveen
- **Part**: - --> d

---

** [tickets:#2371] AMF: NPM app went into unstable state while expanding cluster**

**Status:** assigned
**Milestone:** 5.2.RC2
**Created:** Tue Mar 14, 2017 08:03 AM UTC by Chani Srivastava
**Last Updated:** Tue Mar 14, 2017 08:03 AM UTC
**Owner:** Praveen
**Attachments:**

- [messages](https://sourceforge.net/p/opensaf/tickets/2371/attachment/messages) (86.9 kB; application/octet-stream)
- [osafamfd](https://sourceforge.net/p/opensaf/tickets/2371/attachment/osafamfd) (13.0 MB; application/octet-stream)

Environment details
OS : Suse 64bit
Changeset : 8603 (5.2.MO-1)
4-node cluster without PBE

Summary - Application went into unstable state and campaign execution could not complete while expanding the cluster using a campaign

Steps:
1. Brought up an NPM application with 5 SUs
2. Using a campaign, add a 3rd payload PL-5 to the cluster

App went into bad state

Mar 17 04:38:13 NewSC1 osafamfd[2562]: NO 'safSg=SGONE,safApp=NPMAPP' is in unstable/transition state
Mar 17 04:38:15 NewSC1 osafamfd[2562]: NO 'safSg=SGONE,safApp=NPMAPP' is in unstable/transition state
Mar 17 04:38:17 NewSC1 osafamfd[2562]: NO 'safSg=SGONE,safApp=NPMAPP' is in unstable/transition state
Mar 17 04:38:19 NewSC1 osafamfd[2562]: NO 'safSg=SGONE,safApp=NPMAPP' is in unstable/transition state
Mar 17 04:38:21 NewSC1 osafamfd[2562]: NO 'safSg=SGONE,safApp=NPMAPP' is in unstable/transition state
Mar 17 04:38:23 NewSC1 osafamfd[2562]: NO 'safSg=SGONE,safApp=NPMAPP' is in unstable/transition state
Mar 17 04:38:25 NewSC1 osafamfd[2562]: NO 'safSg=SGONE,safApp=NPMAPP' is in unstable/transition state
Mar 17 04:38:27 NewSC1 osafamfd[2562]: NO 'safSg=SGONE,safApp=NPMAPP' is in unstable/transition state
Mar 17 04:38:29 NewSC1 osafamfd[2562]: NO 'safSg=SGONE,safApp=NPMAPP' is in unstable/transition state
[tickets] [opensaf:tickets] #2372 amf/clm: CLM lock of two more nodes returns REPAIR_PENDING for first node.
---

** [tickets:#2372] amf/clm: CLM lock of two more nodes returns REPAIR_PENDING for first node.**

**Status:** accepted
**Milestone:** 5.0.2
**Created:** Tue Mar 14, 2017 09:29 AM UTC by Praveen
**Last Updated:** Tue Mar 14, 2017 09:29 AM UTC
**Owner:** Praveen
**Attachments:**

- [osafamfd](https://sourceforge.net/p/opensaf/tickets/2372/attachment/osafamfd) (3.4 MB; application/octet-stream)
- [osafclmd](https://sourceforge.net/p/opensaf/tickets/2372/attachment/osafclmd) (860.9 kB; application/octet-stream)

Steps to reproduce:
1) Bring a 4-node cluster up.
2) Deploy AMF demo on PL-3 and PL-4.
3) LOCK amfd nodes PL-3 and PL-4.
4) Make arrangements so that termination of amf_demo on PL-3 takes more time compared to PL-4.
5) From one terminal, issue a CLM lock of PL-3 first and, immediately afterwards, a CLM lock of PL-4.

CLM and AMF traces are attached.

Analysis:
When AMFD gets the CLM track callback for PL-3, it starts terminating amf_demo on PL-3. While termination of amf_demo is still going on, AMF gets another track callback with rootCauseEntity set to PL-4. However, the callback also contains information about PL-3. AMFD starts terminating amf_demo on PL-4, but at the same time it responds for PL-3 with the invocation id of the PL-4 callback. CLM assumes that the change_start step for PL-4 has completed and sends the completion callback for PL-4. In this callback, AMF clears the internal flags which monitor the graceful removal of nodes. Since AMF never responded to the PL-3 callback, the callback timer expires in CLMD and it sends the completed callback to AMF. AMF treats this as a node failover case and tries to fail over PL-3.

Note: In all these stages, CLM sends the track callback with information about all the nodes. AMF registers with the track flags SA_TRACK_CURRENT|SA_TRACK_CHANGES_ONLY|SA_TRACK_VALIDATE_STEP|SA_TRACK_START_STEP.

I am still evaluating whether the issue is in CLM or AMF. Since AMF registers for **|SA_TRACK_CHANGES_ONLY|**, should CLM give information about all the nodes in all subsequent callbacks?
Also, AMF should respond to a callback only when it has completed termination of the components.
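The closing remark in the analysis implies two rules: a response goes out only once termination on that node has finished, and it carries the invocation id of the callback that started that node's removal. A minimal sketch of such bookkeeping follows. `PendingResponses` and the `respond` callable are hypothetical names for illustration; this is not the AMFD implementation.

```python
class PendingResponses:
    """Tracks which invocation id is owed a response for each node."""

    def __init__(self, respond):
        self._respond = respond  # hypothetical callable: respond(invocation_id)
        self._pending = {}       # node name -> invocation id awaiting a response

    def on_start_callback(self, node, invocation_id):
        # Keep the first invocation id seen for a node. A later callback that
        # merely repeats the node must not overwrite it.
        self._pending.setdefault(node, invocation_id)

    def on_termination_done(self, node):
        # Respond exactly once, and only after termination has completed.
        invocation_id = self._pending.pop(node, None)
        if invocation_id is not None:
            self._respond(invocation_id)
        return invocation_id
```

In the two-node scenario, the PL-4 callback also lists PL-3, but PL-3 keeps its original invocation id, so finishing PL-4 first answers only the PL-4 callback.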
[tickets] [opensaf:tickets] #2365 AMF: Active controller went for continuous reboots when an NPM app is upgraded with more SIs and CSIs
- **status**: unassigned --> assigned
- **assigned_to**: Praveen

---

** [tickets:#2365] AMF: Active controller went for continuous reboots when an NPM app is upgraded with more SIs and CSIs**

**Status:** assigned
**Milestone:** 5.2.RC1
**Created:** Sat Mar 11, 2017 03:40 PM UTC by Chani Srivastava
**Last Updated:** Sat Mar 11, 2017 03:40 PM UTC
**Owner:** Praveen

**Environment details**
OS : Suse 64bit
Changeset : 8634 (5.2.FC)
Setup : 4 nodes (2 controllers and 2 payloads / no PBE)

**Steps followed & Observed behaviour**
1. Import the attached xml
2. Bring up the attached NPM.sh application
3. Execute the attached campaign22.xml to upgrade the application

Campaign22.xml adds more SIs and CSIs (i.e. work) and assigns them to SUs which can handle more work, and also assigns them to spare SUs.

Oct 2 18:03:43 OSAF-SC1 osafamfnd[6292]: NO Restarting a component of 'safSu=SU1,safSg=SGONE,safApp=NPMAPP' (comp restart count: 1)
Oct 2 18:03:43 OSAF-SC1 osafamfnd[6292]: NO 'safComp=COMP3SU1NPMAPP,safSu=SU1,safSg=SGONE,safApp=NPMAPP' faulted due to 'avaDown' : Recovery is 'componentRestart'
Oct 2 18:03:43 OSAF-SC1 osafamfnd[6292]: NO Restarting a component of 'safSu=SU1,safSg=SGONE,safApp=NPMAPP' (comp restart count: 2)
Oct 2 18:03:43 OSAF-SC1 osafamfnd[6292]: NO 'safComp=COMP2SU1NPMAPP,safSu=SU1,safSg=SGONE,safApp=NPMAPP' faulted due to 'avaDown' : Recovery is 'componentRestart'
|
Oct 2 18:03:43 OSAF-SC1 osafamfnd[6292]: NO Performing failover of 'safSu=SU1,safSg=SGONE,safApp=NPMAPP' (SU failover count: 1)
Oct 2 18:03:43 OSAF-SC1 osafamfnd[6292]: NO 'safComp=COMP1SU1NPMAPP,safSu=SU1,safSg=SGONE,safApp=NPMAPP' recovery action escalated from 'componentRestart' to 'suFailover'
|
Oct 2 18:03:47 OSAF-SC1 osafamfnd[6292]: NO 'safComp=COMP3SU1NPMAPP,safSu=SU1,safSg=SGONE,safApp=NPMAPP' recovery action escalated from 'componentRestart' to 'nodeFailover'
Oct 2 18:03:47 OSAF-SC1 osafamfnd[6292]: NO 'safComp=COMP3SU1NPMAPP,safSu=SU1,safSg=SGONE,safApp=NPMAPP' faulted due to 'avaDown' : Recovery is 'nodeFailover'
|
Oct 2 18:03:49 OSAF-SC1 osafamfnd[6292]: NO Received reboot order, ordering reboot now!
Oct 2 18:03:49 OSAF-SC1 osafamfnd[6292]: Rebooting OpenSAF NodeId = 131343 EE Name = , Reason: Received reboot order, OwnNodeId = 131343, SupervisionTime = 60
Oct 2 18:03:49 OSAF-SC1 opensaf_reboot: Rebooting local node; timeout=60

I will share the logs and scripts offline as they are huge in size.
[tickets] [opensaf:tickets] #2269 amf: saAmfSGNumPrefAssignedSUs is not honored in N-Way Active model.
- **status**: assigned --> review

---

** [tickets:#2269] amf: saAmfSGNumPrefAssignedSUs is not honored in N-Way Active model.**

**Status:** review
**Milestone:** 5.0.2
**Created:** Wed Jan 18, 2017 06:08 AM UTC by Praveen
**Last Updated:** Wed Jan 18, 2017 06:08 AM UTC
**Owner:** Praveen
**Attachments:**

- [AppConfig-nwayactive_3SUs_1SIs.xml](https://sourceforge.net/p/opensaf/tickets/2269/attachment/AppConfig-nwayactive_3SUs_1SIs.xml) (13.7 kB; text/xml)

AMF assigns more SUs than the configured value of saAmfSGNumPrefAssignedSUs in the N-Way Active model.
The issue can be reproduced by bringing up the attached configuration. In the application, saAmfSGNumPrefAssignedSUs is set to 2:

immlist safSg=NWay_Active\,safApp=NWay_Active | grep -i prefass
saAmfSGNumPrefAssignedSUs SA_UINT32_T 2 (0x2)

But AMF gives assignments to all three SUs:

safSISU=safSu=SU2\,safSg=NWay_Active\,safApp=NWay_Active,safSi=NWay_Active,safApp=NWay_Active
saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=SU1\,safSg=NWay_Active\,safApp=NWay_Active,safSi=NWay_Active,safApp=NWay_Active
saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=SU3\,safSg=NWay_Active\,safApp=NWay_Active,safSi=NWay_Active,safApp=NWay_Active
saAmfSISUHAState=ACTIVE(1)

Since this attribute is also valid for the N-Way model, the issue applies to the N-Way model as well.
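The expected behaviour in the report is that no more than saAmfSGNumPrefAssignedSUs service units receive assignments. A minimal sketch of that cap follows. It assumes each SU is given as a (name, rank) pair and that a smaller rank value means a more preferred SU; both the data shape and the function name `pick_assigned_sus` are illustrative assumptions, not AMFD code.

```python
def pick_assigned_sus(in_service_sus, num_pref_assigned):
    """Choose at most `num_pref_assigned` SUs, most preferred first.

    in_service_sus: iterable of (su_name, rank) pairs (hypothetical shape);
    a smaller rank value is assumed to mean a more preferred SU.
    """
    ranked = sorted(in_service_sus, key=lambda su: su[1])
    return [name for name, _rank in ranked[:num_pref_assigned]]
```

With the three SUs from the report and a preferred count of 2, only two SUs would be selected instead of all three.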
[tickets] [opensaf:tickets] #1190 AMF: saAmfSIPrefActiveAssignments has wrong default, stopping scaling nway active SGs
Hi All,

I think the patch for this ticket can be pushed to the other branches as well, because we are retaining both definitions. If a user sets the attribute to 1, the default value remains 1. If a user sets it to 0, the default value will be PrefAssignedSUs. A user always has the option to lock the SI for no assignments.

Thanks,
Praveen

---

** [tickets:#1190] AMF: saAmfSIPrefActiveAssignments has wrong default, stopping scaling nway active SGs**

**Status:** fixed
**Milestone:** 5.2.FC
**Created:** Thu Oct 23, 2014 01:10 PM UTC by Hans Feldt
**Last Updated:** Fri Feb 24, 2017 06:15 AM UTC
**Owner:** Praveen

Problem: In nway-active, SUs are not instantiated unless saAmfSIPrefActiveAssignments is configured.

saAmfSIPrefActiveAssignments is a configuration attribute only valid for the nway-active redundancy model. According to the spec 3.6.5.3 it should have a default value of "the preferred number of assigned service units."

and "saAmfSGNumPrefAssignedSUs" should have a default value of "the preferred number of in-service service units"

and "saAmfSGNumPrefInserviceSUs" should have a default value of "the number of the service units configured for the service group."

The value of saAmfSIPrefActiveAssignments is currently set to one when not configured; instead it should be set to saAmfSGNumPrefAssignedSUs.

To avoid any backward compatibility issue, the choice of default value is left to the user. The default value of saAmfSIPrefActiveAssignments will be either saAmfSGNumPrefAssignedSUs or 1, based on the user's choice. The different default values are honoured under the following conditions:

- If a user configures saAmfSIPrefActiveAssignments=1, the SI will be assigned to only one SU. This ensures backward compatibility.
- If a user does not configure the attribute saAmfSIPrefActiveAssignments in the application, or deletes the attribute via a CCB operation, AMFD will still honor the default value of 1. This again ensures backward compatibility.
- If a user sets saAmfSIPrefActiveAssignments=0 via CCB or in the application configuration, AMFD will use the section 3.6.5 definition for the default value, i.e. saAmfSGNumPrefAssignedSUs.
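The three conditions above can be condensed into a small decision function. This is a sketch of the rules as described in the ticket, not the AMFD source; the function name is hypothetical, and `None` is used here to represent "not configured or deleted via CCB". Values other than 0 and 1 are assumed to be used as configured.

```python
def effective_pref_active_assignments(configured, sg_num_pref_assigned_sus):
    """Resolve the effective value of saAmfSIPrefActiveAssignments.

    configured: the configured value, or None when the attribute is absent
    or was deleted via a CCB operation (representation assumed here).
    """
    if configured is None:
        # Backward-compatible default when nothing is configured.
        return 1
    if configured == 0:
        # Opt in to the spec default: saAmfSGNumPrefAssignedSUs.
        return sg_num_pref_assigned_sus
    # Any explicit non-zero value, including 1, is used as configured.
    return configured
```

For a service group whose saAmfSGNumPrefAssignedSUs is 3, an unset attribute resolves to 1, a value of 0 resolves to 3, and an explicit 1 stays 1.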
[tickets] [opensaf:tickets] #2325 clm: standby clmd crashed after failing to read node configuration from IMM.
- **status**: review --> fixed
- **Comment**:

changeset: 8682:50a2033a8a8d
branch: opensaf-5.0.x
parent: 8679:7ec6c15c249f
user: Praveen Malviya <praveen.malv...@oracle.com>
date: Fri Mar 10 10:48:17 2017 +0530
summary: clmd: try to re-read node config from IMM if BAD_HANDLE is returned [#2325].

changeset: 8683:59e265654232
branch: opensaf-5.1.x
parent: 8680:e02390320bbb
user: Praveen Malviya <praveen.malv...@oracle.com>
date: Fri Mar 10 10:49:06 2017 +0530
summary: clmd: try to re-read node config from IMM if BAD_HANDLE is returned [#2325].

changeset: 8684:9338ad3cacc0
tag: tip
parent: 8681:0e9c5da42416
user: Praveen Malviya <praveen.malv...@oracle.com>
date: Fri Mar 10 10:49:44 2017 +0530
summary: clmd: try to re-read node config from IMM if BAD_HANDLE is returned [#2325].

---

** [tickets:#2325] clm: standby clmd crashed after failing to read node configuration from IMM.**

**Status:** fixed
**Milestone:** 5.0.2
**Created:** Fri Feb 24, 2017 09:32 AM UTC by Praveen
**Last Updated:** Fri Mar 03, 2017 10:40 AM UTC
**Owner:** Praveen

The issue is not reproducible.
While coming up as standby, CLMD successfully initializes with IMM and successfully reads the cluster-related configuration. While reading the node-related configuration from IMM, CLMD calls saImmOmSearchNext_2(). This API could not send any message to IMMND and failed:

Feb 15 06:32:17 SC-2-2 osafclmd[3972]: WA OpenSAF imm lib: Message loss detected for dest 565213425675031 service id:25
Feb 15 06:32:17 SC-2-2 osafimmnd[3930]: WA IMMND - Client Node Get Failed for cli_hdl:932008034831
Feb 15 06:32:17 SC-2-2 osafclmd[3972]: WA OpenSAF imm lib: Message loss detected for dest 565213425675031 service id:25
Feb 15 06:32:17 SC-2-2 osafclmd[3972]: WA marking handle as exposed

CLMD does not explicitly check whether the node configuration read was successful. It continues and completes the cold sync. When a payload joins the cluster, the active CLMD checkpoints run-time data for the node. Since the node is not present on the standby CLMD, it crashes:

Feb 15 06:33:26 SC-2-2 osafimmd[3915]: NO SBY: New Epoch for IMMND process at node 2020f old epoch: 22 new epoch:23
Feb 15 06:33:26 SC-2-2 osafclmd[3972]: ER Node is NULL,problem with the database.
Feb 15 06:33:26 SC-2-2 osafclmd[3972]: ../../opensaf/src/clm/clmd/clms_mbcsv.c:468: ckpt_proc_node_rec: Assertion '0' failed.
Feb 15 06:33:27 SC-2-2 osafamfnd[4002]: NO 'safComp=CLM,safSu=SC-2,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'nodeFailfast'
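The changeset summary says the fix is to re-read the node configuration when BAD_HANDLE is returned. The general shape of such a retry can be sketched as follows. The sketch assumes that a fresh handle is obtained before each attempt and that a failed read surfaces as an exception; `BadHandle`, `initialize`, `read_nodes` and `max_attempts` are hypothetical names, and the actual patch may be structured differently.

```python
class BadHandle(Exception):
    """Stand-in for an SA_AIS_ERR_BAD_HANDLE return code."""


def read_node_config(initialize, read_nodes, max_attempts=3):
    """Read the node configuration, retrying with a fresh handle.

    initialize: hypothetical callable returning a new handle.
    read_nodes: hypothetical callable taking a handle and returning the
    node list, or raising BadHandle when the handle is unusable.
    """
    last_error = None
    for _attempt in range(max_attempts):
        handle = initialize()
        try:
            return read_nodes(handle)
        except BadHandle as err:
            # The handle is unusable; obtain a fresh one and retry.
            last_error = err
    # Fail loudly instead of continuing with an empty node database.
    raise RuntimeError("node configuration could not be read") from last_error
```

The point of the final `raise` is the one made in the analysis: the read result is checked explicitly, so the standby never proceeds to cold sync with missing nodes.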
[tickets] [opensaf:tickets] #2354 osaf: How to detect if payload is being in "SC Absence" mode.
---

** [tickets:#2354] osaf: How to detect if payload is being in "SC Absence" mode.**

**Status:** unassigned
**Milestone:** 5.0.2
**Created:** Wed Mar 08, 2017 07:28 AM UTC by Praveen
**Last Updated:** Wed Mar 08, 2017 07:28 AM UTC
**Owner:** nobody

This discussion ticket is being raised based on a user list query dated March 1st, 2017. The query says:

"We have enabled the new feature "SC Absence" of OpenSAF 5.x in our product, it works good so far. Now we need to make some actions when PLD go in/out "SC Absence" mode, we have to find a way in PLD to detect if it is being in "SC Absent" mode or not. So, does anyone knows how to make it by a utility/tool and C code(i.e. OpenSAF API) as well? "

I do not think we have any API that an application can use to query OpenSAF for the SC absence state. MDS up and down events of the directors can be used to determine the SC absence state, as some agents and the node directors already do, but this would add a lot of code to the application.

Please update this ticket with a known or proposed solution.
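As an illustration of the MDS-event approach mentioned in #2354, the sketch below derives an SC absence flag from director up and down events. It is only a sketch under assumptions: the `demo_*` types and functions are hypothetical, and a real application would have to feed them from its own MDS subscription callbacks, which are not shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event kinds; real code would map the MDS up/down
 * notifications for a director onto these two values. */
typedef enum { DEMO_DIRECTOR_UP, DEMO_DIRECTOR_DOWN } demo_dir_evt_t;

typedef struct {
    int directors_up; /* directors currently visible from this node */
} demo_sc_tracker_t;

/* Update the tracker for one director event. */
void demo_sc_tracker_on_event(demo_sc_tracker_t *t, demo_dir_evt_t evt)
{
    assert(t != NULL);
    if (evt == DEMO_DIRECTOR_UP)
        t->directors_up++;
    else if (t->directors_up > 0)
        t->directors_up--; /* ignore a down event with nothing up */
}

/* The node is treated as being in SC absence when no director is visible. */
bool demo_sc_absent(const demo_sc_tracker_t *t)
{
    assert(t != NULL);
    return t->directors_up == 0;
}
```

A zero-initialized tracker reports absence until the first up event arrives, so an application would need to decide how to treat the startup window.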
[tickets] [opensaf:tickets] #2268 amf: assignment from higher ranked SU is removed in N-Way Active model.
A similar issue also occurs in the N-Way model when SiPrefStandbyAssignment is reduced. In addition, AMFD does not check the HA state of the SUSI; it tries to delete an active SUSI and crashes:

Mar 8 11:38:25 SC-1 osafimmnd[4765]: NO Ccb 4 COMMITTED (immcfg_SC-1_5673)
Mar 8 11:38:25 SC-1 amf_demo[5464]: >CSI Remove=>
Mar 8 11:38:25 SC-1 amf_demo[5464]: Comp>:'safComp=NWay,safSu=SU1,safSg=NWay,safApp=NWay'
Mar 8 11:38:25 SC-1 amf_demo[5464]: CSI-->:'safCsi=NWay,safSi=NWay,safApp=NWay'
Mar 8 11:38:25 SC-1 amf_demo[5464]: CSI FLAG-->: SA_AMF_CSI_TARGET_ONE
Mar 8 11:38:25 SC-1 amf_demo[5464]: <===
Mar 8 11:38:25 SC-1 osafamfnd[4817]: NO Removed 'safSi=NWay,safApp=NWay' from 'safSu=SU1,safSg=NWay,safApp=NWay'
Mar 8 11:38:25 SC-1 amf_demo[5464]: saAmfResponse after lopp- 1
Mar 8 11:38:25 SC-1 amf_demo[5494]: >CSI Remove=>
Mar 8 11:38:25 SC-1 amf_demo[5494]: Comp>:'safComp=NWay,safSu=SU2,safSg=NWay,safApp=NWay'
Mar 8 11:38:25 SC-1 amf_demo[5494]: CSI-->:'safCsi=NWay,safSi=NWay,safApp=NWay'
Mar 8 11:38:25 SC-1 amf_demo[5494]: CSI FLAG-->: SA_AMF_CSI_TARGET_ONE
Mar 8 11:38:25 SC-1 amf_demo[5494]: <===
Mar 8 11:38:25 SC-1 osafamfnd[4817]: NO Removed 'safSi=NWay,safApp=NWay' from 'safSu=SU2,safSg=NWay,safApp=NWay'
Mar 8 11:38:25 SC-1 amf_demo[5494]: saAmfResponse after lopp- 1
Mar 8 11:38:25 SC-1 osafamfd[4803]: src/amf/amfd/su.cc:2072: dec_curr_stdby_si: Assertion 'saAmfSUNumCurrStandbySIs > 0' failed.
Mar 8 11:38:25 SC-1 osafamfnd[4817]: ER AMFD has unexpectedly crashed. Rebooting node
Mar 8 11:38:25 SC-1 osafamfnd[4817]: Rebooting OpenSAF NodeId = 131343 EE Name = , Reason: AMFD has unexpectedly crashed. Rebooting node, OwnNodeId = 131343, SupervisionTime = 60
Mar 8 11:38:25 SC-1 osafimmnd[4765]: NO Implementer locally disconnected. Marking it as doomed 31 <23, 2010f> (safAmfService)
Mar 8 11:38:25 SC-1 osafimmnd[4765]: NO Implementer disconnected 31 <23, 2010f> (safAmfService)
Mar 8 11:38:25 SC-1 opensaf_reboot: Rebooting local node; timeout=60

Attachments:

- [AppConfig-nway_3SUs_1SIs.xml](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/d2d02dfe/79b6/attachment/AppConfig-nway_3SUs_1SIs.xml) (11.9 kB; text/xml)

---

** [tickets:#2268] amf: assignment from higher ranked SU is removed in N-Way Active model.**

**Status:** assigned
**Milestone:** 5.0.2
**Created:** Wed Jan 18, 2017 05:41 AM UTC by Praveen
**Last Updated:** Wed Jan 18, 2017 05:43 AM UTC
**Owner:** Praveen
**Attachments:**

- [AppConfig-nwayactive_3SUs_1SIs.xml](https://sourceforge.net/p/opensaf/tickets/2268/attachment/AppConfig-nwayactive_3SUs_1SIs.xml) (13.7 kB; text/xml)

When saAmfSIPrefActiveAssignments is reduced and no SI-ranked SU is configured, AMFD removes the assignment from a higher ranked SU instead of the lowest ranked one.

Steps to reproduce:
1) Bring the attached application up on one controller.
2) The only SI is assigned to three SUs. The three SUs have different SU ranks. Pref active assignments for the SI is 3.
3) Reduce pref active assignments for the SI by running the following command:
immcfg -a saAmfSIPrefActiveAssignments=2 safSi=NWay_Active,safApp=NWay_Active
4) Since pref active assignments is reduced by 1, AMFD sends quiesced and then removal of the assignment to SU2.
5) SU2 has rank 2. The assignment should instead be removed from SU3, which has rank 3.
Assignments before reducing pref active assignments:

safSISU=safSu=SU1\,safSg=NWay_Active\,safApp=NWay_Active,safSi=NWay_Active,safApp=NWay_Active
  saAmfSISUHAState=ACTIVE(1)
  saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
safSISU=safSu=SU2\,safSg=NWay_Active\,safApp=NWay_Active,safSi=NWay_Active,safApp=NWay_Active
  saAmfSISUHAState=ACTIVE(1)
  saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
safSISU=safSu=SU3\,safSg=NWay_Active\,safApp=NWay_Active,safSi=NWay_Active,safApp=NWay_Active
  saAmfSISUHAState=ACTIVE(1)
  saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)

Assignments after reducing pref active assignments:

safSISU=safSu=SU1\,safSg=NWay_Active\,safApp=NWay_Active,safSi=NWay_Active,safApp=NWay_Active
  saAmfSISUHAState=ACTIVE(1)
  saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
safSISU=safSu=SU3\,safSg=NWay_Active\,safApp=NWay_Active,safSi=NWay_Active,safApp=NWay_Active
  saAmfSISUHAState=ACTIVE(1)
  saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
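The expected behavior in #2268 (remove the assignment from the lowest ranked SU that holds one) can be sketched as a simple selection. This is an illustration only, not AMFD code: `demo_su_t` and `demo_pick_su_for_removal` are hypothetical names, and the sketch assumes that a larger rank number means a lower rank, as in the ticket where SU3 (rank 3) ranks below SU2 (rank 2). Unranked SUs are not handled.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    const char *name;
    unsigned rank;  /* assumption: larger number means lower rank */
    int has_active; /* non-zero if the SI is currently active on this SU */
} demo_su_t;

/* Return the index of the lowest ranked SU that holds an active
 * assignment, or -1 if no SU holds one. */
int demo_pick_su_for_removal(const demo_su_t *sus, size_t n)
{
    assert(sus != NULL || n == 0);
    int victim = -1;
    for (size_t i = 0; i < n; i++) {
        if (!sus[i].has_active)
            continue; /* nothing to remove from this SU */
        if (victim < 0 || sus[i].rank > sus[(size_t)victim].rank)
            victim = (int)i;
    }
    return victim;
}
```

With SU1, SU2 and SU3 at ranks 1, 2 and 3 and all three active, the function selects SU3, which matches the outcome the ticket expects.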
[tickets] [opensaf:tickets] #2334 clm: Fix all Cppcheck 1.77 issue
- **Milestone**: next --> never

---

** [tickets:#2334] clm: Fix all Cppcheck 1.77 issue **

**Status:** wontfix
**Milestone:** never
**Created:** Fri Mar 03, 2017 04:09 AM UTC by A V Mahesh (AVM)
**Last Updated:** Tue Mar 07, 2017 06:04 AM UTC
**Owner:** A V Mahesh (AVM)

[staging/src/clm/clmd/clms_evt.c:602] -> [staging/src/clm/clmd/clms_evt.c:545]: (warning) Either the condition 'node!=NULL' is redundant or there is possible null pointer dereference: node.
[staging/src/clm/clmd/clms_evt.c:603] -> [staging/src/clm/clmd/clms_evt.c:545]: (warning) Either the condition 'node!=NULL' is redundant or there is possible null pointer dereference: node.
[staging/src/clm/clmd/clms_evt.c:618]: (warning) Possible null pointer dereference: ip
[staging/src/clm/clmd/clms_evt.c:101] -> [staging/src/clm/clmd/clms_evt.c:104]: (style) Variable 'clma_down_rec' is reassigned a value before the old one has been used.
[staging/src/clm/clmd/clms_evt.c:177] -> [staging/src/clm/clmd/clms_evt.c:182]: (style) Variable 'client' is reassigned a value before the old one has been used.
[staging/src/clm/clmd/clms_evt.c:188] -> [staging/src/clm/clmd/clms_evt.c:191]: (style) Variable 'rc' is reassigned a value before the old one has been used.
[staging/src/clm/clmd/clms_evt.c:504] -> [staging/src/clm/clmd/clms_evt.c:521]: (style) Variable 'node' is reassigned a value before the old one has been used.
[staging/src/clm/clmd/clms_evt.c:678] -> [staging/src/clm/clmd/clms_evt.c:683]: (style) Variable 'node_name' is reassigned a value before the old one has been used.
[staging/src/clm/clmd/clms_evt.c:679] -> [staging/src/clm/clmd/clms_evt.c:684]: (style) Variable 'op_node' is reassigned a value before the old one has been used.
[staging/src/clm/clmd/clms_evt.c:677] -> [staging/src/clm/clmd/clms_evt.c:687]: (style) Variable 'rc' is reassigned a value before the old one has been used.
[staging/src/clm/clmd/clms_evt.c:873] -> [staging/src/clm/clmd/clms_evt.c:877]: (style) Variable 'node' is reassigned a value before the old one has been used.
[staging/src/clm/clmd/clms_evt.c:1039] -> [staging/src/clm/clmd/clms_evt.c:1048]: (style) Variable 'op_node' is reassigned a value before the old one has been used.
[staging/src/clm/clmd/clms_evt.c:1134] -> [staging/src/clm/clmd/clms_evt.c:1142]: (style) Variable 'node' is reassigned a value before the old one has been used.
[staging/src/clm/clmd/clms_evt.c:1236] -> [staging/src/clm/clmd/clms_evt.c:1240]: (style) Variable 'node' is reassigned a value before the old one has been used.
[staging/src/clm/clmd/clms_evt.c:2028] -> [staging/src/clm/clmd/clms_evt.c:2036]: (style) Variable 'mds_rc' is reassigned a value before the old one has been used.
[staging/src/clm/clmd/clms_evt.c:2046] -> [staging/src/clm/clmd/clms_evt.c:2052]: (style) Variable 'node_down_rec' is reassigned a value before the old one has been used.
[staging/src/clm/clmd/clms_imm.c:178] -> [staging/src/clm/clmd/clms_imm.c:184]: (style) Variable 'node' is reassigned a value before the old one has been used.
[staging/src/clm/clmd/clms_imm.c:275] -> [staging/src/clm/clmd/clms_imm.c:295]: (style) Variable 'rc' is reassigned a value before the old one has been used.
[staging/src/clm/clmd/clms_imm.c:335] -> [staging/src/clm/clmd/clms_imm.c:352]: (style) Variable 'rc' is reassigned a value before the old one has been used.
[staging/src/clm/clmd/clms_imm.c:690] -> [staging/src/clm/clmd/clms_imm.c:693]: (style) Variable 'node' is reassigned a value before the old one has been used.
[staging/src/clm/clmd/clms_imm.c:993] -> [staging/src/clm/clmd/clms_imm.c:996]: (style) Variable 'trk' is reassigned a value before the old one has been used.
[staging/src/clm/clmd/clms_imm.c:1721] -> [staging/src/clm/clmd/clms_imm.c:1726]: (style) Variable 'node' is reassigned a value before the old one has been used.
[staging/src/clm/clmd/clms_imm.c:1998] -> [staging/src/clm/clmd/clms_imm.c:2004]: (style) Variable 'node' is reassigned a value before the old one has been used.
[staging/src/clm/clmd/clms_imm.c:1527]: (style) The scope of the variable 'i' can be reduced.
[staging/src/clm/clmd/clms_imm.c:1528]: (style) The scope of the variable 'attrMod' can be reduced.
[staging/src/clm/clmd/clms_imm.c:1529]: (style) The scope of the variable 'name' can be reduced.
[staging/src/clm/clmd/clms_imm.c:557]: (style) Variable 'attr_Mod' is assigned a value that is never used.
[staging/src/clm/clmd/clms_imm.c:649]: (style) Variable 'attr_Mod' is assigned a value that is never used.
[staging/src/clm/clmd/clms_imm.c:829]: (style) Variable 'attr_Mod' is assigned a value that is never used.
[staging/src/clm/clmd/clms_main.c:329]: (style) Suspicious condition (assignment + comparison); Clarify expression with parentheses.
[staging/src/clm/clmd/clms_main.c:87] -> [staging/src/clm/clmd/clms_main.c:91]: (style) Variable 'evt' is reassigned a value before the old one has been used.
[staging/src/clm/clmd/clms_main.c:147]: (warning) fscanf() without