[tickets] [opensaf:tickets] #1725 AMF: Recover transient SUSIs left over from headless
Hi Minh,

In the patch 1620_12_amfd_adjust_ongoing_susi.diff, I am seeing some code related to ticket #1752. The patch for #1752 was floated the same day this ticket was updated. Please check whether this updated patch is the correct one.

Thanks,
Praveen

---

** [tickets:#1725] AMF: Recover transient SUSIs left over from headless**

**Status:** accepted
**Milestone:** 5.1.FC
**Created:** Wed Apr 06, 2016 07:16 AM UTC by Minh Hon Chau
**Last Updated:** Wed Apr 27, 2016 06:58 AM UTC
**Owner:** Minh Hon Chau

This ticket is more of an enhancement. It targets how AMFD detects and recovers the transient SUSIs left over from headless. There are three major situations:

(1) The cluster goes headless, SU/node failover can happen on any payload, and then the cluster recovers.
(2) An admin op is issued on any AMF entity, and the cluster goes headless. During headless, the intermediate HA assignments of the whole admin op sequence between AMFND and the components could be:
(2.1) The assignment completes and the component returns OK for the CSI callback, then the cluster recovers.
(2.2) The assignment is ongoing when the cluster recovers. Afterward the assignment could complete, the CSI callback could return FAILED_OPERATION, or an error could occur.

At the time the cluster recovers, amfd has collected all assignments from all amfnd(s). These assignments can be in assigned or assigning state while their HA states do not conform to the SG redundancy model.

Any of (1), (2.1), (2.2) can happen in combination: while an admin op (2) is being issued, the cluster can go headless and any kind of failover (1) can happen during headless.

---

Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is subscribed to https://sourceforge.net/p/opensaf/tickets/ To unsubscribe from further messages, a project admin can change settings at https://sourceforge.net/p/opensaf/admin/tickets/options.
Or, if this is a mailing list, you can unsubscribe from the mailing list.-- Find and fix application performance issues faster with Applications Manager Applications Manager provides deep performance insights into multiple tiers of your business applications. It resolves application problems quickly and reduces your MTTR. Get your free trial! https://ad.doubleclick.net/ddm/clk/302982198;130105516;z___ Opensaf-tickets mailing list Opensaf-tickets@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/opensaf-tickets
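The detection condition described in ticket #1725 (assignments still in an assigning state, or HA states that do not conform to the SG redundancy model) can be sketched roughly as follows. This is a minimal Python illustration, not AMFD code: the `Susi` record, the state labels, the `find_transient_susis` function, and the simplified 2N rule (each assigned SI has exactly one ACTIVE assignment) are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Susi:
    """Hypothetical record for one SU-SI assignment collected after headless."""
    su: str
    si: str
    ha_state: str   # e.g. "ACTIVE", "STANDBY", "QUIESCED"
    fsm_state: str  # "ASSIGNED" or "ASSIGNING" (labels assumed for this sketch)


def find_transient_susis(susis):
    """Return SUSIs that look transient under a simplified 2N rule."""
    # Rule 1: anything not fully assigned is still in flight.
    transient = [s for s in susis if s.fsm_state != "ASSIGNED"]

    # Rule 2 (simplified 2N): every SI that has assignments at all
    # should have exactly one ACTIVE assignment.
    by_si = defaultdict(list)
    for s in susis:
        by_si[s.si].append(s)
    for group in by_si.values():
        actives = [s for s in group if s.ha_state == "ACTIVE"]
        if len(actives) != 1:
            transient.extend(s for s in group if s not in transient)
    return transient
```

A real implementation would need one such rule set per redundancy model; the point here is only that the check runs over the full set of assignments collected from all nodes, not per node.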
[tickets] [opensaf:tickets] #1807 log: global immutilWrapperProfile is not thread safe
- **status**: unassigned --> accepted
- **assigned_to**: Vu Minh Nguyen

---

** [tickets:#1807] log: global immutilWrapperProfile is not thread safe**

**Status:** accepted
**Milestone:** 5.0.1
**Created:** Thu May 05, 2016 03:57 AM UTC by Vu Minh Nguyen
**Last Updated:** Thu May 05, 2016 03:57 AM UTC
**Owner:** Vu Minh Nguyen

Several running threads use the common global `immutilWrapperProfile` variable:

1) imm_impl_init_thread
2) applier_thread
3) main thread

So using the `immutil` API in the log service is no longer thread safe.
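The problem in ticket #1807 is a mutable, process-wide settings object shared by several threads. One common way to avoid that is to give each thread its own copy of the settings. The following is a minimal Python sketch of that idea only; it is not the log service fix, and the `WrapperProfile` fields, `get_profile`, and `worker` are hypothetical names chosen for the example.

```python
import threading
from dataclasses import dataclass


@dataclass
class WrapperProfile:
    """Hypothetical stand-in for a retry/abort settings structure."""
    errors_are_fatal: bool = True
    n_tries: int = 5
    retry_interval_ms: int = 1000


# Each thread sees its own attributes on this object.
_local = threading.local()


def get_profile():
    """Return the calling thread's private profile, creating it on first use."""
    if not hasattr(_local, "profile"):
        _local.profile = WrapperProfile()
    return _local.profile


def worker(n_tries, barrier, results, key):
    """Set a per-thread value, wait for the other thread, then read it back."""
    get_profile().n_tries = n_tries
    barrier.wait()  # both threads have written their setting by now
    results[key] = get_profile().n_tries
```

With a single global profile, the second write would overwrite the first and both threads would read the same value after the barrier. With thread-local storage, each thread reads back what it wrote.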
[tickets] [opensaf:tickets] #1807 log: global immutilWrapperProfile is not thread safe
---

** [tickets:#1807] log: global immutilWrapperProfile is not thread safe**

**Status:** unassigned
**Milestone:** 5.0.1
**Created:** Thu May 05, 2016 03:57 AM UTC by Vu Minh Nguyen
**Last Updated:** Thu May 05, 2016 03:57 AM UTC
**Owner:** nobody

Several running threads use the common global `immutilWrapperProfile` variable:

1) imm_impl_init_thread
2) applier_thread
3) main thread

So using the `immutil` API in the log service is no longer thread safe.
[tickets] [opensaf:tickets] #1719 AMF: ProtectionGroup info lost after headless
- **Comment**:

changeset: 7567:2e36521d069d
branch: opensaf-5.0.x
parent: 7563:05a2b76bcde6
user: minh chau
date: Mon May 02 16:28:27 2016 +1000
summary: AMFND: Resend pg information after headless [#1719]

changeset: 7566:a1c3fbe8ce95
user: minh chau
date: Mon May 02 16:28:09 2016 +1000
summary: AMFND: Resend pg information after headless [#1719]

---

** [tickets:#1719] AMF: ProtectionGroup info lost after headless**

**Status:** fixed
**Milestone:** 5.0.GA
**Created:** Wed Apr 06, 2016 12:55 AM UTC by Minh Hon Chau
**Last Updated:** Mon May 02, 2016 11:08 AM UTC
**Owner:** Minh Hon Chau

After the cluster is back from headless, the protection group information is currently lost, so no track callback will be issued to the component. Amfnd needs to resend this protection group information to amfd after headless.
[tickets] [opensaf:tickets] #1758 LOG : seg fault in saLogStreamOpen_2() during failover
- **status**: unassigned --> duplicate
- **Milestone**: 4.7.2 --> never
- **Comment**:

The root cause of the fault was reported in ticket [#1705].

---

** [tickets:#1758] LOG : seg fault in saLogStreamOpen_2() during failover**

**Status:** duplicate
**Milestone:** never
**Created:** Thu Apr 14, 2016 06:21 AM UTC by Ritu Raj
**Last Updated:** Wed May 04, 2016 06:56 PM UTC
**Owner:** nobody
**Attachments:**

- [log_agent_trace.txt](https://sourceforge.net/p/opensaf/tickets/1758/attachment/log_agent_trace.txt) (105.0 kB; text/plain)

Setup:
Changeset: 7436
Version: opensaf 5.0 FC
4 nodes configured with single PBE and a load of 30K objects

* Issue observed: seg fault in saLogStreamOpen_2() during failover

* Steps to reproduce:
> On a payload, an init handle is initially obtained using saLogInitialize().
> Later some failovers are invoked. The agent crashed in a saLogStreamOpen_2()
> call invoked just after a failover.

Following is the agent trace.

Apr 14 11:16:27.558521 lga [29926:lga_api.c:0744] >> saLogStreamOpen_2
Apr 14 11:16:27.558548 lga [29926:lga_api.c:0566] >> validate_open_params
Apr 14 11:16:27.558560 lga [29926:lga_api.c:0711] << validate_open_params
Apr 14 11:16:27.558568 lga [29926:lga_api.c:0772] TR saLogStreamOpen_2 ** LGS no active**

This issue is observed in 2 out of 5 iterations.
[tickets] [opensaf:tickets] #1805 log: Deadlock in log agent makes client hang
- **Milestone**: 5.0.GA --> 5.0.1

---

** [tickets:#1805] log: Deadlock in log agent makes client hang**

**Status:** review
**Milestone:** 5.0.1
**Created:** Tue May 03, 2016 02:30 PM UTC by elunlen
**Last Updated:** Wed May 04, 2016 03:05 PM UTC
**Owner:** elunlen

The log agent uses the lga_cb.cb_lock mutex to protect data in the global lga structure. This mutex is locked in the lga_recover_one_client() function. In the client thread, while this mutex is locked, an MDS API (lga_mds_msg_sync_send) is called. This API will try to lock the MDS-internal gl_mds_library_mutex, which may already be locked. Both mutexes are also used in the MDS thread, so each thread can end up waiting for the mutex the other one holds.
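A common way to avoid this kind of two-mutex deadlock is to never call a blocking API while holding your own lock: copy what the call needs under the lock, release it, make the call, and re-take the lock only to store the result. The Python sketch below illustrates that pattern under stated assumptions. It is not the log agent fix from the ticket; `cb_lock`, `library_lock`, `sync_send`, and `recover_one_client` are stand-ins invented for the example.

```python
import threading

cb_lock = threading.Lock()       # stands in for lga_cb.cb_lock
library_lock = threading.Lock()  # stands in for the MDS-internal mutex

# Hypothetical client table protected by cb_lock.
clients = {1: {"name": "client-1", "recovered": False}}


def sync_send(payload):
    """Hypothetical blocking send that needs the library lock."""
    with library_lock:
        # Record whether cb_lock was still held while sending.
        return {"payload": payload, "cb_lock_held": cb_lock.locked()}


def recover_one_client(client_id):
    # 1. Copy what the request needs while holding cb_lock.
    with cb_lock:
        request = dict(clients[client_id])
    # 2. Send with cb_lock released, so this thread never holds both locks.
    reply = sync_send(request)
    # 3. Re-acquire cb_lock only to publish the result.
    with cb_lock:
        clients[client_id]["recovered"] = True
    return reply
```

The trade-off is that the client data may change between steps 1 and 3, so the code that publishes the result has to tolerate a client that was modified or removed in the meantime.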
[tickets] [opensaf:tickets] #1541 AMF : Both the 2N SUs are assigned Standby SI Assignment run time objects.
- **Milestone**: 4.6.2 --> 4.7.2

---

** [tickets:#1541] AMF : Both the 2N SUs are assigned Standby SI Assignment run time objects.**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Tue Oct 13, 2015 07:17 AM UTC by Srikanth R
**Last Updated:** Mon Nov 02, 2015 09:06 AM UTC
**Owner:** nobody
**Attachments:**

- [1541.sh](https://sourceforge.net/p/opensaf/tickets/1541/attachment/1541.sh) (16.7 kB; application/x-shellscript)

Changeset : 6901
Configuration : 2N, 2 SUs and 4 SIs without si-si deps. Component recovery = 3 (suFailoverflag disabled)

Steps :

* Initially all the SIs are in assigned state.
* Performed a shutdown operation on the SU hosting the active assignment.
* In the quiescing callback, ensured that the component does not respond.
* As part of recovery, the other SU got active callbacks, but the SI assignment objects for active were not created. Only the standby SI assignments are present in IMM.
* After unlocking the locked SU, both SUs show standby assignments in the siass runtime objects.

safSISU=safSu=TestApp_SU1\,safSg=TestApp_SG1\,safApp=TestApp_TwoN,safSi=TestApp_SI2,safApp=TestApp_TwoN saAmfSISUHAState=STANDBY(2)
safSISU=safSu=TestApp_SU1\,safSg=TestApp_SG1\,safApp=TestApp_TwoN,safSi=TestApp_SI4,safApp=TestApp_TwoN saAmfSISUHAState=STANDBY(2)
safSISU=safSu=TestApp_SU1\,safSg=TestApp_SG1\,safApp=TestApp_TwoN,safSi=TestApp_SI3,safApp=TestApp_TwoN saAmfSISUHAState=STANDBY(2)
safSISU=safSu=TestApp_SU2\,safSg=TestApp_SG1\,safApp=TestApp_TwoN,safSi=TestApp_SI1,safApp=TestApp_TwoN saAmfSISUHAState=STANDBY(2)
safSISU=safSu=TestApp_SU1\,safSg=TestApp_SG1\,safApp=TestApp_TwoN,safSi=TestApp_SI1,safApp=TestApp_TwoN saAmfSISUHAState=STANDBY(2)
safSISU=safSu=TestApp_SU2\,safSg=TestApp_SG1\,safApp=TestApp_TwoN,safSi=TestApp_SI2,safApp=TestApp_TwoN saAmfSISUHAState=STANDBY(2)
safSISU=safSu=TestApp_SU2\,safSg=TestApp_SG1\,safApp=TestApp_TwoN,safSi=TestApp_SI3,safApp=TestApp_TwoN saAmfSISUHAState=STANDBY(2)
safSISU=safSu=TestApp_SU2\,safSg=TestApp_SG1\,safApp=TestApp_TwoN,safSi=TestApp_SI4,safApp=TestApp_TwoN saAmfSISUHAState=STANDBY(2)

Below is the error logged in the active controller syslog.

Oct 13 00:19:51 CONTROLLER-2 osafamfd[11712]: EM sg_2n_fsm.cc:2359: safSu=TestApp_SU1,safSg=TestApp_SG1,safApp=TestApp_TwoN (55)

The configuration to create the application is attached. The same issue is observed for the similar scenario during a node shutdown operation.
[tickets] [opensaf:tickets] #1510 CKPT: cpnd crashes during checkpoint open timeout with large sections
- **Milestone**: 4.6.2 --> 4.7.2

---

** [tickets:#1510] CKPT: cpnd crashes during checkpoint open timeout with large sections**

**Status:** review
**Milestone:** 4.7.2
**Created:** Thu Oct 01, 2015 04:14 PM UTC by Alex Jones
**Last Updated:** Mon Nov 02, 2015 09:11 AM UTC
**Owner:** Alex Jones

When opening a collocated checkpoint replica where the active has a large number of sections (~200k), the sync from the active can time out with error code SA_AIS_ERR_TRY_AGAIN. In this case the code deletes the memory for the node, but does not delete the node from the db. When the checkpoint access is tried again, the freed memory for the node is still in the db, and ckptnd crashes.

Valgrind analysis shows the following:

==53610== Thread 1:
==53610== Invalid read of size 4
==53610==    at 0x4E4D7C4: ncs_patricia_tree_get (patricia.c:93)
==53610==    by 0x40400D: cpnd_ckpt_node_get (cpnd_db.c:42)
==53610==    by 0x40D1A2: cpnd_process_evt (cpnd_evt.c:1957)
==53610==    by 0x40E9D6: cpnd_main_process (cpnd_init.c:568)
==53610==    by 0x403882: main (cpnd_main.c:72)
==53610==  Address 0x687de60 is 0 bytes inside a block of size 1,072 free'd
==53610==    at 0x4C29D4E: free (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==53610==    by 0x40A827: cpnd_evt_proc_ckpt_open (cpnd_evt.c:983)
==53610==    by 0x40D426: cpnd_process_evt (cpnd_evt.c:202)
==53610==    by 0x40E9D6: cpnd_main_process (cpnd_init.c:568)
==53610==    by 0x403882: main (cpnd_main.c:72)
==53610==
==53610== Invalid read of size 8
==53610==    at 0x4E4D7C0: ncs_patricia_tree_get (patricia.c:90)
==53610==    by 0x40400D: cpnd_ckpt_node_get (cpnd_db.c:42)
==53610==    by 0x40D1A2: cpnd_process_evt (cpnd_evt.c:1957)
==53610==    by 0x40E9D6: cpnd_main_process (cpnd_init.c:568)
==53610==    by 0x403882: main (cpnd_main.c:72)
==53610==  Address 0x687de70 is 16 bytes inside a block of size 1,072 free'd
==53610==    at 0x4C29D4E: free (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==53610==    by 0x40A827: cpnd_evt_proc_ckpt_open (cpnd_evt.c:983)
==53610==    by 0x40D426: cpnd_process_evt (cpnd_evt.c:202)
==53610==    by 0x40E9D6: cpnd_main_process (cpnd_init.c:568)
==53610==    by 0x403882: main (cpnd_main.c:72)
==53610==
==53610== Invalid read of size 8
==53610==    at 0x4E4D7FB: ncs_patricia_tree_get (patricia.c:435)
==53610==    by 0x40400D: cpnd_ckpt_node_get (cpnd_db.c:42)
==53610==    by 0x40D1A2: cpnd_process_evt (cpnd_evt.c:1957)
==53610==    by 0x40E9D6: cpnd_main_process (cpnd_init.c:568)
==53610==    by 0x403882: main (cpnd_main.c:72)
==53610==  Address 0x687de78 is 24 bytes inside a block of size 1,072 free'd
==53610==    at 0x4C29D4E: free (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==53610==    by 0x40A827: cpnd_evt_proc_ckpt_open (cpnd_evt.c:983)
==53610==    by 0x40D426: cpnd_process_evt (cpnd_evt.c:202)
==53610==    by 0x40E9D6: cpnd_main_process (cpnd_init.c:568)
==53610==    by 0x403882: main (cpnd_main.c:72)
==53610==
==53610== Invalid read of size 1
==53610==    at 0x4C2D0B9: bcmp (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==53610==    by 0x4E4D803: ncs_patricia_tree_get (patricia.c:435)
==53610==    by 0x40400D: cpnd_ckpt_node_get (cpnd_db.c:42)
==53610==    by 0x40D1A2: cpnd_process_evt (cpnd_evt.c:1957)
==53610==    by 0x40E9D6: cpnd_main_process (cpnd_init.c:568)
==53610==    by 0x403882: main (cpnd_main.c:72)
==53610==  Address 0x687de80 is 32 bytes inside a block of size 1,072 free'd
==53610==    at 0x4C29D4E: free (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==53610==    by 0x40A827: cpnd_evt_proc_ckpt_open (cpnd_evt.c:983)
==53610==    by 0x40D426: cpnd_process_evt (cpnd_evt.c:202)
==53610==    by 0x40E9D6: cpnd_main_process (cpnd_init.c:568)
==53610==    by 0x403882: main (cpnd_main.c:72)
==53610==
==53610== Invalid read of size 1
==53610==    at 0x4C2D0D0: bcmp (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==53610==    by 0x4E4D803: ncs_patricia_tree_get (patricia.c:435)
==53610==    by 0x40400D: cpnd_ckpt_node_get (cpnd_db.c:42)
==53610==    by 0x40D1A2: cpnd_process_evt (cpnd_evt.c:1957)
==53610==    by 0x40E9D6: cpnd_main_process (cpnd_init.c:568)
==53610==    by 0x403882: main (cpnd_main.c:72)
==53610==  Address 0x687de81 is 33 bytes inside a block of size 1,072 free'd
==53610==    at 0x4C29D4E: free (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==53610==    by 0x40A827: cpnd_evt_proc_ckpt_open (cpnd_evt.c:983)
==53610==    by 0x40D426: cpnd_process_evt (cpnd_evt.c:202)
==53610==    by 0x40E9D6: cpnd_main_process (cpnd_init.c:568)
==53610==    by 0x403882: main (cpnd_main.c:72)
==53610==
==53610== Invalid read of size 4
==53610==    at 0x4E4D7C4: ncs_patricia_tree_get (patricia.c:93)
==53610==    by 0x40400D: cpnd_ckpt_node_get (cpnd_db.c:42)
==53610==    by 0x405872: cpnd_evt_proc_nd2nd_ckpt_sect_create (cpnd_evt.c:2602)
==53610==    by 0x40D2B8: cpnd_process_evt (cpnd_evt.c:335)
==53610==    by 0x40E9D6: cpnd_
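The description of ticket #1510 points at an ordering problem on the failed-open path: the node's memory is released while the database still holds a reference to it. The usual remedy is to unlink the node from the database before releasing it. Python has no manual `free`, so the sketch below only models the bookkeeping with a `freed` flag. `CkptNode`, `CkptDb`, and `open_checkpoint` are hypothetical names, and this is not the cpnd patch.

```python
class CkptNode:
    """Hypothetical checkpoint node; `freed` models released memory."""

    def __init__(self, ckpt_id):
        self.ckpt_id = ckpt_id
        self.freed = False


class CkptDb:
    """Hypothetical node database keyed by checkpoint id."""

    def __init__(self):
        self._nodes = {}

    def add(self, node):
        self._nodes[node.ckpt_id] = node

    def delete(self, ckpt_id):
        self._nodes.pop(ckpt_id, None)

    def get(self, ckpt_id):
        node = self._nodes.get(ckpt_id)
        # A lookup must never hand back a node that was already released.
        assert node is None or not node.freed, "lookup returned a freed node"
        return node


def open_checkpoint(db, ckpt_id, sync):
    """Open a replica; `sync` is a callable returning True on success."""
    node = CkptNode(ckpt_id)
    db.add(node)
    if not sync(node):
        db.delete(ckpt_id)  # unlink from the db first ...
        node.freed = True   # ... then release the node
        return "SA_AIS_ERR_TRY_AGAIN"
    return "SA_AIS_OK"
```

With the two cleanup lines swapped and the `db.delete` call omitted, the retry would find the stale entry, which corresponds to the invalid reads in the trace above.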
[tickets] [opensaf:tickets] #1552 smf: read IMM_long_DN_config at the start of SmfCampaignInit function
- **Milestone**: 4.6.2 --> 4.7.2

---

** [tickets:#1552] smf: read IMM_long_DN_config at the start of SmfCampaignInit function**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Wed Oct 21, 2015 07:10 AM UTC by Neelakanta Reddy
**Last Updated:** Mon Nov 02, 2015 09:06 AM UTC
**Owner:** nobody

Reproduction:

1. Start opensaf from imm.xml

2. Enable long DNs
immcfg -a longDnsAllowed=1 opensafImm=opensafImm,safApp=safImmService

3. Create a campaign with a long DN name
immcfg -c SaSmfCampaign safSmfCampaign=campign11133346677,safApp=safSmfService -a saSmfCmpgFileUri=/hostfs/campaigns/campaign.xml

4. Execute the campaign
smf-adm execute safSmfCampaign=campign11133346677,safApp=safSmfService

Following is the error in smfd:

Oct 21 11:56:35.537035 osafsmfd [25055:SmfCampaignInit.cc:0184] NO CAMP: Campaign init, start add to IMM (2)
Oct 21 11:56:35.537048 osafsmfd [25055:SmfRollback.cc:0062] TR Create rollback element 'smfRollbackElement=AddToImmCcb,safSmfCampaign=campign11133346677,safApp=safSmfService'
Oct 21 11:56:35.537068 osafsmfd [25055:SmfRollback.cc:0070] TR Create rollback element, parent 'safSmfCampaign=campign11133346677,safApp=safSmfService', rdn 'smfRollbackElement=AddToImmCcb'
Oct 21 11:56:35.537080 osafsmfd [25055:SmfImmOperation.cc:1280] >> execute
Oct 21 11:56:35.537088 osafsmfd [25055:SmfImmOperation.cc:1206] >> createAttrValues
Oct 21 11:56:35.537096 osafsmfd [25055:SmfImmOperation.cc:1238] TR c=[OpenSafSmfRollbackElement], p=[safSmfCampaign=campign11133346677,safApp=safSmfService], attr=[smfRollbackElement]
Oct 21 11:56:35.537118 osafsmfd [25055:SmfImmOperation.cc:1261] << createAttrValues
Oct 21 11:56:35.537140 osafsmfd [25055:SmfImmOperation.cc:1295] NO SmfImmRTCreateOperation::execute, createObject failed Too long parent name 342
Oct 21 11:56:35.537149 osafsmfd [25055:SmfImmOperation.cc:1296] << execute
Oct 21 11:56:35.537169 osafsmfd [25055:SmfCampaignInit.cc:0193] ER SmfCampaignInit failed to create addToImm rollback element smfRollbackElement=AddToImmCcb,safSmfCampaign=campign11133346677,safApp=safSmfService, rc=SA_AIS_ERR_NAME_TOO_LONG

Solution:

If the cluster is loaded from imm.xml, the IMM internal object is created at load time. By default this IMM object does not enable long DNs; they can be enabled once the cluster is up. In that case smfd_cb->maxDnLength does not reflect the long DN length, so the campaign does not execute and fails with ERR_NAME_TOO_LONG. This can be corrected by calling immUtil.read_IMM_long_DN_config_and_set_control_block at the start of SmfCampaignInit::execute.
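The root cause in ticket #1552 is a cached limit that goes stale when the IMM setting changes at runtime. The proposed remedy is to re-read the setting at the start of campaign init. The Python sketch below models only that idea. The classes, the function names, and the two limit values are assumptions for illustration and are not taken from smfd.

```python
SHORT_DN_MAX = 255  # assumed limit when long DNs are disabled
LONG_DN_MAX = 2048  # assumed limit when long DNs are enabled


class ImmConfig:
    """Hypothetical view of the IMM long DN setting."""

    def __init__(self):
        self.long_dns_allowed = False


class ControlBlock:
    """Hypothetical stand-in for the smfd control block."""

    def __init__(self):
        self.max_dn_length = SHORT_DN_MAX


def read_long_dn_config(imm, cb):
    """Refresh the cached limit from the current IMM setting."""
    cb.max_dn_length = LONG_DN_MAX if imm.long_dns_allowed else SHORT_DN_MAX


def campaign_init_execute(imm, cb, parent_dn, refresh=True):
    """Validate the parent DN, optionally refreshing the cached limit first."""
    if refresh:
        read_long_dn_config(imm, cb)
    if len(parent_dn) > cb.max_dn_length:
        return "SA_AIS_ERR_NAME_TOO_LONG"
    return "SA_AIS_OK"
```

In this model, enabling long DNs after startup and then running with `refresh=False` reproduces the failure for a 342-character parent name, while `refresh=True` lets the same name through.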
[tickets] [opensaf:tickets] #1576 AMF : SU struck in terminating ( health check timeout - proxy proxied )
- **Milestone**: 4.6.2 --> 4.7.2

---

** [tickets:#1576] AMF : SU struck in terminating ( health check timeout - proxy proxied )**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Thu Oct 29, 2015 05:49 AM UTC by Srikanth R
**Last Updated:** Mon Nov 02, 2015 09:02 AM UTC
**Owner:** nobody
**Attachments:**

- [1570.tgz](https://sourceforge.net/p/opensaf/tickets/1576/attachment/1570.tgz) (1.5 MB; application/x-compressed-tar)

Changeset : 6901
Application : SU1 mapped to SC-2 and SU2 mapped to SC-1. Each SU consists of 3 pre-instantiable components (one of the components is LOCAL & PROXIED and the other two components are SA_AWARE)

Steps :

* Brought up two controllers in the cluster.
* Performed unlock-in operation on SU1.
* Health check is started by both SA-AWARE components.
* One of the SA-AWARE components faulted in health check and, as part of repair, the SU is stuck in terminating state.

Oct 29 10:30:35 SYSTEST-CNTLR-2 osafamfnd[3617]: NO 'safSu=2nAdminRepair_SU_1,safSg=2nAdminRepair_SG,safApp=2nAdminRepair' Presence State INSTANTIATING => INSTANTIATED
Oct 29 10:31:21 SYSTEST-CNTLR-2 osafamfnd[3617]: NO saAmfSUFailover is true for 'safSu=2nAdminRepair_SU_1,safSg=2nAdminRepair_SG,safApp=2nAdminRepair'
Oct 29 10:31:21 SYSTEST-CNTLR-2 osafamfnd[3617]: NO SU failover probation timer started (timeout: 12000 ns)
Oct 29 10:31:21 SYSTEST-CNTLR-2 osafamfnd[3617]: NO Performing failover of 'safSu=2nAdminRepair_SU_1,safSg=2nAdminRepair_SG,safApp=2nAdminRepair' (SU failover count: 1)
Oct 29 10:31:21 SYSTEST-CNTLR-2 osafamfnd[3617]: NO 'safComp=2nAdminRepair,safSu=2nAdminRepair_SU_1,safSg=2nAdminRepair_SG,safApp=2nAdminRepair' recovery action escalated from 'noRecommendation' to 'suFailover'
Oct 29 10:31:21 SYSTEST-CNTLR-2 osafamfnd[3617]: NO 'safComp=2nAdminRepair,safSu=2nAdminRepair_SU_1,safSg=2nAdminRepair_SG,safApp=2nAdminRepair' faulted due to 'healthCheckcallbackTimeout' : Recovery is 'suFailover'
Oct 29 10:31:21 SYSTEST-CNTLR-2 osafamfnd[3617]: NO Terminating components of 'safSu=2nAdminRepair_SU_1,safSg=2nAdminRepair_SG,safApp=2nAdminRepair'(abruptly & unordered)
Oct 29 10:31:21 SYSTEST-CNTLR-2 osafamfnd[3617]: NO 'safSu=2nAdminRepair_SU_1,safSg=2nAdminRepair_SG,safApp=2nAdminRepair' Presence State INSTANTIATED => TERMINATING
Oct 29 10:31:21 SYSTEST-CNTLR-2 osafamfnd[3617]: NO 'safSu=2nAdminRepair_SU_1,safSg=2nAdminRepair_SG,safApp=2nAdminRepair' Presence State TERMINATING => TERMINATING
Oct 29 10:31:21 SYSTEST-CNTLR-2 osafamfnd[3617]: NO 'safSu=2nAdminRepair_SU_1,safSg=2nAdminRepair_SG,safApp=2nAdminRepair' Presence State TERMINATING => TERMINATING

* Amfd crashes during opensafd stop on SC-2.

Oct 29 11:27:46 SYSTEST-CNTLR-2 opensafd: Stopping OpenSAF Services
Oct 29 11:27:46 SYSTEST-CNTLR-2 osafamfnd[3617]: NO Shutdown initiated
Oct 29 11:27:46 SYSTEST-CNTLR-2 osafamfnd[3617]: NO Terminating all AMF components ...
Oct 29 11:27:46 SYSTEST-CNTLR-2 osafamfd[3607]: NO Re-initializing with IMM ...
Oct 29 11:28:46 SYSTEST-CNTLR-2 osafamfd[3607]: exiting for shutdown
Oct 29 11:28:46 SYSTEST-CNTLR-2 osafamfnd[3617]: ER AMF director unexpectedly crashed
Oct 29 11:28:46 SYSTEST-CNTLR-2 osafamfnd[3617]: Rebooting OpenSAF NodeId = 131599 EE Name = , Reason: local AVD down(Adest) or both AVD down(Vdest) received, OwnNodeId = 131599, SupervisionTime = 60
[tickets] [opensaf:tickets] #1548 logd on standby crashed, for nonexistent logsv data group
- **Milestone**: 4.6.2 --> 4.7.2

---

** [tickets:#1548] logd on standby crashed, for nonexistent logsv data group**

**Status:** assigned
**Milestone:** 4.7.2
**Created:** Thu Oct 15, 2015 12:29 PM UTC by Srikanth R
**Last Updated:** Mon Nov 02, 2015 09:06 AM UTC
**Owner:** Mathi Naickan

Changeset : 6901

Steps :

Logsv crashes on the standby controller if the group does not exist on the standby controller. For the following command, logd crashed on the standby controller with the syslog below.

immcfg -a logDataGroupname=testGroup logConfig=1,safApp=safLogService

Oct 15 17:09:49 CONTROLLER-1 osaflogd[2227]: ER osaf_user_is_member_of_group: group 'testGroup' does not exist
Oct 15 17:09:49 CONTROLLER-1 osaflogd[2227]: WA lgs_cfg_verify_log_data_groupname: osaf_user_is_member_of_group() Fail
Oct 15 17:09:49 CONTROLLER-1 osaflogd[2227]: WA lgs_cfg_update: Verify fail for lgs configuration
Oct 15 17:09:49 CONTROLLER-1 osaflogd[2227]: ER ckpt_proc_lgs_cfg_v5 lgs_cfg_update Fail
Oct 15 17:09:49 CONTROLLER-1 osaflogd[2227]: lgs_mbcsv_v5.c:127: ckpt_proc_lgs_cfg_v5: Assertion '0' failed.
Oct 15 17:09:49 CONTROLLER-1 osafamfnd[2281]: NO 'safComp=LOG,safSu=SC-1,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'nodeFailfast'

Logd also crashes if the user is not part of the newly created group on the standby.

Logsv should reject the CCB operation if the standby is not properly updated.
[tickets] [opensaf:tickets] #1542 AMF : Quiesced callbacks should be generated, during recovery (su failover flag disabled)
- **Milestone**: 4.6.2 --> 4.7.2

---

** [tickets:#1542] AMF : Quiesced callbacks should be generated, during recovery (su failover flag disabled)**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Tue Oct 13, 2015 10:33 AM UTC by Srikanth R
**Last Updated:** Mon Nov 02, 2015 09:06 AM UTC
**Owner:** nobody

Changeset : 6901
Application : 2N, 4 SIs configured with SI1 as sponsor for SI2, SI3, SI4
Component recovery policy = 3
suFailoverflag = 0

Steps :

* Initially all the SIs are in unassigned state. SU1 is hosted active and SU2 is hosted standby.
* Performed lock on SI4.
* Later performed unlock on SI4, for which a component in SU1 rejected the active callback.
* As part of recovery, all the assignments to SU1 should be removed and the active assignments given to the standby SU, i.e. SU2.
* In the current implementation, quiesced callbacks are not generated during removal of assignments.
* According to the spec, page no. 195:

If the service unit is configured to fail over as a single entity (saAmfSUFailover set to SA_TRUE), all other components of the service unit are abruptly terminated, and all service instances assigned to that service unit are failed over; otherwise, only the erroneous component is abruptly terminated, and all component service instances that were assigned to it are failed over. Other components are not terminated, but all service instances that contained one of the failed over component service instances have their remaining component service instances switched over.

* Below is the syslog on the node where SU1 is hosted.
Oct 13 03:15:24 SYSTEST-PLD-1 osafamfnd[2725]: NO Assigning 'safSi=TestApp_SI2,safApp=TestApp_TwoN' QUIESCED to 'safSu=TestApp_SU1,safSg=TestApp_SG1,safApp=TestApp_TwoN'
Oct 13 03:15:24 SYSTEST-PLD-1 osafamfnd[2725]: NO Assigned 'safSi=TestApp_SI2,safApp=TestApp_TwoN' QUIESCED to 'safSu=TestApp_SU1,safSg=TestApp_SG1,safApp=TestApp_TwoN'
Oct 13 03:15:24 SYSTEST-PLD-1 osafamfnd[2725]: NO Removed 'safSi=TestApp_SI2,safApp=TestApp_TwoN' from 'safSu=TestApp_SU1,safSg=TestApp_SG1,safApp=TestApp_TwoN'
Oct 13 03:15:27 SYSTEST-PLD-1 osafamfnd[2725]: NO Assigning 'safSi=TestApp_SI2,safApp=TestApp_TwoN' ACTIVE to 'safSu=TestApp_SU1,safSg=TestApp_SG1,safApp=TestApp_TwoN'
Oct 13 03:15:27 SYSTEST-PLD-1 osafamfnd[2725]: NO SU failover probation timer started (timeout: 12000 ns)
Oct 13 03:15:27 SYSTEST-PLD-1 osafamfnd[2725]: NO Performing failover of 'safSu=TestApp_SU1,safSg=TestApp_SG1,safApp=TestApp_TwoN' (SU failover count: 1)
Oct 13 03:15:27 SYSTEST-PLD-1 osafamfnd[2725]: NO 'safComp=COMP2,safSu=TestApp_SU1,safSg=TestApp_SG1,safApp=TestApp_TwoN' faulted due to 'csiSetcallbackFailed' : Recovery is 'componentFailover'
Oct 13 03:15:27 SYSTEST-PLD-1 osafamfnd[2725]: NO 'safSu=TestApp_SU1,safSg=TestApp_SG1,safApp=TestApp_TwoN' Presence State INSTANTIATED => TERMINATING
Oct 13 03:15:27 SYSTEST-PLD-1 osafamfnd[2725]: NO Assigned 'safSi=TestApp_SI2,safApp=TestApp_TwoN' ACTIVE to 'safSu=TestApp_SU1,safSg=TestApp_SG1,safApp=TestApp_TwoN'
Oct 13 03:15:27 SYSTEST-PLD-1 osafamfnd[2725]: NO Removing 'all (4) SIs' from 'safSu=TestApp_SU1,safSg=TestApp_SG1,safApp=TestApp_TwoN'
Oct 13 03:15:27 SYSTEST-PLD-1 osafamfnd[2725]: NO Removing 'safSi=TestApp_SI1,safApp=TestApp_TwoN' from 'safSu=TestApp_SU1,safSg=TestApp_SG1,safApp=TestApp_TwoN'
Oct 13 03:15:27 SYSTEST-PLD-1 osafamfnd[2725]: NO Removing 'safSi=TestApp_SI2,safApp=TestApp_TwoN' from 'safSu=TestApp_SU1,safSg=TestApp_SG1,safApp=TestApp_TwoN'
Oct 13 03:15:27 SYSTEST-PLD-1 osafamfnd[2725]: NO Removing 'safSi=TestApp_SI3,safApp=TestApp_TwoN' from 'safSu=TestApp_SU1,safSg=TestApp_SG1,safApp=TestApp_TwoN'
Oct 13 03:15:27 SYSTEST-PLD-1 osafamfnd[2725]: NO Removing 'safSi=TestApp_SI4,safApp=TestApp_TwoN' from 'safSu=TestApp_SU1,safSg=TestApp_SG1,safApp=TestApp_TwoN'
Oct 13 03:15:27 SYSTEST-PLD-1 osafamfnd[2725]: NO Removed 'safSi=TestApp_SI1,safApp=TestApp_TwoN' from 'safSu=TestApp_SU1,safSg=TestApp_SG1,safApp=TestApp_TwoN'
Oct 13 03:15:27 SYSTEST-PLD-1 osafamfnd[2725]: NO Removed 'safSi=TestApp_SI2,safApp=TestApp_TwoN' from 'safSu=TestApp_SU1,safSg=TestApp_SG1,safApp=TestApp_TwoN'
Oct 13 03:15:27 SYSTEST-PLD-1 osafamfnd[2725]: NO Removed 'safSi=TestApp_SI3,safApp=TestApp_TwoN' from 'safSu=TestApp_SU1,safSg=TestApp_SG1,safApp=TestApp_TwoN'
Oct 13 03:15:27 SYSTEST-PLD-1 osafamfnd[2725]: NO Removed 'safSi=TestApp_SI4,safApp=TestApp_TwoN' from 'safSu=TestApp_SU1,safSg=TestApp_SG1,safApp=TestApp_TwoN'
Oct 13 03:15:27 SYSTEST-PLD-1 osafamfnd[2725]: NO Removed 'all SIs' from 'safSu=TestApp_SU1,safSg=TestApp_SG1,safApp=TestApp_TwoN'
Oct 13 03:15:27 SYSTEST-PLD-1 osafamfnd[2725]: NO 'safSu=TestApp_SU1,safSg=TestApp_SG1,safApp=TestApp_TwoN' Presence State TERMINATING => INSTANTIATED
[tickets] [opensaf:tickets] #1605 smfd: campaign not correctly terminated after failed SI-SWAP
- **Milestone**: 4.6.2 --> 4.7.2

---

** [tickets:#1605] smfd: campaign not correctly terminated after failed SI-SWAP **

**Status:** accepted
**Milestone:** 4.7.2
**Created:** Thu Nov 19, 2015 12:26 PM UTC by Ingvar Bergström
**Last Updated:** Wed Jan 13, 2016 06:05 AM UTC
**Owner:** Mathi Naickan

Scenario:
1) smfd orders an SI-SWAP to continue the campaign execution on the other controller.
2) Before the swap is executed, an IMM object "SmfRestartIndicator" is created to signal to smf on the new controller that the campaign restart was initiated by smf (spontaneous restarts always fail a campaign in the executing state).
3) When the new controller comes up, smf checks for the existence of the object. If it is present, the restart is OK; if not, the campaign fails. If OK, the "SmfRestartIndicator" object is removed.
4) In this case the new controller fails to start very early, before smf was started. Smf never has a chance to remove the object.
5) AMF orders a switchback to the first controller.
6) Smf starts up on the "old" controller once again. Since the "SmfRestartIndicator" is still there, smf thinks the restart was ordered by smf and tries to continue campaign execution, which fails (the wrong way, e.g. core dump).

Todo: find a mechanism that lets smf detect that the "SmfRestartIndicator" is an old one and treat this case as if the object does not exist. Make sure the new solution is backward compatible.

The campaign continues at:
file: SmfUpgradeCampaign.cc, method: SmfUpgradeCampaign::continueExec()
The restart indicator is handled in:
file: SmfUpgradeCampaign.cc, method: SmfUpgradeCampaign::checkSmfRestartIndicator()
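The Todo in #1605 asks for a way to tell a stale "SmfRestartIndicator" from a fresh one. One possible approach, sketched here only as an illustration, is to stamp the indicator with the node that is expected to take over and with a creation time, and to ignore it when either does not match. The `RestartIndicator` struct, its fields, `isIndicatorFresh` and the age limit are all hypothetical and not part of smfd; an unstamped indicator is honoured so that objects written by an older smfd keep today's behaviour.

```cpp
#include <cstdint>
#include <string>

// Hypothetical model of the restart indicator. The real object is an IMM
// object and carries none of these fields today.
struct RestartIndicator {
  bool stamped;               // false for objects written by an older smfd
  std::string expected_node;  // node smf expected to restart on
  std::uint64_t created_sec;  // creation time in seconds
};

// Returns true if the indicator should be honoured on `this_node`.
// Assumption: unstamped indicators are honoured (backward compatibility).
bool isIndicatorFresh(const RestartIndicator& ind, const std::string& this_node,
                      std::uint64_t now_sec, std::uint64_t max_age_sec) {
  if (!ind.stamped) return true;                        // legacy object
  if (ind.expected_node != this_node) return false;     // switchback case
  if (now_sec < ind.created_sec) return false;          // clock went backwards
  return now_sec - ind.created_sec <= max_age_sec;      // too old means stale
}
```

In the scenario above, the indicator would name the new controller, so smf starting again on the "old" controller after the switchback would see a mismatch and treat the object as absent.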
[tickets] [opensaf:tickets] #1589 EVT : Segfault in saEvtEventDataGet in multithreaded app
- **Milestone**: 4.6.2 --> 4.7.2

---

** [tickets:#1589] EVT : Segfault in saEvtEventDataGet in multithreaded app**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Tue Nov 10, 2015 01:22 AM UTC by Srikanth R
**Last Updated:** Thu Mar 17, 2016 02:10 PM UTC
**Owner:** nobody

Changeset : 7071
Application : EDSV multi threaded application with multiple publisher threads and single subscriber thread.

Steps :
-> Each publisher thread creates a channel and waits for the subscriber.
-> The subscriber thread comes up and subscribes to all the channels created by the publishers.
-> Now all the publishers publish the event.
-> In the event deliver callback, application segfaulted for the saEvtEventDataGet call.
-> Below is the back trace

0 0x775a6224 in saEvtEventDataGet (eventHandle=4289724417, eventData=0x7fffde30, eventDataSize=0x7fffde28) at eda_saf_api.c:1944
1 0x0040113b in evtDeliverCallback (subscriptionId=4, eventHandle=4285530146, eventDataSize=20) at multithread/eda_thread1.c:25
2 0x775a9ed0 in eda_hdl_cbk_rec_prc (cb=0x6260c0, msg=0x6279f0, reg_cbk=0x6268e0) at eda_hdl.c:691
3 0x775aa20d in eda_hdl_cbk_dispatch_all (cb=0x6260c0, hdl_rec=0x6268d0) at eda_hdl.c:836
4 0x775a9d85 in eda_hdl_cbk_dispatch (cb=0x6260c0, hdl_rec=0x6268d0, flags=SA_DISPATCH_ALL) at eda_hdl.c:641
5 0x775a1e5a in saEvtDispatch (evtHandle=4289724417, dispatchFlags=SA_DISPATCH_ALL) at eda_saf_api.c:351
6 0x0040194d in subscriber_loop (thread_number=1) at multithread/eda_thread1.c:213
7 0x00401b64 in main (argc=1, argv=0x7fffe398) at multithread/eda_thread1.c:271

(gdb) p *evt_hdl_rec
$2 = {event_hdl = 1, priority = 1 '\001', retention_time = 66370, publish_time = 140737488348072, publisher_name = {length = 4316, value = "@\000\000\000\000\000\360\242c\000\000\000\000\000\001\000\240\377", '\000' , "!\000\000\000\000\000\000\000\002\000\000\000\377\177\000\000خY\367\377\177\000\000\000\000\000\000\000\000\000\000!\001\000\000\000\000\000\000\017", '\000' , "\001", '\000' "\230, \266\371\366\377\177\000\000\001", '\000' }, pattern_array = 0x0, event_data_size = 0, evt_data = 0x0, evt_type = 0 '\000', parent_chan = 0x0, next = 0x0, pub_evt_id = 0, del_evt_id = 0}
[tickets] [opensaf:tickets] #1590 Shutdown node hang if component calls saAmfFinalize during component failover
- **Milestone**: 4.6.2 --> 4.7.2

---

** [tickets:#1590] Shutdown node hang if component calls saAmfFinalize during component failover**

**Status:** accepted
**Milestone:** 4.7.2
**Labels:** hanging shutdown node shutdown nodegroup
**Created:** Tue Nov 10, 2015 05:03 AM UTC by Minh Hon Chau
**Last Updated:** Tue Nov 10, 2015 07:41 AM UTC
**Owner:** Minh Hon Chau
**Attachments:**

- [osafamfnd](https://sourceforge.net/p/opensaf/tickets/1590/attachment/osafamfnd) (336.3 kB; application/octet-stream)
- [syslog](https://sourceforge.net/p/opensaf/tickets/1590/attachment/syslog) (314.7 kB; application/octet-stream)

The admin command shutdown node (or nodegroup) hangs if a component calls saAmfFinalize during component failover. Trace is attached.

Scenario:
. Issue admin shutdown node.
. The component rejects the quiescing assignment with saAmfCSIQuiescingComplete(SA_AIS_ERR_FAILED_OPERATION).
. The component calls saAmfFinalize, finalizing its handle.
. Because the quiescing assignment failed, component failover recovery is started. As a result, clc cleanup is called.
. The finalize-handle event arrives before clc cleanup returns OK.
. avnd_comp_clc_terming_cleansucc_hdler() handles the cleanup success case. The quiescing sequence cannot continue because avnd_comp_cmplete_all_assignment() currently seems to handle only the normal case, in which the callback list exists. Here the component is unregistered and all handles have been deleted by saAmfFinalize. No su_si_oper_done is sent to amfd at the end, so the command hangs until timeout.

Another similar test was done on amf_demo, which calls saAmfFinalize when the component receives SIGTERM. The assignment is quiesced and then removed successfully, since amfnd is "aware of" the unregistered component during the quiesced assignment sequence.

The quiescing assignment sequence should be aware of the unregistered component in this case, in order to avoid hanging shutdown node. Or saAmfFinalize should return TRY_AGAIN; to be analyzed ...
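The gap described in #1590 can be illustrated with a small model. This is only a sketch under stated assumptions: `CompState`, its fields and `assignmentsToComplete` are hypothetical names, and the "normal case" branch merely mimics the behaviour the ticket attributes to avnd_comp_cmplete_all_assignment() (one completion per existing callback record). It is not the amfnd implementation.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical, simplified view of a component after a successful cleanup.
struct CompState {
  bool registered;               // false once saAmfFinalize deleted all handles
  std::size_t assigning_csis;    // CSI assignments still in progress
  std::size_t callback_records;  // callback records still held for the component
};

// Returns how many in-progress assignments get marked as done.
std::size_t assignmentsToComplete(const CompState& c) {
  if (!c.registered) {
    // Proposed handling: an unregistered component can never answer, so
    // every in-progress assignment is completed and the operation can be
    // reported as done instead of waiting for the timeout.
    return c.assigning_csis;
  }
  // Normal case as described in the ticket: completion is driven by the
  // callback list, so nothing completes when the list is empty.
  return std::min(c.assigning_csis, c.callback_records);
}
```

With a check of this kind, the unregistered component would no longer leave the quiescing sequence without a final su_si_oper_done.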
[tickets] [opensaf:tickets] #1515 AMF : SU struck in terminating for failure during csi assignment in si-swap (Nway)
- **Milestone**: 4.6.2 --> 4.7.2

---

** [tickets:#1515] AMF : SU struck in terminating for failure during csi assignment in si-swap (Nway)**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Mon Oct 05, 2015 12:45 PM UTC by Srikanth R
**Last Updated:** Mon Nov 02, 2015 09:10 AM UTC
**Owner:** nobody

Changeset : 6901
AMF application : 3 SUs with 5 SIs (SU1 and SU3 hosted on PL-3 and SU2 hosted on PL-4). Nway redundancy model.

Issue : SU stuck in terminating after a failure during csi active assignment in si-swap (Nway)

Steps :

-> Initially brought up the application by unlocking the SG; below are the assignments.

TestApp_SI1 TestApp_SI2 TestApp_SI3 TestApp_SI4 TestApp_SI5
TestApp_SU1 ACTIVE ACTIVE ACTIVE STANDBY
TestApp_SU2 STANDBY STANDBY STANDBY ACTIVE
TestApp_SU3 ACTIVE STANDBY

-> Before performing the si-swap operation on SU1, ensured that the component with the SI1 standby assignment will reject the active callback.

-> Invoked the si-swap operation. As the component responded with ERR_FAILED_OP in the active callback, recovery action is triggered for the SU.

Oct 5 15:15:02 PAYLOAD-2 osafamfnd[2659]: NO Assigning 'safSi=TestApp_SI1,safApp=TestApp_Nway' ACTIVE to 'safSu=TestApp_SU2,safSg=TestApp_SG1,safApp=TestApp_Nwa
Oct 5 15:15:02 PAYLOAD-2 osafamfnd[2659]: NO 'safComp=COMP1,safSu=TestApp_SU2,safSg=TestApp_SG1,safApp=TestApp_Nway' faulted due to 'csiSetcallbackFailed' : Recovery is 'componentFailover'
Oct 5 15:15:02 PAYLOAD-2 osafamfnd[2659]: NO 'safSu=TestApp_SU2,safSg=TestApp_SG1,safApp=TestApp_Nway' Presence State INSTANTIATED => TERMINATING

But the SU is stuck in terminating state; below are the final assignments.

TestApp_SI1 TestApp_SI2 TestApp_SI3 TestApp_SI4 TestApp_SI5
TestApp_SU1 QUIESCED ACTIVE ACTIVE STANDBY STANDBY
TestApp_SU2 ACTIVE STANDBY STANDBY QUIESCED
TestApp_SU3 STANDBY ACTIVE ACTIVE
[tickets] [opensaf:tickets] #1512 AMF : SU struck in Quiesced state after Lock operation of SU in Nway
- **Milestone**: 4.6.2 --> 4.7.2

---

** [tickets:#1512] AMF : SU struck in Quiesced state after Lock operation of SU in Nway**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Mon Oct 05, 2015 08:56 AM UTC by Srikanth R
**Last Updated:** Mon Nov 02, 2015 09:11 AM UTC
**Owner:** nobody
**Attachments:**

- [QuiescedNway.sh](https://sourceforge.net/p/opensaf/tickets/1512/attachment/QuiescedNway.sh) (11.3 kB; application/x-shellscript)

Changeset : 6901
AMF application : 3 SUs hosted on PL-3 and PL-4, 4 SIs (Redundancy model : Nway)

Issue : SU stuck in Quiesced state after a lock operation issued on one of the SUs.

Steps :

-> Initially brought up the AMF application configured in Nway redundancy model with 3 SUs and 4 SIs. Below are the configuration attributes for the SG.

saAmfSGNumPrefStandbySUs SA_UINT32_T 1 (0x1)
saAmfSGNumPrefInserviceSUs SA_UINT32_T 4 (0x4)
saAmfSGNumPrefAssignedSUs SA_UINT32_T 4 (0x4)
saAmfSGNumPrefActiveSUs SA_UINT32_T 3 (0x3)
saAmfSGNumCurrNonInstantiatedSpareSUs SA_UINT32_T 0 (0x0)
saAmfSGNumCurrInstantiatedSpareSUs SA_UINT32_T 0 (0x0)
saAmfSGNumCurrAssignedSUs SA_UINT32_T 3 (0x3)
saAmfSGMaxStandbySIsperSU SA_UINT32_T 1 (0x1)
saAmfSGMaxActiveSIsperSU SA_UINT32_T 3 (0x3)

-> Brought up the application by unlocking the SG; below are the assignments.

TestApp_SI1 TestApp_SI2 TestApp_SI3 TestApp_SI4
TestApp_SU1 ACTIVE ACTIVE ACTIVE STANDBY
TestApp_SU2 STANDBY ACTIVE
TestApp_SU3 STANDBY

-> Now performed a lock operation on SU1. SU1 is stuck in quiesced state after the operation.

TestApp_SI1 TestApp_SI2 TestApp_SI3 TestApp_SI4
TestApp_SU1 QUIESCED QUIESCED STANDBY
TestApp_SU2 ACTIVE STANDBY ACTIVE ACTIVE
TestApp_SU3 STANDBY ACTIVE

-> When opensafd on the payload PL-3 is stopped, amfd on the active controller crashed.

Oct 5 13:04:13 CONTROLLER-1 osafamfd[8492]: su.cc:1885: dec_curr_stdby_si: Assertion 'saAmfSUNumCurrStandbySIs > 0' failed.
Oct 5 13:04:13 CONTROLLER-1 osafamfnd[8502]: ER AMF director unexpectedly crashed

The script to bring up the application is attached.
[tickets] [opensaf:tickets] #1526 imm: 1PBE can see db as locked
- **Milestone**: 4.6.2 --> 4.7.2

---

** [tickets:#1526] imm: 1PBE can see db as locked**

**Status:** assigned
**Milestone:** 4.7.2
**Created:** Wed Oct 07, 2015 09:43 AM UTC by Neelakanta Reddy
**Last Updated:** Mon Nov 02, 2015 10:47 AM UTC
**Owner:** Neelakanta Reddy

When the disk is full, sqlite returns an error.

Sep 18 13:42:02 SC-2 osafimmpbed: ER SQL statement ('COMMIT TRANSACTION') failed because: disk I/O error
Sep 18 13:42:02 SC-2 osafimmnd[13067]: NO Invalid error reported implementer 'OpenSafImmPBE', Ccb 321 will be aborted
Sep 18 13:42:02 SC-2 osafimmnd[13067]: NO Ccb 321 ABORTED (TraceC)
Sep 18 13:42:02 SC-2 osafimmpbed: WA Failed to find CCB object for 141/321

Due to continuous CCB operations (even though the disk is full), the 1PBE logs the following messages for more than 3 hours:

messages:Sep 18 17:58:46 SC-2 osafimmpbed: WA Sqlite db locked by other thread.
messages:Sep 18 17:58:46 SC-2 osafimmpbed: WA Sqlite db locked by other thread.
messages:Sep 18 17:58:47 SC-2 osafimmpbed: WA Sqlite db locked by other thread.
messages:Sep 18 17:58:47 SC-2 osafimmpbed: WA Sqlite db locked by other thread.

messages.7:Sep 18 14:22:22 SC-2 osafimmpbed: WA Sqlite db locked by other thread.
messages.7:Sep 18 14:22:23 SC-2 osafimmpbed: WA Sqlite db locked by other thread.
messages.7:Sep 18 14:22:23 SC-2 osafimmpbed: WA Sqlite db locked by other thread.
messages.7:Sep 18 14:22:24 SC-2 osafimmpbed: WA Sqlite db locked by other thread

After freeing the space, the PBE is still stuck in "Sqlite db locked by other thread". This prevents any further operations. Once the PBE is killed, imm.db is re-generated and the CCB operations are applied.

Solution (1PBE):
For the 1PBE case, which is not multi-threaded, if the sqlite db locked case is reached, abort the PBE and let the PBE be re-generated (instead of blocking the PBE process).
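The solution proposed in #1526 amounts to a small decision rule, sketched below. This is a hedged illustration, not the osafimmpbed code: `PbeAction` and `onSqlResult` are hypothetical names, and it is an assumption that the "Sqlite db locked by other thread" warning corresponds to the SQLite result codes SQLITE_BUSY or SQLITE_LOCKED. The two constants mirror SQLite's documented values so that the sketch compiles without the sqlite3 headers.

```cpp
// Values mirror SQLite's documented result codes.
constexpr int kSqliteBusy = 5;    // SQLITE_BUSY
constexpr int kSqliteLocked = 6;  // SQLITE_LOCKED

enum class PbeAction {
  kUnchanged,           // keep the existing handling for this result
  kAbortAndRegenerate,  // exit the PBE so that imm.db is re-generated
};

// Decides what the PBE does with an SQL result code.
// Assumption: only the single-threaded 1PBE aborts on a locked database.
PbeAction onSqlResult(int rc, bool is_one_pbe) {
  const bool locked = (rc == kSqliteBusy || rc == kSqliteLocked);
  if (locked && is_one_pbe) return PbeAction::kAbortAndRegenerate;
  return PbeAction::kUnchanged;
}
```

Aborting instead of retrying matches the observation in the ticket that killing the PBE was what allowed imm.db to be re-generated and the CCB operations to be applied.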
[tickets] [opensaf:tickets] #1529 Node rebooted as saImmOiInitialize_2 failed during middleware active assignment
- **Milestone**: 4.6.2 --> 4.7.2

---

** [tickets:#1529] Node rebooted as saImmOiInitialize_2 failed during middleware active assignment**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Thu Oct 08, 2015 07:53 AM UTC by Chani Srivastava
**Last Updated:** Mon Nov 02, 2015 09:08 AM UTC
**Owner:** nobody
**Attachments:**

- [1529.tgz](https://sourceforge.net/p/opensaf/tickets/1529/attachment/1529.tgz) (586.3 kB; application/x-compressed-tar)
- [SC1_syslog.txt](https://sourceforge.net/p/opensaf/tickets/1529/attachment/SC1_syslog.txt) (436.4 kB; text/plain)
- [SC2_syslog.txt](https://sourceforge.net/p/opensaf/tickets/1529/attachment/SC2_syslog.txt) (425.6 kB; text/plain)

Setup:
Changeset-6901
Invoked continuous failovers on a 4-node cluster with 2 controllers and 2 payloads. All nodes have 64-bit architecture.
2PBE enabled with 25K objects

Issue Observed:
Cluster reset occurred on invoking continuous failovers

Attachments:
Attaching syslogs for SC-1 and SC-2
Traces for immnd and immd can be shared separately if required

Steps:
* Initially SC-1 is active and SC-2 standby
* A test script invoked failover by killing osafclmd on SC-1
* SC-2 became active

Oct 7 18:23:32 OSAF-SC1 root: killing osafclmd from invoke_failover.sh
Oct 7 19:25:20 OSAF-SC2 osafamfd[2191]: NO FAILOVER StandBy --> Active

* On the new active controller, saImmOiInitialize_2 failed

Oct 7 19:25:22 OSAF-SC2 osafntfimcnd[2735]: ER ntfimcn_imm_init saImmOiInitialize_2 failed SA_AIS_ERR_TIMEOUT (5)
Oct 7 19:25:22 OSAF-SC2 osafntfimcnd[2735]: ER ntfimcn_imm_init() Fail
Oct 7 19:25:22 OSAF-SC2 osafimmnd[2131]: NO Implementer connected: 333 (safLckService) <299, 2020f>
Oct 7 19:25:22 OSAF-SC2 osafimmnd[2131]: NO Implementer connected: 334 (safEvtService) <298, 2020f>
Oct 7 19:25:23 OSAF-SC2 osafntfimcnd[2738]: ER ntfimcn_imm_init saImmOiInitialize_2 failed SA_AIS_ERR_TIMEOUT (5)
Oct 7 19:25:23 OSAF-SC2 osafntfimcnd[2738]: ER ntfimcn_imm_init() Fail
Oct 7 19:25:23 OSAF-SC2 osafimmnd[2131]: WA MDS Send Failed
Oct 7 19:25:23 OSAF-SC2 osafimmnd[2131]: WA Error code 2 returned for message type 4 - ignoring

* Other services also fail to initialize with IMM on the new active controller, i.e. SC-2
* And finally SMF had a csi set timeout
* SC-2 went for reboot and hence the entire cluster reset, as SC-2 was the only active controller at the time

Oct 7 19:25:51 OSAF-SC2 osafamfnd[2205]: NO 'safComp=SMF,safSu=SC-2,safSg=2N,safApp=OpenSAF' faulted due to 'csiSetcallbackTimeout' : Recovery is 'nodeFailfast'
Oct 7 19:25:51 OSAF-SC2 osafamfnd[2205]: ER safComp=SMF,safSu=SC-2,safSg=2N,safApp=OpenSAF Faulted due to:csiSetcallbackTimeout Recovery is:nodeFailfast
Oct 7 19:25:51 OSAF-SC2 osafamfnd[2205]: Rebooting OpenSAF NodeId = 131599 EE Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 131599, SupervisionTime = 60
Oct 7 19:25:51 OSAF-SC2 opensaf_reboot: Rebooting local node; timeout=60
[tickets] [opensaf:tickets] #1530 AMF : TwoN, SU struck in terminating during sponsor si lock ( csi quiescd timeout)
- **Milestone**: 4.6.2 --> 4.7.2

---

** [tickets:#1530] AMF : TwoN, SU struck in terminating during sponsor si lock ( csi quiescd timeout)**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Thu Oct 08, 2015 09:52 AM UTC by Srikanth R
**Last Updated:** Mon Nov 02, 2015 09:08 AM UTC
**Owner:** nobody

Changeset : 6901
Application : 2N, two SUs, 4 SIs (1 sponsor SI for the remaining 3 SIs)

Issue : SU stuck in terminating during sponsor SI lock, when the sponsor rejected the quiesced assignment.

Steps :
* Initially all the SIs are in fully assigned state.
* Invoked the lock of the sponsor SI, i.e. SI1.
* When the quiesced callback was invoked, the component did not respond to the callback.

Oct 8 15:10:30 SYSTEST-PLD-1 osafamfnd[2645]: NO Assigning 'safSi=TestApp_SI1,safApp=TestApp_TwoN' QUIESCED to 'safSu=TestApp_SU1,safSg=TestApp_SG1,safApp=TestApp_TwoN'
Oct 8 15:10:40 SYSTEST-PLD-1 osafamfnd[2645]: NO Performing failover of 'safSu=TestApp_SU1,safSg=TestApp_SG1,safApp=TestApp_TwoN' (SU failover count: 1)
Oct 8 15:10:40 SYSTEST-PLD-1 osafamfnd[2645]: NO 'safComp=COMP1,safSu=TestApp_SU1,safSg=TestApp_SG1,safApp=TestApp_TwoN' faulted due to 'csiSetcallbackTimeout' : Recovery is 'componentFailover'
Oct 8 15:10:40 SYSTEST-PLD-1 osafamfnd[2645]: NO 'safSu=TestApp_SU1,safSg=TestApp_SG1,safApp=TestApp_TwoN' Presence State INSTANTIATED => TERMINATING

* All the assignments got removed as part of the SI lock operation, but the SU is not repaired. It got stuck in terminating and disabled state.
* A further unlock of the sponsor SI resulted in the other SU getting the active assignment.
[tickets] [opensaf:tickets] #1534 AMF : SU in disabled state, after CLM node lock op ( component reported failure for quiesced )
- **Milestone**: 4.6.2 --> 4.7.2

---

** [tickets:#1534] AMF : SU in disabled state, after CLM node lock op ( component reported failure for quiesced )**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Fri Oct 09, 2015 11:00 AM UTC by Srikanth R
**Last Updated:** Mon Nov 02, 2015 09:07 AM UTC
**Owner:** nobody
**Attachments:**

- [osafamfnd_trace](https://sourceforge.net/p/opensaf/tickets/1534/attachment/osafamfnd_trace) (167.9 kB; application/octet-stream)

Changeset : 6901
Application : 2n (Auto repair enabled for SG and Node)
2 SUs with 4 components each
4 SIs with SI1 as sponsor for the remaining SIs

Steps :

* Initially all SIs are in assigned state.

* Performed CLM lock operation on the SU hosting the active assignment. In the quiesced callback, component COMP1 responded with FAILED_OP.

Oct 8 23:44:26 PAYLOAD-2 osafamfnd[19393]: NO 'safComp=COMP1,safSu=TestApp_SU2,safSg=TestApp_SG1,safApp=TestApp_TwoN' faulted due to 'csiSetcallbackFailed' : Recovery is 'componentFailover'
Oct 8 23:44:26 PAYLOAD-2 osafamfnd[19393]: NO 'safSu=TestApp_SU2,safSg=TestApp_SG1,safApp=TestApp_TwoN' Presence State INSTANTIATED => TERMINATING

* As part of the CLM lock, all the components except COMP1 are uninstantiated. COMP1 got the CLEANUP command and also got instantiated. COMP1 should not be instantiated as part of recovery, as the CLM node is in LOCKED state. saAmfComponentRegister, called as part of the component instantiation, returned ERR_BAD_OP, which resulted in the SU moving to DISABLED state.

Oct 8 23:44:26.200807 osafamfnd [19393:clc.cc:2924] T1 CLC CLI command env variable name = 'SA_AMF_COMPONENT_NAME': value ='safComp=COMP1,safSu=TestApp_SU2,safSg=TestApp_SG1,safApp=TestApp_TwoN'
Oct 8 23:44:26.201127 osafamfnd [19393:comp.cc:2827] IN 'safComp=COMP1,safSu=TestApp_SU2,safSg=TestApp_SG1,safApp=TestApp_TwoN' Presence State UNINSTANTIATED => INSTANTIATING

The SU and the AMF node hosting the SU are marked as DISABLED after the entire operation. In the successful scenario, the SU is marked as ENABLED and the node is marked as DISABLED.
[tickets] [opensaf:tickets] #1756 AMF : amfd on controller asserted ( for CSI removal timeout during application si lock )
- **Milestone**: 4.6.2 --> 4.7.2

---

** [tickets:#1756] AMF : amfd on controller asserted ( for CSI removal timeout during application si lock )**

**Status:** review
**Milestone:** 4.7.2
**Created:** Wed Apr 13, 2016 11:00 AM UTC by Srikanth R
**Last Updated:** Fri Apr 29, 2016 11:33 AM UTC
**Owner:** Nagendra Kumar
**Attachments:**

- [1755.tgz](https://sourceforge.net/p/opensaf/tickets/1756/attachment/1755.tgz) (681.2 kB; application/x-compressed-tar)

Changeset : 7436
Version : 5.0 FC
Setup : Controller with 2 payloads. 2n red model with 2 SUs, 4 SIs and no si-si deps.

Steps performed :

-> Initially the application is brought up and all the SIs are fully assigned.
-> Performed shutdown operation on one of the SIs, i.e. SI4.
-> Ensured that the application with the active assignment will time out in the CSI removal callback.

The shutdown operation timed out and the amfd on the active controller asserted.

Invoking admin operation SHUTDOWN on safSi=TestApp_SI4,safApp=TestApp_TwoN
OP RETURN VALUE and AIS OP RETURN VAL = 5 -65536

Apr 13 16:17:40 CONTROLLER-2 osafamfd[2689]: sg_2n_fsm.cc:125: avd_su_fsm_state_determine: Assertion '0' failed.
Apr 13 16:17:40 CONTROLLER-2 osafamfnd[2699]: WA AMF director unexpectedly crashed
Apr 13 16:17:40 CONTROLLER-2 osafamfnd[2699]: Rebooting OpenSAF NodeId = 131599 EE Name = , Reason: local AVD down(Adest) or both AVD down(Vdest) received, OwnNodeId = 131599, SupervisionTime = 6
[tickets] [opensaf:tickets] #1757 Standby controller failed to join the cluster probably because of setup issues
- **Milestone**: 4.6.2 --> 4.7.2

---

** [tickets:#1757] Standby controller failed to join the cluster probably because of setup issues**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Wed Apr 13, 2016 11:12 AM UTC by Ritu Raj
**Last Updated:** Mon Apr 18, 2016 09:18 AM UTC
**Owner:** nobody

*Setup:
Changeset- 7436
Version - opensaf 5.0FC
OS: SUSE 11SP2 x86_64

*Issue observed :
Standby controller failed to join the cluster with error message "ER Failed to Initialize with CLM"

*Steps To Reproduce:
> OpenSAF is already up and running on controller1(SC-1)
> When OpenSAF was started on controller2(SC-2), it failed with the following message:

SCALE_SLOT-2:~ # /etc/init.d/opensafd start
Apr 26 20:11:28 SCALE_SLOT-2 opensafd: Starting OpenSAF Services(5.0.FC - ) (Using TIPC)
Starting OpenSAF Services (Using TIPC):Apr 26 20:11:28 SCALE_SLOT-2 kernel: [1930938.251473] TIPC: Activated (version 2.0.0)
...
Apr 26 20:11:29 SCALE_SLOT-2 osafamfnd[29911]: Started
**Apr 26 20:11:29 SCALE_SLOT-2 osafamfnd[29911]: ER Failed to Initialize with CLM: 8
Apr 26 20:11:29 SCALE_SLOT-2 osafamfnd[29911]: ER avnd_create failed**
Apr 26 20:11:29 SCALE_SLOT-2 osafamfnd[29911]: NO exiting

> The corresponding syslog of the active controller(SC-1) at that time

Apr 26 20:08:51 SCALE_SLOT-1 osafclmd[31692]: WA FAILED:** ncs_patricia_tree_add, client_id** 53
Apr 26 20:08:51 SCALE_SLOT-1 osafamfd[31702]: NO Node 'SC-2' left the cluster

>> It is also observed that, on the active controller(SC-1), there is no log record
>> of osafclmd during the time controller2(SC-2) failed, while other services have
>> log records at that time stamp

Below is the output of osafclmd (SC-1). Between time stamps "Apr 26 20:08:51.237701" and "Apr 26 20:12:06.272871" osafclmd did not log anything.

Apr 26 20:08:51.237695 osafclmd [31692:clms_evt.c:1601] << process_api_evt
**Apr 26 20:08:51.237701 osafclmd [31692:clms_evt.c:1667] << clms_process_mbx
Apr 26 20:12:06.272871 osafclmd [31692:ava_mds.c:0179] >> ava_mds_cbk**
Apr 26 20:12:06.272923 osafclmd [31692:ava_mds.c:0530] >> ava_mds_flat_dec

Note:
1. This is a random issue
2. The time gap between controller1(SC-1) and controller2(SC-2) is 3 min.
[tickets] [opensaf:tickets] #1532 AMF : SI should be reverted to unlocked state, after shutdown operation of SI is rejected
- **Milestone**: 4.6.2 --> 4.7.2

---

** [tickets:#1532] AMF : SI should be reverted to unlocked state, after shutdown operation of SI is rejected**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Thu Oct 08, 2015 11:20 AM UTC by Srikanth R
**Last Updated:** Mon Nov 02, 2015 09:08 AM UTC
**Owner:** Nagendra Kumar

Changeset : 6901
Application : 2n (two SUs and 4 SIs, with SI1 as sponsor for the remaining SIs)

Steps :

* Initially all the SIs are in assigned state.
* Invoked shutdown operation on one of the dependent SIs, i.e. SI2.
* For the quiescing callback, the component responded with FAILED_OP.

Oct 8 16:27:20 SYSTEST-PLD-1 osafamfnd[4535]: NO Assigning 'safSi=TestApp_SI2,safApp=TestApp_TwoN' QUIESCING to 'safSu=TestApp_SU1,safSg=TestApp_SG1,safApp=TestApp_TwoN'
Oct 8 16:27:30 SYSTEST-PLD-1 osafamfnd[4535]: NO Performing failover of 'safSu=TestApp_SU1,safSg=TestApp_SG1,safApp=TestApp_TwoN' (SU failover count: 2)
Oct 8 16:27:30 SYSTEST-PLD-1 osafamfnd[4535]: NO 'safComp=COMP2,safSu=TestApp_SU1,safSg=TestApp_SG1,safApp=TestApp_TwoN' faulted due to 'csiSetcallbackTimeout' : Recovery is 'componentFailover'

* After recovery of SU1, SI2 assignments are also done, which is not expected.

Oct 8 16:27:30 SYSTEST-PLD-1 osafamfnd[4535]: NO 'safSu=TestApp_SU1,safSg=TestApp_SG1,safApp=TestApp_TwoN' Presence State TERMINATING => INSTANTIATED
Oct 8 16:27:30 SYSTEST-PLD-1 osafamfnd[4535]: NO Assigning 'safSi=TestApp_SI1,safApp=TestApp_TwoN' STANDBY to 'safSu=TestApp_SU1,safSg=TestApp_SG1,safApp=TestApp_TwoN'
Oct 8 16:27:30 SYSTEST-PLD-1 osafamfnd[4535]: NO Assigning 'safSi=TestApp_SI2,safApp=TestApp_TwoN' STANDBY to 'safSu=TestApp_SU1,safSg=TestApp_SG1,safApp=TestApp_TwoN'
Oct 8 16:27:30 SYSTEST-PLD-1 osafamfnd[4535]: NO Assigning 'safSi=TestApp_SI3,safApp=TestApp_TwoN' STANDBY to 'safSu=TestApp_SU1,safSg=TestApp_SG1,safApp=TestApp_TwoN'

* Below is the SI state after the shutdown operation:

safSi=TestApp_SI2,safApp=TestApp_TwoN
saAmfSIAdminState=LOCKED(2)
saAmfSIAssignmentState=FULLY_ASSIGNED(2)

* A further unlock operation on the SI resulted in a TIMEOUT return code.
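The inconsistency in the reported SI state (admin state LOCKED while the SI is still FULLY_ASSIGNED) can be checked automatically when scanning test output. The sketch below is only an illustration: `parse_si_state` and `is_consistent` are hypothetical helper names, and the rule that a LOCKED SI should be UNASSIGNED is the assumption being tested here, not code taken from AMF.

```python
import re


def parse_si_state(text):
    """Extract attributes such as 'saAmfSIAdminState=LOCKED(2)' into a dict."""
    return dict(re.findall(r"(saAmfSI\w+)=(\w+)\(\d+\)", text))


def is_consistent(state):
    """Assumption for this sketch: a LOCKED SI should have no assignments."""
    if state.get("saAmfSIAdminState") == "LOCKED":
        return state.get("saAmfSIAssignmentState") == "UNASSIGNED"
    return True


# SI state as reported in the ticket after the rejected shutdown operation.
reported = (
    "safSi=TestApp_SI2,safApp=TestApp_TwoN "
    "saAmfSIAdminState=LOCKED(2) "
    "saAmfSIAssignmentState=FULLY_ASSIGNED(2)"
)

state = parse_si_state(reported)
print(state, "consistent" if is_consistent(state) else "INCONSISTENT")
```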
[tickets] [opensaf:tickets] #1568 CLMD segfaulted for pending lock op during middleware si-swap
- **Milestone**: 4.6.2 --> 4.7.2

---

** [tickets:#1568] CLMD segfaulted for pending lock op during middleware si-swap**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Sat Oct 24, 2015 11:17 AM UTC by Srikanth R
**Last Updated:** Mon Nov 02, 2015 09:02 AM UTC
**Owner:** nobody
**Attachments:**

- [SC-1.tgz](https://sourceforge.net/p/opensaf/tickets/1568/attachment/SC-1.tgz) (27.3 kB; application/x-compressed-tar)

Changeset : 6901

Steps :

1) Invoked lock operation on one of the payloads, PL-5.
2) CLM Agent on PL-3 did not respond to the lock operation.
3) With this operation pending, invoked controller switchover.
4) CLMD on the active controller segfaulted during quiesced processing.

Oct 24 15:53:13 SYSTEST-CNTLR-1 osafamfd[5863]: NO Pending Response sent for CLM track callback::OK '1'
Oct 24 15:53:15 SYSTEST-CNTLR-1 osafamfd[5863]: NO safSi=SC-2N,safApp=OpenSAF Swap initiated
Oct 24 15:53:15 SYSTEST-CNTLR-1 osafamfnd[5873]: NO Assigning 'safSi=SC-2N,safApp=OpenSAF' QUIESCED to 'safSu=SC-1,safSg=2N,safApp=OpenSAF'
Oct 24 15:53:15 SYSTEST-CNTLR-1 osafimmnd[5809]: NO Implementer locally disconnected. Marking it as doomed 173 <457, 2010f> (safSmfService)
Oct 24 15:53:15 SYSTEST-CNTLR-1 osafamfnd[5873]: NO 'safComp=CLM,safSu=SC-1,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'nodeFailfast'

signal: 11 pid: 0 uid: 0
/usr/lib64/libopensaf_core.so.0(+0x1a27b)[0x7ff095ffb27b]
/lib64/libpthread.so.0(+0xf7c0)[0x7ff0951207c0]
/lib64/libc.so.6(cfree+0x39)[0x7ff094a0a2c9]
/lib64/librt.so.1(timer_delete+0x42)[0x7ff094d08b52]
/usr/lib64/opensaf/osafclmd[0x405298]
/usr/lib64/libSaAmf.so.0(+0x9213)[0x7ff095dd0213]
/usr/lib64/libSaAmf.so.0(+0xa307)[0x7ff095dd1307]
/usr/lib64/libSaAmf.so.0(saAmfDispatch+0x1d4)[0x7ff095dcaf94]
/usr/lib64/opensaf/osafclmd[0x4047df]
/lib64/libc.so.6(__libc_start_main+0xe6)[0x7ff0949aec36]
/usr/lib64/opensaf/osafclmd[0x404ea5]

CLMD trace is attached.
[tickets] [opensaf:tickets] #1562 AMF : (NPM ) Standby assignments are done with out any active assignment
- **Milestone**: 4.6.2 --> 4.7.2

---

** [tickets:#1562] AMF : (NPM ) Standby assignments are done with out any active assignment**

**Status:** review
**Milestone:** 4.7.2
**Created:** Fri Oct 23, 2015 01:59 PM UTC by Srikanth R
**Last Updated:** Mon Feb 01, 2016 06:17 AM UTC
**Owner:** Praveen
**Attachments:**

- [1562.tgz](https://sourceforge.net/p/opensaf/tickets/1562/attachment/1562.tgz) (178.3 kB; application/x-compressed-tar)

Changeset : 6901
Setup : NPM application with 4 SUs hosted on PL-3 & PL-4 and 4 SIs. SU1 & SU3 are hosted on PL-3, SU2 & SU4 are hosted on PL-4.

Steps :

After a series of operations on the NPM application, below is the state of the assignments:

            | TestApp_SI1 | TestApp_SI2 | TestApp_SI3 | TestApp_SI4
TestApp_SU1 | ACTIVE      | ACTIVE      |             |
TestApp_SU2 |             |             | ACTIVE      | ACTIVE
TestApp_SU3 | STANDBY     | STANDBY     | STANDBY     |
TestApp_SU4 |             |             |             | STANDBY

After opensafd is stopped on PL-3, below are the assignments:

            | TestApp_SI1 | TestApp_SI2 | TestApp_SI3 | TestApp_SI4
TestApp_SU1 |             |             |             |
TestApp_SU2 |             |             | ACTIVE      | ACTIVE
TestApp_SU3 |             |             |             |
TestApp_SU4 | STANDBY     | STANDBY     |             | STANDBY

Corresponding log in syslog on PL-4 :

Oct 23 19:00:29 PAYLOAD-2 osafimmnd[8101]: NO Implementer disconnected 40 <0, 2010f> (MsgQueueService131855)
Oct 23 19:00:29 PAYLOAD-2 osafamfnd[8120]: NO Assigning 'safSi=TestApp_SI1,safApp=TestApp_Npm' STANDBY to 'safSu=TestApp_SU4,safSg=TestApp_SG1,safApp=TestApp_Npm'
Oct 23 19:00:29 PAYLOAD-2 osafamfnd[8120]: NO Assigning 'safSi=TestApp_SI2,safApp=TestApp_Npm' STANDBY to 'safSu=TestApp_SU4,safSg=TestApp_SG1,safApp=TestApp_Npm'
Oct 23 19:00:29 PAYLOAD-2 osafamfnd[8120]: NO Assigned 'safSi=TestApp_SI2,safApp=TestApp_Npm' STANDBY to 'safSu=TestApp_SU4,safSg=TestApp_SG1,safApp=TestApp_Npm'
Oct 23 19:00:29 PAYLOAD-2 osafamfnd[8120]: NO Assigned 'safSi=TestApp_SI1,safApp=TestApp_Npm' STANDBY to 'safSu=TestApp_SU4,safSg=TestApp_SG1,safApp=TestApp_Npm'
Oct 23 19:00:32 PAYLOAD-2 kernel: [ 7785.128227] TIPC: Resetting link <1.1.4:eth3-1.1.3:eth3>, peer not responding

Attached are amfd.state and amfd traces from the active controller, the amfnd trace from the payload hosting SU2 & SU4, and the NPM configuration.
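The faulty end state can be expressed as a simple invariant: no SI should have a STANDBY assignment unless some SU holds it ACTIVE. The sketch below checks that invariant on the assignments after opensafd was stopped on PL-3, as read from the ticket's table and the PL-4 syslog. `standby_without_active` is a hypothetical helper for illustration and is not part of amfd.

```python
def standby_without_active(assignments):
    """assignments maps SU name -> {SI name: HA state}.
    Returns the sorted SIs that have a STANDBY but no ACTIVE assignment."""
    active, standby = set(), set()
    for su_assignments in assignments.values():
        for si, ha_state in su_assignments.items():
            if ha_state == "ACTIVE":
                active.add(si)
            elif ha_state == "STANDBY":
                standby.add(si)
    return sorted(standby - active)


# State before PL-3 went down, per the first table in the ticket.
before = {
    "TestApp_SU1": {"TestApp_SI1": "ACTIVE", "TestApp_SI2": "ACTIVE"},
    "TestApp_SU2": {"TestApp_SI3": "ACTIVE", "TestApp_SI4": "ACTIVE"},
    "TestApp_SU3": {"TestApp_SI1": "STANDBY", "TestApp_SI2": "STANDBY",
                    "TestApp_SI3": "STANDBY"},
    "TestApp_SU4": {"TestApp_SI4": "STANDBY"},
}

# State after opensafd was stopped on PL-3 (SU1 and SU3 are gone).
after = {
    "TestApp_SU2": {"TestApp_SI3": "ACTIVE", "TestApp_SI4": "ACTIVE"},
    "TestApp_SU4": {"TestApp_SI1": "STANDBY", "TestApp_SI2": "STANDBY",
                    "TestApp_SI4": "STANDBY"},
}

violations_before = standby_without_active(before)
violations_after = standby_without_active(after)
print("before:", violations_before)
print("after: ", violations_after)
```

With the data above, the check flags SI1 and SI2 after the node stop, which matches the ticket title.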
[tickets] [opensaf:tickets] #1565 SMFD crashed on new active controller after failover
- **Milestone**: 4.6.2 --> 4.7.2

---

** [tickets:#1565] SMFD crashed on new active controller after failover**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Sat Oct 24, 2015 05:44 AM UTC by Srikanth R
**Last Updated:** Mon Nov 02, 2015 09:03 AM UTC
**Owner:** nobody
**Attachments:**

- [1565.tgz](https://sourceforge.net/p/opensaf/tickets/1565/attachment/1565.tgz) (2.1 MB; application/x-compressed-tar)

Changeset : 6901
Setup : 5 nodes without PBE

Issue : SMFD crashed on new active controller after failover

Steps done :

-> Initially all 5 nodes joined the cluster. SC-2 is the active and SC-1 is the standby controller.
-> Stopped opensaf on four nodes one by one, from PL-5 to SC-2.
-> When opensafd was stopped on SC-2, SC-1 took the active role and SMFD crashed on SC-1.

Oct 24 11:01:44 SYSTEST-CNTLR-1 osafimmnd[2370]: NO Implementer locally disconnected. Marking it as doomed 33 <364, 2010f> (MsgQueueService131599)
Oct 24 11:01:45 SYSTEST-CNTLR-1 osafsmfd[2570]: NO openSafSmfExecControl is not set, using standard mode
Oct 24 11:01:45 SYSTEST-CNTLR-1 osafimmnd[2370]: NO Implementer disconnected 33 <364, 2010f> (MsgQueueService131599)
Oct 24 11:01:45 SYSTEST-CNTLR-1 osafimmnd[2370]: NO Implementer connected: 36 (MsgQueueService131855) <368, 2010f>
Oct 24 11:01:45 SYSTEST-CNTLR-1 osafimmnd[2370]: NO Implementer locally disconnected. Marking it as doomed 36 <368, 2010f> (MsgQueueService131855)
Oct 24 11:01:45 SYSTEST-CNTLR-1 osafimmnd[2370]: NO Implementer disconnected 36 <368, 2010f> (MsgQueueService131855)
Oct 24 11:01:45 SYSTEST-CNTLR-1 osafimmnd[2370]: NO Implementer connected: 37 (safSmfService) <362, 2010f>
Oct 24 11:01:45 SYSTEST-CNTLR-1 osafsmfd[2570]: ER amf_active_state_handler oi activate FAIL
Oct 24 11:01:45 SYSTEST-CNTLR-1 osafamfnd[2434]: NO 'safComp=SMF,safSu=SC-1,safSg=2N,safApp=OpenSAF' faulted due to 'csiSetcallbackFailed' : Recovery is 'nodeFailfast'

-> Trace from SMFD on SC-1 :

Oct 24 11:01:45.014430 osafsmfd [2570:imma_om_api.c:0891] << saImmOmFinalize
Oct 24 11:01:45.014436 osafsmfd [2570:smfd_campaign_oi.cc:0773] >> campaign_oi_activate
Oct 24 11:01:45.022008 osafsmfd [2570:smfd_campaign_oi.cc:0802] TR immutil_saImmOiClassImplementerSet smfConfigOiHandle failed rc=12 class name=OpenSafSmfExecControl
Oct 24 11:01:45.022035 osafsmfd [2570:smfd_amf.c:0058] ER amf_active_state_handler oi activate FAIL
Oct 24 11:01:45.022043 osafsmfd [2570:smfd_amf.c:0062] << amf_active_state_handler

-> This issue is random.
[tickets] [opensaf:tickets] #1566 Cluster reset happened during switchover due to AMF director heart beat timeout.
- **Milestone**: 4.6.2 --> 4.7.2

---

** [tickets:#1566] Cluster reset happened during switchover due to AMF director heart beat timeout.**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Sat Oct 24, 2015 06:25 AM UTC by Ritu Raj
**Last Updated:** Mon Nov 02, 2015 09:03 AM UTC
**Owner:** nobody

Changeset: 6901
70 nodes configured with PBE
Application: Nway configured on all the nodes

Issues Observed:
> Cluster reset happened during switchover due to AMF director heart beat timeout.

Steps Performed:

* AMF (Nway) application brought up on the nodes.
* Some operations were performed on the Nway application hosted on PL-65 to PL-68.
* Stopped opensaf on the nodes PL-65 to PL-68.
* Two switchovers were performed on the cluster. The first switchover succeeded without any issue. During the second switchover the old standby controller (SC-2) rebooted while it was being promoted to ACTIVE state.

Oct 22 15:45:10 SLES-64BIT-SLOT2 osafamfnd[2580]: NO Assigning 'safSi=SC-2N,safApp=OpenSAF' ACTIVE to 'safSu=SC-2,safSg=2N,safApp=OpenSAF'
Oct 22 15:45:10 SLES-64BIT-SLOT2 osafimmd[2505]: WA IMMD not re-electing coord for switch-over (si-swap) coord at (2020f)
Oct 22 15:45:10 SLES-64BIT-SLOT2 osafimmnd[2516]: NO Implementer (applier) connected: 130 (@OpenSafImmReplicatorA) <10675, 2020f>
Oct 22 15:45:10 SLES-64BIT-SLOT2 osafamfnd[2580]: NO Assigned 'safSi=SC-2N,safApp=OpenSAF' ACTIVE to 'safSu=SC-2,safSg=2N,safApp=OpenSAF'
Oct 22 15:45:10 SLES-64BIT-SLOT2 osafamfnd[2580]: NO 'safComp=CLM,safSu=SC-2,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'nodeFailfast'
Oct 22 15:45:10 SLES-64BIT-SLOT2 osafamfnd[2580]: ER safComp=CLM,safSu=SC-2,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery is:nodeFailfast
Oct 22 15:45:10 SLES-64BIT-SLOT2 osafamfnd[2580]: Rebooting OpenSAF NodeId = 131599 EE Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 131599, SupervisionTime = 60
Oct 22 15:45:10 SLES-64BIT-SLOT2 opensaf_reboot: Rebooting local node; timeout=60

* After SC-2 went for reboot, SC-1 tried to become active, during which the AMF director heart beat timed out and the cluster reset happened.

Oct 22 15:54:53 SLES-64BIT-SLOT1 osafamfd[2557]: NO 'safRankedSu=safSu=dummy_NWay_1Norm_4\,safSg=SG_dummy_n\,safApp=N_6,safSi=dummy_NWay_1Norm_6,safApp=N_6'
Oct 22 15:54:53 SLES-64BIT-SLOT1 osafamfnd[2567]: ER AMF director heart beat timeout, generating core for amfd
Oct 22 15:54:54 SLES-64BIT-SLOT1 osafamfnd[2567]: Rebooting OpenSAF NodeId = 131343 EE Name = , Reason: AMF director heart beat timeout, OwnNodeId = 131343, SupervisionTime = 60
Oct 22 15:54:54 SLES-64BIT-SLOT1 opensaf_reboot: Rebooting local node; timeout=60
Oct 22 15:54:55 SLES-64BIT-SLOT1 osafimmnd[2503]: WA MDS Send Failed
Oct 22 15:54:55 SLES-64BIT-SLOT1 osafimmnd[2503]: WA Error code 2 returned for message type 16 - ignoring
Oct 22 15:54:55 SLES-64BIT-SLOT1 osafimmnd[2503]: NO Implementer locally disconnected. Marking it as doomed 136 <4871, 2010f> (@safAmfService2010f)

* Traces are not available.
[tickets] [opensaf:tickets] #1661 log: deadlock when two threads call saLogStreamClose
- **Milestone**: 4.6.2 --> 4.7.2

---

** [tickets:#1661] log: deadlock when two threads call saLogStreamClose**

**Status:** invalid
**Milestone:** 4.7.2
**Created:** Fri Jan 08, 2016 01:28 PM UTC by Mathi Naickan
**Last Updated:** Wed May 04, 2016 06:59 PM UTC
**Owner:** Mathi Naickan

A deadlock can occur when two threads call saLogStreamClose simultaneously.

Thread1
#0 0x7fa5c2c807bc in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x7fa5c2c7c489 in _L_lock_918 () from /lib64/libpthread.so.0
#2 0x7fa5c2c7c2b0 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x7fa5c30b034f in saLogStreamClose (logStreamHandle=4246732801) at lga_api.c:1344

Thread2
#0 0x7fa5c2c80010 in sem_wait () from /lib64/libpthread.so.0
(gdb) bt
#0 0x7fa5c2c80010 in sem_wait () from /lib64/libpthread.so.0
#1 0x7fa5c1c2c762 in hm_block_me (cell=cell@entry=0xc8d980, pool_id=pool_id@entry=0 '\000') at hj_hdl.c:696
#2 0x7fa5c1c2c8dd in ncshm_destroy_hdl (id=id@entry=NCS_SERVICE_ID_LGA, uhdl=4246732801) at hj_hdl.c:366
#3 0x7fa5c30b0b89 in lga_log_stream_hdl_rec_del (list_head=list_head@entry=0xca4870, rm_node=rm_node@entry=0xcd7ba0) at lga_util.c:486
#4 0x7fa5c30b035b in saLogStreamClose (logStreamHandle=4246732801) at lga_api.c:1349
[tickets] [opensaf:tickets] #1618 AMF: SI removal gets stuck when component termination failed
- **Milestone**: 4.6.2 --> 4.7.2

---

** [tickets:#1618] AMF: SI removal gets stuck when component termination failed**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Fri Dec 04, 2015 06:22 AM UTC by Quyen Dao
**Last Updated:** Fri Dec 04, 2015 06:22 AM UTC
**Owner:** nobody
**Attachments:**

- [AppConfig-2N-3comp-1si-3csi.xml](https://sourceforge.net/p/opensaf/tickets/1618/attachment/AppConfig-2N-3comp-1si-3csi.xml) (16.6 kB; text/xml)
- [amf_demo_delay_terminate_callback.patch](https://sourceforge.net/p/opensaf/tickets/1618/attachment/amf_demo_delay_terminate_callback.patch) (472 Bytes; application/octet-stream)
- [osafamfnd](https://sourceforge.net/p/opensaf/tickets/1618/attachment/osafamfnd) (439.4 kB; application/octet-stream)
- [syslog.PL-4](https://sourceforge.net/p/opensaf/tickets/1618/attachment/syslog.PL-4) (13.4 kB; application/octet-stream)

**Description**
SI removal gets stuck and the SG becomes unstable when component termination fails.

**Steps to reproduce**
1. Change the amf_demo CLC-CLI CLEANUP command to always return 1 (to trigger termination failed).
2. Change amf_demo to delay the terminate_callback for 2 seconds (the patch is attached).
3. Load the attached model.
4. Kill 1 amf_demo to trigger the component instantiation failed. This also leads to component failover.

**Result**
- The failed component is cleaned up.
- Other components in the SU are terminated.
- The SI assigned to the failed SU gets stuck at "Removing 'safSi=AmfDemo1,safApp=AmfDemo1' from 'safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1'".
- The SG becomes unstable and the failed SU can't be "repaired".

**Syslog**
2015-12-03 17:59:05 PL-4 amf_demo[618]: exiting (caught term signal)
2015-12-03 17:59:05 PL-4 osafamfnd[419]: NO 'safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1' component restart probation timer started (timeout: 400 ns)
2015-12-03 17:59:05 PL-4 osafamfnd[419]: NO Restarting a component of 'safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1' (comp restart count: 1)
2015-12-03 17:59:05 PL-4 osafamfnd[419]: NO 'safComp=C-AmfDemo1,safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1' faulted due to 'avaDown' : Recovery is 'componentRestart'
2015-12-03 17:59:05 PL-4 osafamfnd[419]: NO Cleanup of 'safComp=C-AmfDemo1,safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1' failed
2015-12-03 17:59:05 PL-4 osafamfnd[419]: NO Reason:'Exec of script success, but script exits with non-zero status'
2015-12-03 17:59:05 PL-4 osafamfnd[419]: NO Exit code: 1
2015-12-03 17:59:05 PL-4 osafamfnd[419]: WA 'safComp=C-AmfDemo1,safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1' Presence State RESTARTING => TERMINATION_FAILED
2015-12-03 17:59:05 PL-4 osafamfnd[419]: NO Component Failover trigerred for 'safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1': Failed component: 'safComp=C-AmfDemo1,safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1'
2015-12-03 17:59:05 PL-4 osafamfnd[419]: NO 'safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1' Presence State INSTANTIATED => TERMINATION_FAILED
2015-12-03 17:59:05 PL-4 amf_demo[625]: Terminating
2015-12-03 17:59:05 PL-4 amf_demo[632]: Terminating
2015-12-03 17:59:05 PL-4 osafamfnd[419]: NO Removing 'safSi=AmfDemo1,safApp=AmfDemo1' from 'safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1'
2015-12-03 17:59:07 PL-4 amf_demo[632]: Terminated
2015-12-03 17:59:07 PL-4 amf_demo[625]: Terminated
2015-12-03 17:59:45 PL-4 osafamfnd[419]: NO 'safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1' SU restart probation timer expired

**Command log**
root@PL-4:/srv/shared/osaf_amf_demo# date
Thu Dec 3 18:02:55 ICT 2015
root@PL-4:/srv/shared/osaf_amf_demo# immadm -o 9 --disable-tryagain safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1
error - saImmOmAdminOperationInvoke_2 admin-op RETURNED: SA_AIS_ERR_TRY_AGAIN (6)
error-string: SG state is not stable
root@PL-4:/srv/shared/osaf_amf_demo#

Model, trace, syslog and the amf_demo patch are attached.
[tickets] [opensaf:tickets] #1648 amf: amf director unexpectedly crashed up on ImplementerClear failed 5
- **Milestone**: 4.6.2 --> 4.7.2

---

** [tickets:#1648] amf: amf director unexpectedly crashed up on ImplementerClear failed 5**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Fri Dec 18, 2015 06:16 AM UTC by A V Mahesh (AVM)
**Last Updated:** Mon Dec 21, 2015 01:31 PM UTC
**Owner:** nobody

I was running the following script to alternate continuously between failover and switchover, and observed the issue below.

    # Alternate between opensafd restart (even NUM) and si-swap (odd NUM)
    NUM=2
    for (( i =0; i <= 100; i++))
    do
        ((EVEN = ($NUM % 2)))
        if [ $EVEN -eq 0 ]; then
            echo "Starting opensafd restart "
            /etc/init.d/opensafd restart
        else
            echo "Starting opensafd si-swap "
            amf-adm si-swap safSi=SC-2N,safApp=OpenSAF
        fi
        ((NUM=$NUM + 1))
    done

Dec 18 10:43:30 SC-2 osafimmnd[27298]: NO Implementer disconnected 100 <0, 2010f> (safLogService)
Dec 18 10:43:30 SC-2 osafimmnd[27298]: NO Implementer disconnected 104 <0, 2010f> (safClmService)
Dec 18 10:43:30 SC-2 osafimmnd[27298]: NO Implementer disconnected 103 <0, 2010f> (safEvtService)
Dec 18 10:43:30 SC-2 osafimmnd[27298]: NO Implementer disconnected 101 <0, 2010f> (safCheckPointService)
Dec 18 10:43:30 SC-2 osafimmnd[27298]: NO Implementer disconnected 99 <0, 2010f> (safMsgGrpService)
Dec 18 10:43:30 SC-2 osafimmd[27287]: WA IMMD lost contact with peer IMMD (NCSMDS_RED_DOWN)
Dec 18 10:43:30 SC-2 osafimmnd[27298]: WA DISCARD DUPLICATE FEVS message:18220
Dec 18 10:43:30 SC-2 osafimmnd[27298]: WA Error code 2 returned for message type 82 - ignoring
Dec 18 10:43:30 SC-2 osafimmnd[27298]: WA DISCARD DUPLICATE FEVS message:18221
Dec 18 10:43:30 SC-2 osafimmnd[27298]: WA Error code 2 returned for message type 82 - ignoring
Dec 18 10:43:30 SC-2 osafimmd[27287]: WA IMMND DOWN on active controller f1 detected at standby immd!! f2. Possible failover
Dec 18 10:43:30 SC-2 osafimmd[27287]: NO Skipping re-send of fevs message 18220 since it has recently been resent.
Dec 18 10:43:30 SC-2 osafimmd[27287]: NO Skipping re-send of fevs message 18221 since it has recently been resent.
Dec 18 10:43:30 SC-2 osafimmnd[27298]: NO Global discard node received for nodeId:2010f pid:6737
Dec 18 10:43:30 SC-2 osafimmnd[27298]: NO Implementer disconnected 98 <0, 2010f(down)> (MsgQueueService131343)
Dec 18 10:43:30 SC-2 osafimmnd[27298]: NO Implementer disconnected 102 <0, 2010f(down)> (safLckService)
Dec 18 10:43:30 SC-2 osafimmnd[27298]: NO Implementer disconnected 97 <0, 2010f(down)> (@safAmfService2010f)
Dec 18 10:43:31 SC-2 kernel: [68517.295788] tipc: Resetting link <1.1.2:eth2-1.1.1:eth1>, changeover initiated by peer
Dec 18 10:43:31 SC-2 kernel: [68517.295794] tipc: Lost link <1.1.2:eth2-1.1.1:eth1> on network plane A
Dec 18 10:43:31 SC-2 kernel: [68517.354991] tipc: Duplicate <1.1.1> using eth(08:00:27:3b:a5:86) seen on
Dec 18 10:43:40 SC-2 osafamfd[27348]: ER FAILOVER Active --> Quiesced FAILED, ImplementerClear failed 5
Dec 18 10:43:40 SC-2 osafimmnd[27298]: WA ERR_BAD_HANDLE: Handle use is blocked by pending reply on syncronous call
Dec 18 10:43:40 SC-2 osafimmnd[27298]: NO Implementer locally disconnected. Marking it as doomed 90 <9, 2020f> (safAmfService)
Dec 18 10:43:40 SC-2 osafamfd[27348]: ER FAILOVER Active --> Quiesced FAILED, ImplementerClear failed 9
Dec 18 10:43:40 SC-2 osafamfd[27348]: NO Re-initializing with IMM
Dec 18 10:43:40 SC-2 osafimmnd[27298]: WA IMMND - Client Node Get Failed for cli_hdl 38654837263
Dec 18 10:43:50 SC-2 osafamfd[27348]: ER saImmOiImplementerSet failed 5
Dec 18 10:43:50 SC-2 osafamfd[27348]: ER exiting since avd_imm_applier_set failed
Dec 18 10:43:50 SC-2 osafamfnd[27362]: ER AMF director unexpectedly crashed
Dec 18 10:43:50 SC-2 osafamfnd[27362]: Rebooting OpenSAF NodeId = 131599 EE Name = , Reason: local AVD down(Adest) or both AVD down(Vdest) received, OwnNodeId = 131599, SupervisionTime = 60
Dec 18 10:43:50 SC-2 opensaf_reboot: Rebooting local node; timeout=60
Dec 18 10:49:17 SC-2 syslog-ng[1193]: syslog-ng starting up; version='2.0.9'
[tickets] [opensaf:tickets] #1629 base: Error in ncs_leap_startup leads to partially initialized ncs
- **Milestone**: 4.6.2 --> 4.7.2

---

** [tickets:#1629] base: Error in ncs_leap_startup leads to partially initialized ncs**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Tue Dec 08, 2015 01:26 PM UTC by Hans Nordebäck
**Last Updated:** Tue Dec 08, 2015 01:26 PM UTC
**Owner:** nobody

With ticket [#1623], an error code is returned if the node_id file is missing, but ncs seems to be left partially initialized. If no node_id file exists and the return code from ncs_leap_startup is ignored, subsequent use of sysf ipc seems to work, but finally calling ncs_leap_shutdown leads to a segv. This behaviour is not consistent.
[tickets] [opensaf:tickets] #1661 log: deadlock when two threads call saLogStreamClose
- **status**: accepted --> invalid

---

** [tickets:#1661] log: deadlock when two threads call saLogStreamClose**

**Status:** invalid
**Milestone:** 4.6.2
**Created:** Fri Jan 08, 2016 01:28 PM UTC by Mathi Naickan
**Last Updated:** Fri Jan 08, 2016 01:28 PM UTC
**Owner:** Mathi Naickan

A deadlock can occur when two threads call saLogStreamClose simultaneously.

Thread1
#0 0x7fa5c2c807bc in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x7fa5c2c7c489 in _L_lock_918 () from /lib64/libpthread.so.0
#2 0x7fa5c2c7c2b0 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x7fa5c30b034f in saLogStreamClose (logStreamHandle=4246732801) at lga_api.c:1344

Thread2
#0 0x7fa5c2c80010 in sem_wait () from /lib64/libpthread.so.0
(gdb) bt
#0 0x7fa5c2c80010 in sem_wait () from /lib64/libpthread.so.0
#1 0x7fa5c1c2c762 in hm_block_me (cell=cell@entry=0xc8d980, pool_id=pool_id@entry=0 '\000') at hj_hdl.c:696
#2 0x7fa5c1c2c8dd in ncshm_destroy_hdl (id=id@entry=NCS_SERVICE_ID_LGA, uhdl=4246732801) at hj_hdl.c:366
#3 0x7fa5c30b0b89 in lga_log_stream_hdl_rec_del (list_head=list_head@entry=0xca4870, rm_node=rm_node@entry=0xcd7ba0) at lga_util.c:486
#4 0x7fa5c30b035b in saLogStreamClose (logStreamHandle=4246732801) at lga_api.c:1349
[tickets] [opensaf:tickets] #1678 ntf: ntftest suite 35 result is not stable
- **Milestone**: 4.6.2 --> 4.7.2

---

** [tickets:#1678] ntf: ntftest suite 35 result is not stable**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Thu Feb 04, 2016 07:44 AM UTC by Quyen Dao
**Last Updated:** Thu Feb 04, 2016 07:44 AM UTC
**Owner:** nobody

root@SC-1:~# immcfg -a longDnsAllowed=1 opensafImm=opensafImm,safApp=safImmService
root@SC-1:~# ntftest 35
Suite 35: CM notification test for extended name attribute
1 PASSED CREATE, runtime (OsafNtfCmTestRT) object, extended name attribute;
2 PASSED runtime, attr ch, REPLACE (EXTENDED NAME, ANY);
3 PASSED runtime, attr ch, ADD (EXTENDED NAME);
4 PASSED DELETE, runtime (OsafNtfCmTestRT) object;
5 PASSED CREATE, config (OsafNtfCmTestCFG) object, extended name attribute;
6 PASSED config, attr ch, REPLACE (EXTENDED NAME, ANY);
7 PASSED config, attr ch, ADD (EXTENDED NAME);
8 PASSED DELETE, config (OsafNtfCmTestCFG) object;
= Test Result: Total: 8 Passed: 8 Failed: 0
root@SC-1:~#
root@SC-1:~# ntftest 35
Suite 35: CM notification test for extended name attribute
1 PASSED CREATE, runtime (OsafNtfCmTestRT) object, extended name attribute;
2 PASSED runtime, attr ch, REPLACE (EXTENDED NAME, ANY);
3 PASSED runtime, attr ch, ADD (EXTENDED NAME);
4 PASSED DELETE, runtime (OsafNtfCmTestRT) object;
5 PASSED CREATE, config (OsafNtfCmTestCFG) object, extended name attribute;
error: in test_ntf_imcn.c at 4679: SA_AIS_ERR_TRY_AGAIN, expected SA_AIS_OK - exiting
root@SC-1:~#
root@SC-1:~# ntftest 35
Suite 35: CM notification test for extended name attribute
1 PASSED CREATE, runtime (OsafNtfCmTestRT) object, extended name attribute;
2 PASSED runtime, attr ch, REPLACE (EXTENDED NAME, ANY);
3 PASSED runtime, attr ch, ADD (EXTENDED NAME);
4 PASSED DELETE, runtime (OsafNtfCmTestRT) object;
error - saImmOmCcbObjectCreate_2 FAILED with SA_AIS_ERR_EXIST (14)
5 FAILED CREATE, config (OsafNtfCmTestCFG) object, extended name attribute (expected SA_AIS_OK, got SA_AIS_ERR_FAILED_OPERATION);
6 PASSED config, attr ch, REPLACE (EXTENDED NAME, ANY);
7 PASSED config, attr ch, ADD (EXTENDED NAME);
8 PASSED DELETE, config (OsafNtfCmTestCFG) object;
= Test Result: Total: 8 Passed: 7 Failed: 1
root@SC-1:~#
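The second run above aborts on SA_AIS_ERR_TRY_AGAIN. A common way to tolerate that return code is a bounded retry; the following is only a minimal sketch of the idea, not the fix chosen for this ticket. `call_with_retry`, `max_attempts`, and `delay_s` are hypothetical names, and `operation` stands in for whatever call returned the error.

```python
import time

# Numeric values of the SaAisErrorT codes used in this sketch.
SA_AIS_OK = 1
SA_AIS_ERR_TRY_AGAIN = 6


def call_with_retry(operation, max_attempts=10, delay_s=0.1, sleep=time.sleep):
    """Call operation() until it stops returning SA_AIS_ERR_TRY_AGAIN.

    operation is a hypothetical zero-argument callable returning an integer
    return code. Returns (last_return_code, number_of_attempts).
    """
    rc = operation()
    attempts = 1
    while rc == SA_AIS_ERR_TRY_AGAIN and attempts < max_attempts:
        sleep(delay_s)      # back off before trying again
        rc = operation()
        attempts += 1
    return rc, attempts
```

Any return code other than SA_AIS_ERR_TRY_AGAIN, including a real error, ends the loop at once, so the caller still sees genuine failures.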
[tickets] [opensaf:tickets] #1729 Immd crashed on Active controller because of health check timeout
- **Milestone**: 4.6.2 --> 4.7.2

---

** [tickets:#1729] Immd crashed on Active controller because of health check timeout **

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Wed Apr 06, 2016 09:01 AM UTC by Ritu Raj
**Last Updated:** Wed Apr 06, 2016 11:02 AM UTC
**Owner:** nobody

Setup:
Changeset - 7436
Version - opensaf 5.0
4 nodes configured with single PBE and a load of 30K objects

Issue observed:
1) The standby controller did not join the active controller.
2) IMMD on the active controller got a health check timeout.

Steps performed:

* Started OpenSAF on controller SC-1 with PBE load; SC-1 took the active role.
* Then started OpenSAF on controller SC-2; SC-2 failed to join the cluster.

Apr 6 12:54:04 SLES-32BIT-SLOT2 opensafd: Starting OpenSAF Services(5.0.FC - ) (Using TIPC)
Starting OpenSAF Services (Using TIPC):
Apr 6 12:54:04 SLES-32BIT-SLOT2 kernel: [95783.514531] TIPC: Activated (version 2.0.0)
Apr 6 12:54:04 SLES-32BIT-SLOT2 kernel: [95783.514587] NET: Registered protocol family ..
Apr 6 12:54:04 SLES-32BIT-SLOT2 osafclmna[28264]: Started
Apr 6 12:54:04 SLES-32BIT-SLOT2 osafclmna[28264]: NO safNode=SC-2,safCluster=myClmCluster Joined cluster, nodeid=2020f .
Apr 6 12:54:04 SLES-32BIT-SLOT2 osafimmnd[28303]: NO IMMD service is UP ... ScAbsenseAllowed?:0 introduced?:0
Apr 6 12:54:04 SLES-32BIT-SLOT2 osafimmnd[28303]: NO SERVER STATE: IMM_SERVER_ANONYMOUS --> IMM_SERVER_CLUSTER_WAITING
Apr 6 12:54:09 SLES-32BIT-SLOT2 osafimmnd[28303]: WA Resending introduce-me - problems with MDS ? 5
Apr 6 12:54:55 SLES-32BIT-SLOT2 osafimmnd[28303]: ER Failed to load/sync. Giving up after 51 seconds, restarting..
Apr 6 12:54:55 SLES-32BIT-SLOT2 opensafd[28232]: ER Failed DESC:IMMND
Apr 6 12:54:55 SLES-32BIT-SLOT2 opensafd[28232]: ER Going for recovery
Apr 6 12:54:55 SLES-32BIT-SLOT2 opensafd[28232]: ER Trying To RESPAWN /usr/lib/opensaf/clc-cli/osaf-immnd attempt #1
Apr 6 12:54:55 SLES-32BIT-SLOT2 opensafd[28232]: ER Sending SIGKILL to IMMND, pid=28297
Apr 6 12:54:55 SLES-32BIT-SLOT2 osafimmnd[28303]: ER IMMND - Periodic server job failed
Apr 6 12:54:55 SLES-32BIT-SLOT2 osafimmnd[28303]: ER Failed, exiting...
Apr 6 12:55:10 SLES-32BIT-SLOT2 osafimmnd[28340]: Started
Apr 6 12:56:00 SLES-32BIT-SLOT2 osafimmnd[28340]: WA Resending introduce-me - problems with MDS ? 50
Apr 6 12:56:00 SLES-32BIT-SLOT2 osafimmnd[28340]: WA Resending introduce-me - problems with MDS ? 50
Apr 6 12:56:00 SLES-32BIT-SLOT2 osafimmnd[28340]: WA Resending introduce-me - problems with MDS ? 50
Apr 6 12:56:00 SLES-32BIT-SLOT2 osafimmnd[28340]: WA Resending introduce-me - problems with MDS ? 50
Apr 6 12:56:00 SLES-32BIT-SLOT2 osafimmnd[28340]: WA Resending introduce-me - problems with MDS ? 50
Apr 6 12:56:00 SLES-32BIT-SLOT2 osafimmnd[28340]: WA Resending introduce-me - problems with MDS ? 50
Apr 6 12:56:00 SLES-32BIT-SLOT2 osafimmnd[28340]: WA Resending introduce-me - problems with MDS ? 50
Apr 6 12:56:00 SLES-32BIT-SLOT2 osafimmnd[28340]: WA Resending introduce-me - problems with MDS ? 50
Apr 6 12:57:07 SLES-32BIT-SLOT2 opensafd: Starting OpenSAF failed

* After opensafd failed to come up on SC-2, SC-1 rebooted with an IMMD healthcheck timeout.

Apr 6 12:57:09 SLES-64BIT-SLOT1 osafamfnd[2211]: NO Performing failover of 'safSu=SC-1,safSg=2N,safApp=OpenSAF' (SU failover count: 1)
Apr 6 12:57:09 SLES-64BIT-SLOT1 osafamfnd[2211]: NO 'safComp=IMMD,safSu=SC-1,safSg=2N,safApp=OpenSAF' recovery action escalated from 'componentFailover' to 'suFailover'
Apr 6 12:57:09 SLES-64BIT-SLOT1 osafamfnd[2211]: NO 'safComp=IMMD,safSu=SC-1,safSg=2N,safApp=OpenSAF' faulted due to 'healthCheckcallbackTimeout' : Recovery is 'suFailover'
**Apr 6 12:57:09 SLES-64BIT-SLOT1 osafamfnd[2211]: ER safComp=IMMD,safSu=SC-1,safSg=2N,safApp=OpenSAF Faulted due to:healthCheckcallbackTimeout Recovery is:suFailover**
Apr 6 12:57:09 SLES-64BIT-SLOT1 osafamfnd[2211]: Rebooting OpenSAF NodeId = 131343 EE Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 131343, SupervisionTime = 60
Apr 6 12:57:09 SLES-64BIT-SLOT1 opensaf_reboot: Rebooting local node; timeout=60

This issue occurs randomly.
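When reading logs like the ones in this ticket, it can help to compute the interval between two entries, for example to confirm the "Giving up after 51 seconds" message on SC-2. The sketch below is only an illustration: `syslog_time` and `elapsed_seconds` are hypothetical helpers, the year 2016 is assumed from the ticket's creation date (syslog lines carry no year), and month names are assumed to be parsed in an English locale.

```python
from datetime import datetime


def syslog_time(line, year=2016):
    """Parse the leading 'Mon D HH:MM:SS' timestamp of a syslog line.

    The year is an assumption supplied by the caller.
    """
    month, day, clock = line.split()[:3]
    return datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S")


def elapsed_seconds(first_line, second_line):
    """Seconds between the timestamps of two syslog lines."""
    return (syslog_time(second_line) - syslog_time(first_line)).total_seconds()


waiting = ("Apr 6 12:54:04 SLES-32BIT-SLOT2 osafimmnd[28303]: NO SERVER STATE: "
           "IMM_SERVER_ANONYMOUS --> IMM_SERVER_CLUSTER_WAITING")
gave_up = ("Apr 6 12:54:55 SLES-32BIT-SLOT2 osafimmnd[28303]: ER Failed to "
           "load/sync. Giving up after 51 seconds, restarting..")

print(elapsed_seconds(waiting, gave_up))  # prints 51.0
```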
[tickets] [opensaf:tickets] #1699 EVT : deadlock in saEvtFinalize after CLM unlock
- **Milestone**: 4.6.2 --> 4.7.2

---

** [tickets:#1699] EVT : deadlock in saEvtFinalize after CLM unlock **

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Thu Mar 10, 2016 10:38 AM UTC by Srikanth R
**Last Updated:** Thu Mar 17, 2016 09:04 AM UTC
**Owner:** nobody

Changeset : 6901 ( 4.7 GA )

* Initially, an EVT application is spawned on a payload and only saEvtInitialize is called.
* A CLM lock operation and later a CLM unlock operation are issued, and the operations succeed.
* saEvtSelectionObjectGet and other APIs return SA_AIS_OK with the handle obtained before the CLM lock.
* A deadlock in the saEvtFinalize call is observed after the CLM unlock.

(gdb) thread apply all bt

Thread 3 (Thread 0x7f67a38dab00 (LWP 4286)):
0 0x7f67a275e4f6 in poll () from /lib64/libc.so.6
1 0x7f67a1dc9d61 in osaf_ppoll () from /usr/lib64/libopensaf_core.so.0
2 0x7f67a1dd134f in ncs_tmr_wait () from /usr/lib64/libopensaf_core.so.0
3 0x7f67a2c847b6 in start_thread () from /lib64/libpthread.so.0
4 0x7f67a27679cd in clone () from /lib64/libc.so.6
5 0x in ?? ()

Thread 2 (Thread 0x7f67a38a9b00 (LWP 4287)):
0 0x7f67a275e4f6 in poll () from /lib64/libc.so.6
1 0x7f67a1e05f3e in mdtm_process_recv_events () from /usr/lib64/libopensaf_core.so.0
2 0x7f67a2c847b6 in start_thread () from /lib64/libpthread.so.0
3 0x7f67a27679cd in clone () from /lib64/libc.so.6
4 0x in ?? ()

Thread 1 (Thread 0x7f67a38ac700 (LWP 4285)):
0 0x7f67a2c8aa00 in sem_wait () from /lib64/libpthread.so.0
1 0x7f67a1dd9fd2 in hm_block_me () from /usr/lib64/libopensaf_core.so.0
2 0x7f67a1dda14d in ncshm_destroy_hdl () from /usr/lib64/libopensaf_core.so.0
3 0x7f67a34b7b44 in eda_hdl_rec_del () from /usr/lib64/libSaEvt.so.1
4 0x7f67a34b1df5 in saEvtFinalize () at eda_saf_api.c:444
5 0x0040e445 in edsv_err_unavail_02_node02 ()
[tickets] [opensaf:tickets] #1730 LOG: group ownership of log file is not correct after failover
- **Milestone**: 4.6.2 --> 4.7.2

---

** [tickets:#1730] LOG: group ownership of log file is not correct after failover**

**Status:** assigned
**Milestone:** 4.7.2
**Created:** Wed Apr 06, 2016 09:16 AM UTC by Quyen Dao
**Last Updated:** Thu Apr 21, 2016 04:44 AM UTC
**Owner:** Canh Truong

*Steps to reproduce*

1. Create test_data_group_1 on the active SC
2. Create test_data_group_2, then test_data_group_1, on the standby SC
3. Add the user who owns the osaflogd process to test_data_group_1 on both SCs
4. Configure the log data group to test_data_group_1 with the immcfg command
5. Reboot the active SC
6. On the standby SC, list the log files with their group names; the result is shown below.

Result: After failover, the group ownership of the notification log stream file is test_data_group_2 instead of test_data_group_1.

Note:

* The saflog directory is in the shared file system.
* opensafd is started as root

root@SC-2:~# ll /srv/shared/saflog/
total 44
drwxrwxrwx 4 root root 4096 Mar 23 13:53 ./
drwxrwxrwx 7 ubuntu ubuntu 4096 Mar 22 14:51 ../
drwxrwxr-x 2 ubuntu ubuntu 4096 Mar 23 13:51 repo/
-rw-r--r-- 1 root root 138 Mar 23 13:53 saLogAlarm.cfg
-rw-r--r-- 1 root root 138 Mar 22 17:35 saLogAlarm_20160322_173951.cfg
-rw-r- 1 root test_data_group_1 200 Mar 23 13:53 saLogAlarm_20160323_135207.log
-rw-r--r-- 1 root root 138 Mar 23 13:52 saLogNotification.cfg
-rw-r- 1 root test_data_group_2 0 Mar 23 13:52 saLogNotification_20160323_135207.log
-rw-r--r-- 1 root root 143 Mar 23 13:53 saLogSystem.cfg
-rw-r- 1 root test_data_group_1 6876 Mar 23 13:53 saLogSystem_20160323_135207.log
root@SC-2:~#
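The manual check in step 6 could also be scripted. The following is a minimal sketch, assuming a Unix host; `group_name` and `files_with_wrong_group` are hypothetical helpers written for this illustration and are not part of OpenSAF. It lists every `.log` file in a directory whose group differs from the expected data group.

```python
import grp
import os


def group_name(path):
    """Return the group name owning path, or the numeric gid as text
    when the gid has no entry in the group database."""
    gid = os.stat(path).st_gid
    try:
        return grp.getgrgid(gid).gr_name
    except KeyError:
        return str(gid)


def files_with_wrong_group(directory, expected_group, suffix=".log"):
    """List (file name, actual group) for files not owned by expected_group."""
    wrong = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if name.endswith(suffix) and os.path.isfile(path):
            actual = group_name(path)
            if actual != expected_group:
                wrong.append((name, actual))
    return wrong
```

With the listing from this ticket, `files_with_wrong_group("/srv/shared/saflog", "test_data_group_1")` would be expected to report only the saLogNotification log file.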
[tickets] [opensaf:tickets] #1733 Payload got rebooted when cpnd is killed on payload
- **Milestone**: 4.6.2 --> 4.7.2

---

** [tickets:#1733] Payload got rebooted when cpnd is killed on payload**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Wed Apr 06, 2016 11:05 AM UTC by Madhurika Koppula
**Last Updated:** Tue Apr 12, 2016 10:43 AM UTC
**Owner:** nobody
**Attachments:**

- [cpsv.tgz](https://sourceforge.net/p/opensaf/tickets/1733/attachment/cpsv.tgz) (15.0 MB; application/octet-stream)

Setup:
Changeset - 7436
Version - opensaf 5.0
4 nodes configured with single PBE

Issue observed (it is random):

1) When CPND is killed on the payload, the component restart of CPND fails because the component registration timer expires.
2) The node goes for reboot. A test application was running at the time. Below is the syslog of PL-4:

Apr 6 10:52:00 OEL_M-SLOT-4 osafamfnd[3015]: NO 'safSu=PL-4,safSg=NoRed,safApp=OpenSAF' component restart probation timer started (timeout: 600 ns)
Apr 6 10:52:00 OEL_M-SLOT-4 osafamfnd[3015]: NO Restarting a component of 'safSu=PL-4,safSg=NoRed,safApp=OpenSAF' (comp restart count: 1)
Apr 6 10:52:00 OEL_M-SLOT-4 osafamfnd[3015]: NO 'safComp=CPND,safSu=PL-4,safSg=NoRed,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'componentRestart'
Apr 6 10:52:00 OEL_M-SLOT-4 osafckptnd[6263]: Started
Apr 6 10:52:10 OEL_M-SLOT-4 osafamfnd[3015]: NO Instantiation of 'safComp=CPND,safSu=PL-4,safSg=NoRed,safApp=OpenSAF' failed
Apr 6 10:52:10 OEL_M-SLOT-4 osafamfnd[3015]: NO Reason: component registration timer expired
Apr 6 10:52:10 OEL_M-SLOT-4 osafckptnd[6294]: Started
Apr 6 10:52:20 OEL_M-SLOT-4 osafamfnd[3015]: NO Instantiation of 'safComp=CPND,safSu=PL-4,safSg=NoRed,safApp=OpenSAF' failed
Apr 6 10:52:20 OEL_M-SLOT-4 osafamfnd[3015]: NO Reason: component registration timer expired
Apr 6 10:52:20 OEL_M-SLOT-4 osafamfnd[3015]: WA 'safComp=CPND,safSu=PL-4,safSg=NoRed,safApp=OpenSAF' Presence State RESTARTING => INSTANTIATION_FAILED
Apr 6 10:52:20 OEL_M-SLOT-4 osafamfnd[3015]: NO Component Failover trigerred for 'safSu=PL-4,safSg=NoRed,safApp=OpenSAF': Failed component: 'safComp=CPND,safSu=PL-4,safSg=NoRed,safApp=OpenSAF'
Apr 6 10:52:20 OEL_M-SLOT-4 osafamfnd[3015]: ER 'safComp=CPND,safSu=PL-4,safSg=NoRed,safApp=OpenSAF'got Inst failed
Apr 6 10:52:20 OEL_M-SLOT-4 osafamfnd[3015]: Rebooting OpenSAF NodeId = 132111 EE Name = , Reason: NCS component Instantiation failed, OwnNodeId = 132111, SupervisionTime = 60
Apr 6 10:52:20 OEL_M-SLOT-4 opensaf_reboot: Rebooting local node; timeout=60
Apr 6 10:52:46 OEL_M-SLOT-4 kernel: imklog 5.8.10, log source = /proc/kmsg started.

3) Below is the syslog of the ACTIVE controller:

Apr 6 10:51:59 OEL_M-SLOT-1 osafimmd[6916]: WA No coordinator IMMND known (case B) - ignoring sync request
Apr 6 10:51:59 OEL_M-SLOT-1 osafimmd[6916]: NO Node 2040f request sync sync-pid:2980 epoch:0
Apr 6 10:52:24 OEL_M-SLOT-1 kernel: TIPC: Resetting link <1.1.1:eth3-1.1.4:eth3>, peer not responding
Apr 6 10:52:24 OEL_M-SLOT-1 kernel: TIPC: Lost link <1.1.1:eth3-1.1.4:eth3> on network plane A
Apr 6 10:52:24 OEL_M-SLOT-1 kernel: TIPC: Lost contact with <1.1.4>
Apr 6 10:52:24 OEL_M-SLOT-1 osafamfd[7003]: NO Node 'PL-4' left the cluster
Apr 6 10:52:24 OEL_M-SLOT-1 osafclmd[6988]: NO Node 132111 went down. Not sending track callback for agents on that node
Apr 6 10:52:24 OEL_M-SLOT-1 osafclmd[6988]: NO Node 132111 went down. Not sending track callback for agents on that node
Apr 6 10:52:24 OEL_M-SLOT-1 osafimmnd[3728]: NO Global discard node received for nodeId:2040f pid:2980
Apr 6 10:52:24 OEL_M-SLOT-1 osafimmnd[3728]: NO Implementer connected: 1539 (MsgQueueService132111) <12283, 2010f>
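The logs in this ticket print node ids in two notations: amfnd and clmd use decimal (132111), while immd and immnd use hexadecimal (2040f). The two values are numerically equal, which suggests they refer to the same node, PL-4. A small sketch for converting between the notations while correlating logs (the function names are hypothetical):

```python
def node_id_to_hex(node_id):
    """Decimal node id, as printed by amfnd/clmd, to the hex form seen in IMM logs."""
    return format(node_id, "x")


def hex_to_node_id(text):
    """Hex node id, as printed by immd/immnd, to the decimal form."""
    return int(text, 16)


print(node_id_to_hex(132111))   # prints 2040f
print(hex_to_node_id("2040f"))  # prints 132111
```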
[tickets] [opensaf:tickets] #1767 AMFD: Standby assignments get removed if failover during unlock standby SU
- **Milestone**: 4.6.2 --> 4.7.2

---

** [tickets:#1767] AMFD: Standby assignments get removed if failover during unlock standby SU**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Sun Apr 17, 2016 07:42 PM UTC by Minh Hon Chau
**Last Updated:** Sun Apr 17, 2016 07:42 PM UTC
**Owner:** nobody

Configuration:

- Set up 2N amf_demo with active SU4 on PL4, standby SU5, SU5B on PL5.
- 3 components for each SU
- 3 SIs, with dependency: SI3 -> SI2 -> SI1 (highest sponsored SI)

Steps:

- Unlock SU4; SU4 gets the Active assignment
- Unlock SU5, delay the CSI standby assignment on SU5, reboot PL4
- After PL4 comes up, release the delayed CSI standby assignment on SU5
- SU5 gets the Standby assignment.
- SU5's assignment gets removed.
- SU4 gets the Active assignment
- SU5 gets the Standby assignment again.

The issue is that after SU5 first finishes its Standby assignment, amfd should assign the Active assignment to SU5 immediately instead of removing SU5's Standby assignment.
[tickets] [opensaf:tickets] #1768 AMFD: Standby assignments get removed if failover during unlock SI
- **Milestone**: 4.6.2 --> 4.7.2

---

** [tickets:#1768] AMFD: Standby assignments get removed if failover during unlock SI**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Sun Apr 17, 2016 07:55 PM UTC by Minh Hon Chau
**Last Updated:** Sun Apr 17, 2016 07:55 PM UTC
**Owner:** nobody

Configuration:

- Set up 2N amf_demo with active SU4 on PL4, standby SU5, SU5B on PL5.
- 3 components for each SU
- 3 SIs, with dependency: SI3 -> SI2 -> SI1 (highest sponsored SI)

Steps:

- Unlock the highest sponsored SI
- SU4 finishes the Active assignment
- While SU5 is starting to get the Standby assignment, delay the Standby CSI assignment on SU5
- Reboot PL4
- After PL4 comes up, release the delayed Standby CSI assignment on SU5
- SU5 finishes the Standby assignment
- SU5's assignment gets removed
- SU5 gets the Active assignment
- SU5B gets the Standby assignment

The issue is that after SU5 first finishes its Standby assignment, amfd should assign the Active assignment to SU5 immediately.

Note: The root cause looks similar to ticket #1767; however, this ticket comes from a different test case and the result also differs from #1767.
[tickets] [opensaf:tickets] #1765 saCkptCheckpointOpen api call failed and returing SA_AIS_ERR_LIBRARY after couple of failover
- **Milestone**: 4.6.2 --> 4.7.2

---

** [tickets:#1765] saCkptCheckpointOpen api call failed and returing SA_AIS_ERR_LIBRARY after couple of failover**

**Status:** accepted
**Milestone:** 4.7.2
**Created:** Fri Apr 15, 2016 06:26 AM UTC by Ritu Raj
**Last Updated:** Mon Apr 25, 2016 07:20 AM UTC
**Owner:** Pham Hoang Nhat
**Attachments:**

- [ckpt_trace.tar.bz2](https://sourceforge.net/p/opensaf/tickets/1765/attachment/ckpt_trace.tar.bz2) (3.2 MB; application/x-bzip)

Setup:
Changeset - 7436
Version - opensaf 5.0 FC
4 nodes configured with single PBE and a load of 30K objects

* Issue observed: the saCkptCheckpointOpen API call failed, returning SA_AIS_ERR_LIBRARY, after a couple of failovers.

* Steps to reproduce:

Ran a couple of failovers and observed that saCkptCheckpointOpen failed. Below is a snippet of the agent trace:

Apr 15 8:08:50.275115 cpa [28883:cpa_mds.c:0776] << cpa_mds_msg_sync_send: retval = 1
Apr 15 8:08:50.275128 cpa [28883:cpa_api.c:1043] T4 Cpa CkptOpen failed with return value:2,ckptHandle:63
Apr 15 8:08:50.275141 cpa [28883:cpa_api.c:1146] << **saCkptCheckpointOpen: API return code = 2**

Traces of both controllers and the agent trace of the payload are attached.
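The agent trace prints the API result as a number (`API return code = 2`), while the ticket title names it SA_AIS_ERR_LIBRARY. The sketch below shows one way to translate such trace lines while reading them. It is only an illustration: the table is deliberately partial, limited to SaAisErrorT values that appear in this batch of tickets, and `api_return_code` and `describe_return_code` are hypothetical helper names.

```python
import re

# Partial table of SaAisErrorT values; only codes seen in these tickets.
SA_AIS_CODES = {
    1: "SA_AIS_OK",
    2: "SA_AIS_ERR_LIBRARY",
    5: "SA_AIS_ERR_TIMEOUT",
    6: "SA_AIS_ERR_TRY_AGAIN",
    14: "SA_AIS_ERR_EXIST",
}


def api_return_code(trace_line):
    """Extract the number after 'API return code =' or return None."""
    match = re.search(r"API return code = (\d+)", trace_line)
    return int(match.group(1)) if match else None


def describe_return_code(code):
    """Name a return code, falling back to the raw number if unknown."""
    return SA_AIS_CODES.get(code, f"unknown ({code})")


line = ("Apr 15 8:08:50.275141 cpa [28883:cpa_api.c:1146] << "
        "**saCkptCheckpointOpen: API return code = 2**")
print(describe_return_code(api_return_code(line)))  # prints SA_AIS_ERR_LIBRARY
```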
[tickets] [opensaf:tickets] #1758 LOG : seg fault in saLogStreamOpen_2() during failover
- **Milestone**: 4.6.2 --> 4.7.2

---

** [tickets:#1758] LOG : seg fault in saLogStreamOpen_2() during failover**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Thu Apr 14, 2016 06:21 AM UTC by Ritu Raj
**Last Updated:** Thu Apr 28, 2016 08:32 PM UTC
**Owner:** nobody
**Attachments:**

- [log_agent_trace.txt](https://sourceforge.net/p/opensaf/tickets/1758/attachment/log_agent_trace.txt) (105.0 kB; text/plain)

Setup:
Changeset - 7436
Version - opensaf 5.0 FC
4 nodes configured with single PBE and a load of 30K objects

* Issue observed: seg fault in saLogStreamOpen_2() during failover

* Steps to reproduce:

On a payload, an init handle is first obtained using saLogInitialize(). Later, some failovers are invoked. The agent crashed in a saLogStreamOpen_2() call invoked just after a failover.

Following is the agent trace.

Apr 14 11:16:27.558521 lga [29926:lga_api.c:0744] >> saLogStreamOpen_2
Apr 14 11:16:27.558548 lga [29926:lga_api.c:0566] >> validate_open_params
Apr 14 11:16:27.558560 lga [29926:lga_api.c:0711] << validate_open_params
Apr 14 11:16:27.558568 lga [29926:lga_api.c:0772] TR saLogStreamOpen_2 ** LGS no active**

This issue is observed in 2 out of 5 iterations.
[tickets] [opensaf:tickets] #1770 AMF : amfnd segfaulted during su failover escalation
- **Milestone**: 4.6.2 --> 4.7.2

---

** [tickets:#1770] AMF : amfnd segfaulted during su failover escalation**

**Status:** review
**Milestone:** 4.7.2
**Created:** Tue Apr 19, 2016 06:53 AM UTC by Srikanth R
**Last Updated:** Thu Apr 28, 2016 07:18 AM UTC
**Owner:** Praveen

Setup : 5 node cluster with 3 payloads
Changeset : 7438 ( opensaf 5.0.FC)
Application : 2N with 5 SUs ( si-si deps enabled & su failover flag enabled)

Issue : The AMFND hosting the faulty SU segfaulted during SU failover escalation as part of an SU lock operation.

Steps performed :

-> Initially bring up the application and ensure that the application is fully assigned.
-> Perform one fault operation on the SU hosting the active assignment, in such a way that the next fault is escalated to SU failover.
-> Perform a lock operation on the SU hosting the active assignment.
-> Do not respond to the CSI removal callback, so that this fault is escalated to SU failover.
-> AMFND segfaulted with the following bt file:

signal: 11 pid: 320 uid: 0
/usr/lib64/libopensaf_core.so.0(+0x1fd9d)[0x7f1d79294d9d]
/lib64/libpthread.so.0(+0xf7c0)[0x7f1d782b67c0]
/usr/lib64/opensaf/osafamfnd[0x43b1ff]
/usr/lib64/opensaf/osafamfnd[0x417f89]
/usr/lib64/opensaf/osafamfnd[0x408469]
/usr/lib64/opensaf/osafamfnd[0x42c65a]
/usr/lib64/opensaf/osafamfnd[0x42c4a0]
/usr/lib64/opensaf/osafamfnd[0x42b979]
/lib64/libc.so.6(__libc_start_main+0xe6)[0x7f1d77ac1c36]
/usr/lib64/opensaf/osafamfnd[0x405f29]

-> Below is the entry in the osafamfnd trace :

Apr 19 11:23:44.684918 osafamfnd [29522:clc.cc:0870] T1 'safComp=COMP2SU5TWONAPP,safSu=SU5,safSg=SGONE,safApp=TWONAPP':FSM Enter presence state: 'SA_AMF_PRESENCE_TERMINATING(4)':FSM Exit presence state:SA_AMF_PRESENCE_TERMINATING(4)
Apr 19 11:23:44.684924 osafamfnd [29522:clc.cc:0889] << avnd_comp_clc_fsm_run: 1
Apr 19 11:23:44.684930 osafamfnd [29522:err.cc:1120] << avnd_err_su_repair: retval=1
Apr 19 11:23:44.684936 osafamfnd [29522:susm.cc:0255] >> avnd_su_siq_prc: SU 'safSu=SU5,safSg=SGONE,safApp=TWONAPP'
Apr 19 11:23:44.684942 osafamfnd [29522:susm.cc:0260] << avnd_su_siq_prc
Apr 19 11:23:44.684947 osafamfnd [29522:susm.cc:1176] << avnd_su_si_oper_done: 1
Apr 19 11:23:44.684953 osafamfnd [29522:comp.cc:1822] << avnd_comp_csi_remove_done: 1
Apr 19 11:23:44.684959 osafamfnd [29522:comp.cc:1321] << avnd_comp_csi_remove: 1
Apr 19 11:23:44.685055 osafamfnd [29522:comp.cc:1678] >> all_csis_in_removed_state: 'safSu=SU5,safSg=SGONE,safApp=TWONAPP'
Apr 19 11:23:44.685064 osafamfnd [29522:comp.cc:1691] << all_csis_in_removed_state: 1
Apr 19 11:23:44.685070 osafamfnd [29522:susm.cc:1021] >> avnd_su_si_oper_done: 'safSu=SU5,safSg=SGONE,safApp=TWONAPP' '(null)'
Apr 19 11:23:44.685076 osafamfnd [29522:susm.cc:0845] >> susi_operation_in_progress: 'safSu=SU5,safSg=SGONE,safApp=TWONAPP' '(null)'
Apr 19 11:23:44.685082 osafamfnd [29522:susm.cc:0890] << susi_operation_in_progress: 1
Apr 19 11:23:44.685096 osafamfnd [29522:err.cc:1586] >> is_no_assignment_due_to_escalations
Apr 19 11:23:44.685102 osafamfnd [29522:err.cc:1591] << is_no_assignment_due_to_escalations: true
Apr 19 11:24:51.153931 osafamfnd [2500:ncs_main_pub.c:0223] TR NCS:PROCESS_ID=2500
[tickets] [opensaf:tickets] #1772 AMF: amfnd false assert when unlock 2N Active SU
- **Milestone**: 4.6.2 --> 4.7.2

---

** [tickets:#1772] AMF: amfnd false assert when unlock 2N Active SU**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Wed Apr 20, 2016 02:39 AM UTC by Minh Hon Chau
**Last Updated:** Wed Apr 20, 2016 02:39 AM UTC
**Owner:** nobody
**Attachments:**

- [amfnd_assert.tgz](https://sourceforge.net/p/opensaf/tickets/1772/attachment/amfnd_assert.tgz) (290.1 kB; application/x-compressed)

- Configuration: as in the attached file
  . Set up 2N amf_demo with active SU4 on PL4, standby SU5, SU5B on PL5.
  . 3 components for each SU
  . 3 SIs, with dependency: SI3 -> SI2 -> SI1 (highest sponsored SI)
- Steps:
  . Bring up 2 SUs
  . Lock the Active SU4, delay the CSI remove callback
  . Meanwhile, reboot PL5
  . Release the delay of the CSI remove callback in SU4 -> assignments are removed successfully
  . PL5 restarts; SU5B gets the Active and SU5 gets the Standby assignments
  . Lock SU5, SU5B
  . Unlock SU4 -> amfnd on PL4 asserts

2016-04-20 12:36:14 PL-4 osafamfnd[423]: di.cc:835: avnd_di_susi_resp_send: Assertion 'si' failed.
2016-04-20 12:36:14 PL-4 osafamfwd[465]: Rebooting OpenSAF NodeId = 0 EE Name = No EE Mapped, Reason: AMF unexpectedly crashed, OwnNodeId = 132111, SupervisionTime = 60
2016-04-20 12:36:14 PL-4 amf_demo[660]: AL AMF Node Director is down, terminate this process
[tickets] [opensaf:tickets] #1778 AMF: Failed to repair SU after fault injection on Standby SU
- **Milestone**: 4.6.2 --> 4.7.2

---

** [tickets:#1778] AMF: Failed to repair SU after fault injection on Standby SU**

**Status:** unassigned
**Milestone:** 4.7.2
**Labels:** Repair SU failed
**Created:** Sat Apr 23, 2016 05:30 PM UTC by Minh Hon Chau
**Last Updated:** Sat Apr 23, 2016 05:31 PM UTC
**Owner:** nobody
**Attachments:**

- [app3_twon3su3si_3sidep.xml](https://sourceforge.net/p/opensaf/tickets/1778/attachment/app3_twon3su3si_3sidep.xml) (14.1 kB; text/xml)

Configuration:

- Set up 2N amf_demo with active SU4 on PL4, standby SU5, SU5B on PL5.
- 3 components for each SU
- 3 SIs, with dependency: SI3 -> SI2 -> SI1 (highest sponsored SI)

Steps:

- Unlock the highest sponsored SI
- SU4 finishes the Active assignment
- While SU5 is starting to get the Standby assignment, delay the Standby CSI assignment on SU5
- Reboot PL4
- After PL4 comes up, release the delayed Standby CSI assignment on SU5, but respond with FAILED_OP to the Standby CSI assignment
- SU failover is reported on SU5
- Repair SU5; SU5 finishes the Standby assignment, but SU5's assignment gets removed right after that
- Meanwhile, amfd on SC1 gets the warning: "2016-04-24 03:05:24 SC-1 osafamfd[488]: EM sg_2n_fsm.cc:2385: safSu=SU5,safSg=AmfDemoTwon,safApp=AmfDemoTwon (46)"
[tickets] [opensaf:tickets] #1789 log: not verify the logBufSize caused the node malfunctioned
- **Milestone**: 4.6.2 --> 4.7.2

---

** [tickets:#1789] log: not verify the logBufSize caused the node malfunctioned**

**Status:** review
**Milestone:** 4.7.2
**Created:** Thu Apr 28, 2016 11:06 AM UTC by Vu Minh Nguyen
**Last Updated:** Wed May 04, 2016 07:23 AM UTC
**Owner:** Vu Minh Nguyen

Normally, log clients pass a `logBufSize` value, calculated from the data in `logBuf`, to the `saLogWriteLogAsync()` LOG API.

But if a client accidentally passes an invalid `logBufSize` to `saLogWriteLogAsync()`, such as a very large number that results from not using `strlen()` on `logBuf`, it causes a lot of trouble. E.g.:

1) The saflog file is flooded with zeros and grows very large (e.g. 4GB)
2) A lot of RAM is consumed
3) A lot of CPU is consumed
4) Other things
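The kind of check the ticket asks for can be sketched as follows. This is only an illustration of the idea, not the patch under review: `checked_log_buf_size` and `MAX_LOG_RECORD_BYTES` are hypothetical, and the limit value is an arbitrary assumption rather than a number from the LOG specification. Note also that the comparison against `len(log_buf)` is possible here only because Python bytes carry their length; C code that receives just a pointer and a size would have to rely on a configured maximum or on `strlen()` for NUL-terminated data.

```python
# Hypothetical upper bound for a single log record; an assumption made
# for this sketch, not a value taken from the LOG service.
MAX_LOG_RECORD_BYTES = 65535


def checked_log_buf_size(log_buf, log_buf_size, limit=MAX_LOG_RECORD_BYTES):
    """Return log_buf_size if it is consistent with log_buf.

    Raises ValueError when the declared size is negative, larger than
    the data actually held in log_buf, or larger than the limit.
    """
    if log_buf_size < 0:
        raise ValueError("logBufSize must not be negative")
    if log_buf_size > len(log_buf):
        raise ValueError("logBufSize is larger than the data in logBuf")
    if log_buf_size > limit:
        raise ValueError("logBufSize exceeds the maximum record size")
    return log_buf_size
```

Rejecting the request up front would keep an oversized `logBufSize` from reaching the file writer, which is where the zero-filled 4GB file described above comes from.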
[tickets] [opensaf:tickets] #1801 lck: saLckResourceOpen with flag SA_LCK_RESOURCE_CREATE returning SA_AIS_ERR_TIMEOUT after 5 failovers.
- **Milestone**: 4.6.2 --> 4.7.2

---

** [tickets:#1801] lck: saLckResourceOpen with flag SA_LCK_RESOURCE_CREATE returning SA_AIS_ERR_TIMEOUT after 5 failovers.**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Mon May 02, 2016 09:52 AM UTC by Madhurika Koppula
**Last Updated:** Mon May 02, 2016 09:52 AM UTC
**Owner:** nobody
**Attachments:**

- [glsv.tgz](https://sourceforge.net/p/opensaf/tickets/1801/attachment/glsv.tgz) (3.0 MB; application/octet-stream)

Setup:
Changeset- 7436
OS: Oracle Linux Server release 6.4 (x86_64)
4 nodes configured with single PBE

Some failover tests are being run. The safLock=resource1_101 object is not getting deleted, so saLckResourceOpen with flag SA_LCK_RESOURCE_CREATE continuously returns SA_AIS_ERR_TIMEOUT. The same API call is retried 15 times, with a 10-second sleep between attempts.

Snippet from the run:

100|7| SUCCESS : saLckInitialize with valid parameters
100|7| Return Value: SA_AIS_OK
100|7| LckHandle : 6599312
100|7|
100|7|
100|7| SUCCESS : saLckInitialize with valid parameters
100|7| Return Value: SA_AIS_OK
100|7| LckHandle : 6599392
100|7|
100|7|
100|7| Resource Name : safLock=resource1_101
100|7| open flags : SA_LCK_RESOURCE_CREATE
100|7| FAILED : saLckResourceOpen with valid parameters
100|7| Return Value: SA_AIS_ERR_TIMEOUT
100|7| Resource Name : safLock=resource1_101
100|7| open flags : SA_LCK_RESOURCE_CREATE
100|7|
100|7| Resource Name : safLock=resource1_101
100|7| open flags : SA_LCK_RESOURCE_CREATE
100|7| Resource Name : safLock=resource1_101
100|7| open flags : SA_LCK_RESOURCE_CREATE
100|7|
100|7| Resource Name : safLock=resource1_101
100|7| open flags : SA_LCK_RESOURCE_CREATE
100|7|
100|7| Resource Name : safLock=resource1_101
100|7| open flags : SA_LCK_RESOURCE_CREATE
100|7|
100|7| Resource Name : safLock=resource1_101
100|7| open flags : SA_LCK_RESOURCE_CREATE
Timeout count exceeded: 15

Timestamp of the Active controller at this instant:

May 2 14:22:56 OEL_M-SLOT-2 root: killing osafimmd from run_failover.sh
May 2 14:22:56 OEL_M-SLOT-2 osafamfnd[1755]: NO 'safComp=IMMD,safSu=SC-2,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'nodeFailfast'
May 2 14:22:56 OEL_M-SLOT-2 osafamfnd[1755]: ER safComp=IMMD,safSu=SC-2,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery is:nodeFailfast
May 2 14:22:56 OEL_M-SLOT-2 osafamfnd[1755]: Rebooting OpenSAF NodeId = 131599 EE Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 131599, SupervisionTime = 60
May 2 14:22:56 OEL_M-SLOT-2 opensaf_reboot: Rebooting local node; timeout=60

Timestamp of the Standby controller which is becoming active after failover:

May 2 14:23:00 OEL_M-SLOT-1 opensaf_reboot: Rebooting remote node in the absence of PLM is outside the scope of OpenSAF
May 2 14:23:00 OEL_M-SLOT-1 osaffmd[1677]: NO Controller Failover: Setting role to ACTIVE
May 2 14:23:00 OEL_M-SLOT-1 osafrded[1667]: NO RDE role set to ACTIVE
May 2 14:23:00 OEL_M-SLOT-1 osafrded[1667]: NO Running '/usr/lib64/opensaf/opensaf_sc_active' with 0 argument(s)
May 2 14:23:00 OEL_M-SLOT-1 osafimmd[1688]: NO ACTIVE request
May 2 14:23:00 OEL_M-SLOT-1 osaflogd[1711]: NO ACTIVE request
May 2 14:23:00 OEL_M-SLOT-1 osafntfd[1722]: NO ACTIVE request
May 2 14:23:00 OEL_M-SLOT-1 osafclmd[1733]: NO ACTIVE request
May 2 14:23:00 OEL_M-SLOT-1 osafamfd[1744]: NO FAILOVER StandBy --> Active

/var/log/messages and osaflckd traces of both controllers are attached.
[tickets] [opensaf:tickets] #1803 imm: "CcbAugment with 2 OIs" testcase crashes
- **Milestone**: 4.6.2 --> 4.7.2

---

** [tickets:#1803] imm: "CcbAugment with 2 OIs" testcase crashes**

**Status:** assigned
**Milestone:** 4.7.2
**Created:** Tue May 03, 2016 09:33 AM UTC by Hung Nguyen
**Last Updated:** Tue May 03, 2016 11:05 AM UTC
**Owner:** Hung Nguyen

Not always reproducible.

root@SC-1:~# /hostfs/immoitest 6 4
Suite 6: Augmented CCBs
immoitest: test_saImmOiAugmentCcbInitialize.c:717: saImmOiCcbAugmentInitialize_04: Assertion `callbackCounter == 6' failed.
Aborted (core dumped)

---

Analysis:

The increment of 'callbackCounter' in the OI callbacks is not atomic. Even when the 2 OIs receive 6 callbacks in total, 'callbackCounter' may not be 6.

~~~
/* Wait for 2 completed and 2 apply callbacks */
while(callbackCounter != 6 && threadCounter == 2)
    usleep(500);

assert(callbackCounter == 6);
~~~

In that case, the OI threads will exit after 2 seconds (poll timeout). That makes the while() loop stop (threadCounter != 2) and the assertion fail.

'callbackCounter' and 'threadCounter' should be thread-safe.
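One way to make the two counters thread-safe is to use C11 atomics. The sketch below is an assumption about how that could look, not the change that was actually made to the test; the names `callback_counter`, `thread_counter`, `count_callback`, `oi_thread_enter`, `oi_thread_leave` and `keep_waiting` are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Shared between the main thread and the OI threads (zero-initialized). */
static atomic_int callback_counter;
static atomic_int thread_counter;

/* Call at the start and at the end of each OI thread. */
static void oi_thread_enter(void) { atomic_fetch_add(&thread_counter, 1); }
static void oi_thread_leave(void) { atomic_fetch_sub(&thread_counter, 1); }

/* Call once from every OI callback; the increment cannot be lost. */
static void count_callback(void) { atomic_fetch_add(&callback_counter, 1); }

/*
 * Condition of the main thread's wait loop: keep waiting while callbacks
 * are still missing and all OI threads are still alive.
 */
static bool keep_waiting(int expected_callbacks, int expected_threads)
{
    return atomic_load(&callback_counter) != expected_callbacks &&
           atomic_load(&thread_counter) == expected_threads;
}
```

The main thread would then loop on `keep_waiting(6, 2)` with a short sleep between checks and assert on `atomic_load(&callback_counter)` afterwards. A mutex around plain `int` counters would work equally well; atomics just keep the callbacks short.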
[tickets] [opensaf:tickets] #1804 imm: "OM Thread Interference" testcase crashes
- **Milestone**: 4.6.2 --> 4.7.2

---

** [tickets:#1804] imm: "OM Thread Interference" testcase crashes**

**Status:** assigned
**Milestone:** 4.7.2
**Created:** Tue May 03, 2016 11:05 AM UTC by Hung Nguyen
**Last Updated:** Tue May 03, 2016 11:05 AM UTC
**Owner:** Hung Nguyen
**Attachments:**

- [bt_core.1461263164.immomtest.1536.PL-3](https://sourceforge.net/p/opensaf/tickets/1804/attachment/bt_core.1461263164.immomtest.1536.PL-3) (7.4 kB; application/octet-stream)

Not always reproducible.
Full backtrace is attached to this ticket.

root@SC-1:~# immomtest 1 20

~~~
Thread 4 (Thread 0x7f9299243780 (LWP 1536)):
#0 0x7f92982f9f3d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
No locals.
#1 0x7f929832b4a4 in usleep (useconds=) at ../sysdeps/unix/sysv/linux/usleep.c:32
        ts = {tv_sec = 0, tv_nsec = 20}
#2 0x00432f6d in saImmOmThreadInterference_01 () at test_saImmOmThreadInterference.c:173
        immHandle = 498216338191
        immOiHandle = 502511305487
        ownerHandle = 1461263164437942973
        searchHandle = 0
        ccbHandle = 1461263164439715898
        threadid1 = 140267581654784
        threadid2 = 140267573262080
        rc = SA_AIS_ERR_LIBRARY
        attrDef = {attrName = 0x44e7b1 "rdn", attrValueType = SA_IMM_ATTR_SASTRINGT, attrFlags = 258, attrDefaultValue = 0x0}
        attrDefs = {0x7768dc30, 0x0}
        classCategory = (unknown: 4150844272)
        attrDefinitions = 0x0
        accessorHandle = 1461263164479509139
        attributeNames = {0x44e7b5 "SaImmAttrClassName", 0x0}
        attributes = 0x7768de00
        objNames = {0x666b00 , 0x0}
        attrValue = {attrName = 0x44e7b1 "rdn", attrValueType = SA_IMM_ATTR_SANAMET, attrValuesNumber = 1, attrValues = 0x7768dbd0}
        attrValues = {0x7768dc10, 0x0}
        modAttrValue = {0x44e7c8, 0x0}
        attrMod = {modType = SA_IMM_ATTR_VALUES_REPLACE, modAttr = {attrName = 0x44e7cd "attr4", attrValueType = SA_IMM_ATTR_SASTRINGT, attrValuesNumber = 1, attrValues = 0x7768dbf0}}
        attrMods = {0x7768dc50, 0x0}
        __PRETTY_FUNCTION__ = "saImmOmThreadInterference_01"
#3 0x0043987e in run_test_case (suite=1, tcase=20) at utest.c:175
No locals.
#4 0x00439c78 in test_run (suite=1, tcase=20) at utest.c:231
        i = 32624793
        j = 0
#5 0x00442a71 in main (argc=4, argv=0x7768de08) at immtest.c:113
        suite = 1
        tcase = 20
        rc = 0
        i = 4
        index = 2
        longDn = 1
        failed = 0
        endptr = 0x7768ff0b ""
~~~

~~~
Thread 1 (Thread 0x7f929760c700 (LWP 1544)):
#0 0x7f9298853add in search (pTree=0x7f9298d9e230 , key=0x7f929760be58 "\017\003\002") at patricia.c:93
        pNode = 0x0
        pPrevNode = 0x7f9298d9e230
#1 0x7f9298854504 in ncs_patricia_tree_get (pTree=0x7f9298d9e230 , pKey=0x7f929760be58 "\017\003\002") at patricia.c:433
        pNode = 0x7f9298d9e198
#2 0x7f9298b43602 in imma_client_node_get (client_tree=0x7f9298d9e230 , cl_hdl=0x7f929760be58, cl_node=0x7f929760be70) at imma_db.c:55
No locals.
#3 0x7f9298b35263 in saImmOiDispatch (immOiHandle=502511305487, dispatchFlags=SA_DISPATCH_ONE) at imma_oi_api.c:531
        rc = SA_AIS_OK
        cb = 0x7f9298d9e180
        cl_node = 0x0
        locked = true
        pend_fin = 0
        pend_dis = 0
        __FUNCTION__ = "saImmOiDispatch"
#4 0x00432395 in implementer_thread (arg=0x7768db68) at test_saImmOmThreadInterference.c:63
        fd = {{fd = 9, events = 1, revents = 32}}
        selObject = 9
        ret = 1
        err = SA_AIS_OK
        immOiHandle = 502511305487
#5 0x7f9298606182 in start_thread (arg=0x7f929760c700) at pthread_create.c:312
        __res =
        pd = 0x7f929760c700
        now =
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140267581654784, 8097029872058037852, 0, 0, 140267581655488, 140267581654784, -8107813012334774692, -8107784432235776420}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call =
        pagesize_m1 =
        sp =
        freesize =
        __PRETTY_FUNCTION__ = "start_thread"
#6 0x7f929833347d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
~~~

---

Analysis:

At the end of the testcase, saImmOiFinalize() is called on the main thread.
In saImmOiFinalize(), imma_callback_ipc_destroy() is called and indicates an event on the OI sel-obj.
On the OI thread, saImmOiDispatch() is called due to the event on OI sel-obj.

So saImmOiFinalize() and saImmOiDispatch() are invoked simultaneously on two different threads.
saImmOiFinalize() calls ncs_patricia_tree_del().
saImmOiDispatch() calls ncs_patricia_tree_get().
That causes the segmentation fault.
[tickets] [opensaf:tickets] #1751 PLM: handle isolation failures
- **status**: review --> fixed
- **Version**: 4.7 --> 4.6
- **Milestone**: 4.7.1 --> 4.6.2
- **Comment**:

[staging:1d2e9f]
[staging:f8c26e]
[staging:d51e98]
[staging:1ba5c5]

changeset: 7597:1d2e9f89a581
branch: opensaf-4.6.x
parent: 7560:2ab983a28c50
user: Alex Jones
date: Thu May 05 00:08:19 2016 +0530
summary: plm: disallow activation of blade if it is operationally disabled [#1751]

changeset: 7598:f8c26e14086a
branch: opensaf-4.7.x
parent: 7562:8a3a21dc75e7
user: Alex Jones
date: Thu May 05 00:08:19 2016 +0530
summary: plm: disallow activation of blade if it is operationally disabled [#1751]

changeset: 7599:d51e98d73f95
branch: opensaf-5.0.x
parent: 7596:2941ea4c6fb3
user: Alex Jones
date: Thu May 05 00:08:19 2016 +0530
summary: plm: disallow activation of blade if it is operationally disabled [#1751]

changeset: 7600:1ba5c54d8555
tag: tip
parent: 7590:62b34f2f4289
user: Alex Jones
date: Thu May 05 00:08:19 2016 +0530
summary: plm: disallow activation of blade if it is operationally disabled [#1751]

---

** [tickets:#1751] PLM: handle isolation failures**

**Status:** fixed
**Milestone:** 4.6.2
**Created:** Tue Apr 12, 2016 07:05 PM UTC by Alex Jones
**Last Updated:** Tue May 03, 2016 06:01 PM UTC
**Owner:** Alex Jones

In ATCA architectures, the PICMG 3.0 spec states that for non-recoverable temperature sensor assertions, the shelf manager will automatically shut down the FRU. This shows up as an "extraction pending" request from HPI. The PLM implementation cannot currently differentiate between the shelf manager bringing the FRU down and the operator attempting a manual extraction. Both of these initiate the deactivation policy.

When the shelf manager shuts down the FRU automatically, HE isolation can fail because the FRU has already begun shutting down when the readiness impact call is made. In this case the deactivation request will fail. The code currently handles this case correctly by setting the isolate-pending flag. However, some shelf managers will attempt to reactivate the FRU once it has been deactivated.

PLM should reject an activation if the HE is operationally disabled.
[tickets] [opensaf:tickets] #1609 Introduce dedicated AIS handler threads
- **Milestone**: 5.0.FC --> future

---

** [tickets:#1609] Introduce dedicated AIS handler threads**

**Status:** unassigned
**Milestone:** future
**Created:** Tue Nov 24, 2015 09:03 AM UTC by Anders Widell
**Last Updated:** Mon Dec 07, 2015 12:41 PM UTC
**Owner:** nobody

This is a generic cleanup ticket for introducing one or several dedicated AIS handler threads in the OpenSAF services. The most urgent need is for the IMM API calls, but a similar approach can be used for other AIS services as and when needed.

The problem we are trying to solve is that IMM API calls can sometimes be blocked for a very long time (up to several minutes). The reason why IMM is blocked is normally an ongoing sync, but there could be other reasons why an AIS call (not only IMM) could be blocked:

* If we have circular dependencies between the OpenSAF services, and the service we are depending on has not yet started
* The LOG service can be blocked due to a slow or unresponsive external disk or NFS server
* The OpenSAF service we are trying to access is temporarily out of service

For IMM, we can justify introducing up to three separate IMM handler threads:

* One thread acting as an IMM OI and / or applier. Note that this thread may also need to have its own IMM OM handle.
* One thread for IMM OM CCB operations.
* One thread for IMM OM read / search operations. These operations are normally fast though, since they should not be blocked by a sync, so we may not need a separate thread for them. But as mentioned above, they may be blocked for other reasons.

The handler threads should exclusively own their AIS handle(s) and have their own poll + dispatch loops. Communication between the main thread and the handler thread should typically be done through a mailbox.

Refer e.g. to tickets [#1531] and [#1527] for issues related to this enhancement that have been identified in the LOG service. Similar cleanup is needed in other OpenSAF services as well.
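The mailbox between the main thread and a handler thread, as described in this ticket, could be sketched as a small mutex-protected queue. This is only an illustration: `struct mailbox`, `mailbox_init`, `mailbox_post`, `mailbox_fetch` and `MAILBOX_CAPACITY` are hypothetical names and do not refer to the mailbox facility that OpenSAF itself provides.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define MAILBOX_CAPACITY 8 /* arbitrary size chosen for this sketch */

struct mailbox {
    pthread_mutex_t lock;
    int items[MAILBOX_CAPACITY]; /* a real mailbox would carry message structs */
    size_t head;                 /* index of the oldest queued message */
    size_t count;                /* number of queued messages */
};

static void mailbox_init(struct mailbox *mb)
{
    pthread_mutex_init(&mb->lock, NULL);
    mb->head = 0;
    mb->count = 0;
}

/* Non-blocking post: returns false when the mailbox is full. */
static bool mailbox_post(struct mailbox *mb, int msg)
{
    bool ok = false;
    pthread_mutex_lock(&mb->lock);
    if (mb->count < MAILBOX_CAPACITY) {
        mb->items[(mb->head + mb->count) % MAILBOX_CAPACITY] = msg;
        mb->count++;
        ok = true;
    }
    pthread_mutex_unlock(&mb->lock);
    return ok;
}

/* Non-blocking fetch: returns false when the mailbox is empty. */
static bool mailbox_fetch(struct mailbox *mb, int *msg)
{
    bool ok = false;
    pthread_mutex_lock(&mb->lock);
    if (mb->count > 0) {
        *msg = mb->items[mb->head];
        mb->head = (mb->head + 1) % MAILBOX_CAPACITY;
        mb->count--;
        ok = true;
    }
    pthread_mutex_unlock(&mb->lock);
    return ok;
}
```

The post is deliberately non-blocking so that the main thread never stalls on a handler thread that is itself stuck in a blocked AIS call. A handler thread would typically also need a file descriptor to poll on, so that a new message wakes its poll + dispatch loop; that part is left out here.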
[tickets] [opensaf:tickets] #1614 amfnd: replace internal use of SaNameT with std::string
- **Milestone**: 5.0.FC --> 5.1.FC

---

** [tickets:#1614] amfnd: replace internal use of SaNameT with std::string **

**Status:** assigned
**Milestone:** 5.1.FC
**Created:** Mon Nov 30, 2015 06:44 AM UTC by Long HB Nguyen
**Last Updated:** Mon Nov 30, 2015 06:45 AM UTC
**Owner:** Long HB Nguyen

Replace internal use of SaNameT with std::string in amfnd.
[tickets] [opensaf:tickets] #1669 cpsv: saCkptCheckpointNumOpeners is not updated after a node restart.
- **Milestone**: 5.0.FC --> 4.7.2

---

** [tickets:#1669] cpsv: saCkptCheckpointNumOpeners is not updated after a node restart.**

**Status:** review
**Milestone:** 4.7.2
**Created:** Fri Jan 22, 2016 04:05 AM UTC by Pham Hoang Nhat
**Last Updated:** Fri Jan 29, 2016 03:35 AM UTC
**Owner:** Pham Hoang Nhat

Problem description:
saCkptCheckpointNumOpeners is not updated when a node restarts.

Steps to reproduce the problem:

1. Create a checkpoint on PL3 with flag (creation flag SA_CKPT_WR_ALL_REPLICAS)
2. Open this checkpoint on PL4
3. Restart PL3

After step 3, saCkptCheckpointNumOpeners is not changed.
[tickets] [opensaf:tickets] #1670 cpsv: Checkpoint is destroyed althought there is a user using it
- **Milestone**: 5.0.FC --> 4.7.2

---

** [tickets:#1670] cpsv: Checkpoint is destroyed althought there is a user using it**

**Status:** review
**Milestone:** 4.7.2
**Created:** Fri Jan 22, 2016 04:09 AM UTC by Pham Hoang Nhat
**Last Updated:** Fri Jan 29, 2016 03:35 AM UTC
**Owner:** Pham Hoang Nhat

Problem description:
The checkpoint is destroyed although there is a user using it.

Steps to reproduce the problem:

1. Create a checkpoint on PL3 with flag (creation flag SA_CKPT_WR_ALL_REPLICAS and retention duration = 0)
2. Open this checkpoint on PL4
3. Restart PL3

After step 3, the checkpoint is destroyed although it was in use on PL4.
[tickets] [opensaf:tickets] #1795 AMF : haState should be marked QUIESCING in PG callback for shutdown op
- **Milestone**: 5.0.RC2 --> 5.0.1

---

** [tickets:#1795] AMF : haState should be marked QUIESCING in PG callback for shutdown op**

**Status:** assigned
**Milestone:** 5.0.1
**Created:** Fri Apr 29, 2016 07:24 AM UTC by Srikanth R
**Last Updated:** Mon May 02, 2016 07:21 AM UTC
**Owner:** Nagendra Kumar

Changeset : 7434

For the shutdown operation on the SI, the haState in the protection group callback is filled in with the value SA_AMF_HA_QUIESCED (3) instead of SA_AMF_HA_QUIESCING (4).

PROTECTION GROUP CALLBACK IS INVOKED
error : 1
numberOfMembers : 2
csiName : safCsi=CSI1,safSi=TestApp_SI1,safApp=TestApp_TwoN
number of items in notification buffer is 2
{0: {'member': {'haState': 2, 'compName': safComp=COMP1,safSu=TestApp_SU1,safSg=TestApp_SG1,safApp=TestApp_TwoN, 'rank': 1, 'haReadinessState': 1}, 'change': 1},
1: {'member': {'haState': **3**, 'compName': safComp=COMP1,safSu=TestApp_SU2,safSg=TestApp_SG1,safApp=TestApp_TwoN, 'rank': 2, 'haReadinessState': 1}, 'change': 4}}
[tickets] [opensaf:tickets] #1797 Spare controller failed to become active when both ACTIVE and STANDBY SC rebooted
- **Milestone**: 5.0.RC2 --> 5.0.1

---

** [tickets:#1797] Spare controller failed to become active when both ACTIVE and STANDBY SC rebooted**

**Status:** unassigned
**Milestone:** 5.0.1
**Created:** Fri Apr 29, 2016 09:36 AM UTC by Ritu Raj
**Last Updated:** Fri Apr 29, 2016 09:36 AM UTC
**Owner:** nobody

Setup:
Changeset- 7436
Version - opensaf 5.0 FC

* Issue Observed:
The spare controller failed to become active when both the ACTIVE and STANDBY SCs rebooted, which resulted in a cluster reset.

* Steps To Reproduce:
1. Bring up the cluster, where SC-1 takes the active role, SC-2 is standby and SC-3 is in quiesced state; PL-6 and PL-7 are payloads.
2. Kill any director on the active and on the standby, 2 seconds apart.
3. Observe that the quiesced controller fails to take the active role and a cluster reset happens.

>>
SCALE_SLOT-93:~ #
May 2 18:25:33 SCALE_SLOT-93 osafimmnd[1767]: NO Implementer disconnected 5 <0, 2010f> (safAmfService)
May 2 18:25:34 SCALE_SLOT-93 osafamfnd[1817]: WA AMF director unexpectedly crashed
May 2 18:25:34 SCALE_SLOT-93 osafamfnd[1817]: Rebooting OpenSAF NodeId = 131855 EE Name = , Reason: local AVD down(Adest) or both AVD down(Vdest) received, OwnNodeId = 131855, SupervisionTime = 60
May 2 18:25:34 SCALE_SLOT-93 osafimmnd[1767]: NO Implementer disconnected 17 <0, 2020f> (@safAmfService2020f)
May 2 18:25:34 SCALE_SLOT-93 opensaf_reboot: Rebooting local node; timeout=60
May 2 18:25:38 SCALE_SLOT-93 kernel: [273050.885507] md: stopping all md devices.
May 2 18:25:38 SCALE_SLOT-93 kernel: [273051.878473] sd 0:0:0:0: [sda] Synchronizing SCSI cache
<<
[tickets] [opensaf:tickets] #1558 amf: use nullptr instead of NULL macros
- **Milestone**: 5.0.FC --> 5.1.FC

---

** [tickets:#1558] amf: use nullptr instead of NULL macros**

**Status:** assigned
**Milestone:** 5.1.FC
**Created:** Fri Oct 23, 2015 04:28 AM UTC by Long HB Nguyen
**Last Updated:** Sun Nov 01, 2015 09:36 PM UTC
**Owner:** Long HB Nguyen

Use nullptr instead of NULL macros. This is a part of ticket [#1520].
[tickets] [opensaf:tickets] #1585 NTF: Refactor logging long dn notification
- **Milestone**: 5.0.FC --> 5.1.FC

---

** [tickets:#1585] NTF: Refactor logging long dn notification**

**Status:** unassigned
**Milestone:** 5.1.FC
**Created:** Thu Nov 05, 2015 12:51 AM UTC by Minh Hon Chau
**Last Updated:** Fri Nov 06, 2015 10:37 AM UTC
**Owner:** nobody

If NTF logs a notification containing a long DN and the LOG service doesn't support long DNs, the SaNameT long DN is truncated.
Once the LOG service supports long DNs (#1315), NTF won't have to truncate the long DN; only the full long DN will be logged.
This ticket is about refactoring, to remove the part of the code that truncates the long DN.
[tickets] [opensaf:tickets] #1571 AMF: Use std::maps instead of Patricia trees
- **Milestone**: 5.0.FC --> 5.1.FC

---

** [tickets:#1571] AMF: Use std::maps instead of Patricia trees**

**Status:** assigned
**Milestone:** 5.1.FC
**Created:** Wed Oct 28, 2015 02:39 AM UTC by Long HB Nguyen
**Last Updated:** Sun Nov 01, 2015 09:36 PM UTC
**Owner:** Long HB Nguyen

Use std::maps instead of Patricia trees, see also [#1520].
[tickets] [opensaf:tickets] #1581 pyosaf: Make log level configurable in the SafLogger utility class
- **Milestone**: 5.0.FC --> 5.1.FC

---

** [tickets:#1581] pyosaf: Make log level configurable in the SafLogger utility class**

**Status:** assigned
**Milestone:** 5.1.FC
**Created:** Mon Nov 02, 2015 03:08 PM UTC by Johan Mårtensson
**Last Updated:** Mon Nov 02, 2015 03:08 PM UTC
**Owner:** Johan Mårtensson

In the SafLogger::log method the log level is hard-coded to notice. This should be fixed so that it's configurable.
[tickets] [opensaf:tickets] #1607 Handle AIS error codes properly
- **Milestone**: 5.0.FC --> 5.1.FC

---

**[tickets:#1607] Handle AIS error codes properly**

**Status:** assigned
**Milestone:** 5.1.FC
**Created:** Fri Nov 20, 2015 08:53 AM UTC by Anders Widell
**Last Updated:** Fri Nov 20, 2015 08:53 AM UTC
**Owner:** Anders Widell

There is a wide variety of AIS error codes defined in saAis.h that an API user is supposed to handle in an appropriate way, but currently the OpenSAF services themselves do not handle these error codes properly internally. This ticket proposes a general improvement / cleanup of the code where we are (or in most cases: are *not*) handling AIS error codes in the OpenSAF services. The proposal is also to add common library helper functions for the AIS error handling mechanism, to minimize code duplication.

Examples of error codes and how to handle them:

* SA_AIS_ERR_TRY_AGAIN: Retry the function.
* SA_AIS_ERR_NO_RESOURCES: Similar to SA_AIS_ERR_TRY_AGAIN.
* SA_AIS_ERR_TIMEOUT: Retry if the function is idempotent. If the function isn't idempotent, we have to judge from case to case whether it should be retried or not.
* SA_AIS_ERR_BAD_HANDLE: Initialize a new handle (and possibly also do other things like setting the OI implementer in case of an OI handle). Retry with the new handle. In the case of an IMM CCB handle, an incomplete IMM transaction may have to be "replayed".
* SA_AIS_ERR_FAILED_OPERATION: When applying an IMM transaction, this code is returned when the transaction was aborted. It can be returned both in the case of a validation error and in the case of a resource error. To distinguish between the two causes, use the new functionality introduced in ticket [#744]. If it was a resource abort, retry by replaying the whole transaction.

# For how long should we keep retrying?

It is very difficult to set a maximum time limit for how long we need to keep retrying before we give up, as can be seen for example in ticket [#1582]. It is also in many cases difficult to decide what to do when we give up.

Sometimes, we can just skip the action and continue anyway. An example of this case would be logging; logging a message is normally not vital to the function of the system. In those cases, we should only retry for a short while (or not at all), and then give up the operation and continue in the same way as if it had been successful.

However, in many cases the operation cannot be skipped. Restarting the calling process is unlikely to help, since the AIS call is failing because some *other* OpenSAF service (possibly on another node) is unresponsive. Therefore, the proposal is that in these cases where the operation is vital, we should keep retrying forever and let higher-level monitoring (NID or AMF healthcheck) detect and recover hanging processes. For debugging purposes, we should however log a message to syslog to indicate where we are stuck in a retry loop. This logging should be done by the common helper functions.
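The retry policy proposed in #1607 could be sketched roughly as follows. This is an illustration only, written in Python for brevity rather than in the C/C++ used by OpenSAF. `call_with_retry`, its parameters and the `log` callback are hypothetical names, not existing OpenSAF helpers, and the enum is a local stand-in that mirrors a few values from saAis.h. SA_AIS_ERR_BAD_HANDLE and SA_AIS_ERR_FAILED_OPERATION need handle re-initialization or transaction replay and are deliberately left out.

```python
import enum
import time


class SaAisErrorT(enum.IntEnum):
    """Local stand-in for a subset of the AIS return codes in saAis.h."""
    SA_AIS_OK = 1
    SA_AIS_ERR_TIMEOUT = 5
    SA_AIS_ERR_TRY_AGAIN = 6
    SA_AIS_ERR_NO_RESOURCES = 18


# Codes the ticket says can simply be retried.
ALWAYS_RETRY = {SaAisErrorT.SA_AIS_ERR_TRY_AGAIN,
                SaAisErrorT.SA_AIS_ERR_NO_RESOURCES}


def is_retryable(rc, idempotent):
    """TIMEOUT is only retried when the wrapped call is idempotent."""
    if rc in ALWAYS_RETRY:
        return True
    return rc == SaAisErrorT.SA_AIS_ERR_TIMEOUT and idempotent


def call_with_retry(func, *, idempotent, vital,
                    max_attempts=5, delay_s=0.0, log=print):
    """Hypothetical helper: call func() until it stops returning a retryable code.

    Non-vital operations give up after max_attempts and return the last code.
    Vital operations retry forever and log every max_attempts attempts, so that
    external monitoring can spot where the process is stuck.
    """
    attempt = 0
    while True:
        attempt += 1
        rc = func()
        if not is_retryable(rc, idempotent):
            return rc
        if attempt % max_attempts == 0:
            if not vital:
                return rc
            log(f"still retrying after {attempt} attempts, "
                f"last rc={SaAisErrorT(rc).name}")
        time.sleep(delay_s)
```

A caller that only logs a message would pass `vital=False` and carry on regardless of the returned code, while a caller that cannot skip the operation would pass `vital=True` and rely on NID or the AMF healthcheck to recover a process that never leaves the loop.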
[tickets] [opensaf:tickets] #1638 Integrate LOG service with CLM
- **Milestone**: future --> 5.1.FC

---

**[tickets:#1638] Integrate LOG service with CLM**

**Status:** assigned
**Milestone:** 5.1.FC
**Created:** Tue Dec 15, 2015 08:26 AM UTC by Mathi Naickan
**Last Updated:** Tue Dec 15, 2015 08:26 AM UTC
**Owner:** Mathi Naickan

Refer to section 3.2, "Unavailability of the Log Service API on a Non-Member Node". More details to come.
[tickets] [opensaf:tickets] #1640 Integrate IMM service with CLM
- **Milestone**: future --> 5.1.FC

---

**[tickets:#1640] Integrate IMM service with CLM**

**Status:** assigned
**Milestone:** 5.1.FC
**Created:** Tue Dec 15, 2015 08:32 AM UTC by Mathi Naickan
**Last Updated:** Tue Dec 15, 2015 08:32 AM UTC
**Owner:** Neelakanta Reddy

More details TBD
[tickets] [opensaf:tickets] #1684 Incorrect error codes returned in IMM callbacks
- **Milestone**: 5.1.FC --> 5.0.1

---

**[tickets:#1684] Incorrect error codes returned in IMM callbacks**

**Status:** unassigned
**Milestone:** 5.0.1
**Created:** Mon Feb 29, 2016 09:56 AM UTC by elunlen
**Last Updated:** Wed May 04, 2016 04:59 PM UTC
**Owner:** nobody

In the object implementer in the director (imm.cc), the function ccb_object_create_cb() returns an invalid return code: it returns SA_AIS_ERR_INVALID_PARAM. This does not comply with the IMM AIS; the return code should be SA_AIS_ERR_BAD_OPERATION.

In general, the return codes for all IMM callbacks should be checked.

The impact of the problem is confusing error logs.
[tickets] [opensaf:tickets] #1655 cpsv: Replica IMM objects are not created after opening a checkpoint
- **Milestone**: 5.0.FC --> 5.0.1

---

**[tickets:#1655] cpsv: Replica IMM objects are not created after opening a checkpoint**

**Status:** review
**Milestone:** 5.0.1
**Created:** Tue Jan 05, 2016 08:07 AM UTC by Pham Hoang Nhat
**Last Updated:** Wed Jan 06, 2016 06:13 AM UTC
**Owner:** Pham Hoang Nhat

The replica IMM objects are not created after opening a checkpoint in the following scenario:

1. Open a checkpoint with the flag SA_CKPT_CHECKPOINT_CREATE.
2. Unlink the checkpoint (the checkpoint is still being used).
3. Open a checkpoint with the flag SA_CKPT_CHECKPOINT_CREATE and the same name as the one in step 1.

After step 3, although the checkpoint is opened successfully, the replica IMM objects are not created.

Analysis: The problem happens because the CPD doesn't delete the related nodes from ckpt_reploc_tree when unlinking the checkpoint in step 2.
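The scenario in #1655 can be replayed against a toy model. The sketch below is not the CPD code: `CpdModel`, its fields and the checkpoint and node names are hypothetical, and it assumes (the ticket does not say so explicitly) that a replica IMM object is only created when no matching entry already exists in the structure standing in for ckpt_reploc_tree, and that unlinking removes the existing replica IMM objects.

```python
class CpdModel:
    """Toy model of the replica bookkeeping described in the ticket."""

    def __init__(self, purge_reploc_on_unlink):
        self.purge_reploc_on_unlink = purge_reploc_on_unlink
        self.reploc = set()        # stands in for ckpt_reploc_tree
        self.imm_replicas = set()  # replica IMM objects that currently exist

    def open_with_create(self, ckpt_name, node):
        key = (ckpt_name, node)
        if key in self.reploc:
            # Assumed behaviour: a known entry means "nothing to create".
            return
        self.reploc.add(key)
        self.imm_replicas.add(key)

    def unlink(self, ckpt_name):
        # Assumed behaviour: the replica IMM objects go away on unlink.
        self.imm_replicas = {k for k in self.imm_replicas if k[0] != ckpt_name}
        if self.purge_reploc_on_unlink:
            # The missing cleanup that the ticket's analysis points at.
            self.reploc = {k for k in self.reploc if k[0] != ckpt_name}


def replay_scenario(model):
    """Steps 1 to 3 of the ticket; True if the replica IMM object exists afterwards."""
    model.open_with_create("ckpt_a", "PL-3")
    model.unlink("ckpt_a")
    model.open_with_create("ckpt_a", "PL-3")
    return ("ckpt_a", "PL-3") in model.imm_replicas
```

With `purge_reploc_on_unlink=False` the stale entry survives the unlink and the second open creates nothing, which matches the reported symptom; with `True` the replica object is created again.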
[tickets] [opensaf:tickets] #1656 AMFND: Use std::maps instead of Patricia trees
- **Milestone**: 5.0.FC --> 5.1.FC

---

**[tickets:#1656] AMFND: Use std::maps instead of Patricia trees**

**Status:** assigned
**Milestone:** 5.1.FC
**Labels:** refactoring
**Created:** Wed Jan 06, 2016 03:05 AM UTC by Minh Hon Chau
**Last Updated:** Wed Jan 27, 2016 06:10 AM UTC
**Owner:** Minh Hon Chau

As part of #1520 - AMF: Refactoring for 5.0.

This ticket is for "Use std::maps instead of Patricia trees" for amfnd.
[tickets] [opensaf:tickets] #1658 mds : Opensf transport should adopt the size of the timeout parameter from 32 bits to 64 bits
- **Milestone**: 5.0.FC --> 5.1.FC

---

**[tickets:#1658] mds : Opensf transport should adopt the size of the timeout parameter from 32 bits to 64 bits**

**Status:** assigned
**Milestone:** 5.1.FC
**Created:** Thu Jan 07, 2016 09:13 AM UTC by A V Mahesh (AVM)
**Last Updated:** Thu Jan 07, 2016 09:17 AM UTC
**Owner:** A V Mahesh (AVM)

In the current implementation there is a mismatch in variable size between the SAF SaTimeT type (64 bits) and the timeout variable in the MDS transport implementation (32 bits). The MDS transport should change the size of its timeout parameter from 32 bits to 64 bits.
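To make the size mismatch in #1658 concrete, the sketch below shows what happens to a 64-bit value when it is squeezed through a 32-bit field. It assumes, purely for illustration, that the value is copied unchanged into an unsigned 32-bit variable; the ticket does not state the units or signedness of the MDS timeout, and the function names are hypothetical. SaTimeT itself is a signed 64-bit count of nanoseconds.

```python
import ctypes

NANOS_PER_SEC = 1_000_000_000


def through_32bit_field(timeout):
    """Value that survives a copy into an unsigned 32-bit variable."""
    # ctypes integer types do no overflow checking; they keep the low bits.
    return ctypes.c_uint32(timeout).value


def through_64bit_field(timeout):
    """Value that survives a copy into a signed 64-bit variable."""
    return ctypes.c_int64(timeout).value


ten_seconds = 10 * NANOS_PER_SEC
print(through_32bit_field(ten_seconds))  # low 32 bits only
print(through_64bit_field(ten_seconds))  # unchanged
```

Under that assumption a 10 second timeout expressed in nanoseconds wraps to 1410065408, roughly 1.4 seconds, while the 64-bit field carries it intact.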
[tickets] [opensaf:tickets] #1659 base: Include time zone in trace log messages
- **Milestone**: 5.0.FC --> 5.1.FC

---

**[tickets:#1659] base: Include time zone in trace log messages**

**Status:** unassigned
**Milestone:** 5.1.FC
**Created:** Thu Jan 07, 2016 09:49 AM UTC by Anders Widell
**Last Updated:** Thu Jan 07, 2016 09:49 AM UTC
**Owner:** nobody

In ticket [#593], we added support for including time zone information in log messages written by the OpenSAF LOG service. To align with this feature, the OpenSAF trace feature should also be able to write time zone information into the trace log files. That will make it easier to correlate LOG messages and trace messages. The proposal is that the trace messages should always contain time zone information in the time stamps.
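A small sketch of why an explicit UTC offset helps when correlating LOG and trace messages, as #1659 proposes. The timestamp layout used here is an arbitrary choice for the example and not the OpenSAF trace format, and `format_trace_timestamp` and `same_instant` are hypothetical names.

```python
from datetime import datetime, timedelta, timezone


def format_trace_timestamp(moment):
    """Render a timestamp that always carries its UTC offset."""
    if moment.tzinfo is None:
        raise ValueError("timestamp must carry time zone information")
    return moment.isoformat(sep=" ", timespec="microseconds")


def same_instant(first, second):
    """Compare two rendered timestamps regardless of their time zones."""
    return datetime.fromisoformat(first) == datetime.fromisoformat(second)


# One event written by a node at UTC+05:30 and the same event written in UTC.
local_zone = timezone(timedelta(hours=5, minutes=30))
log_stamp = format_trace_timestamp(datetime(2016, 1, 7, 9, 49, 0, tzinfo=local_zone))
trace_stamp = format_trace_timestamp(datetime(2016, 1, 7, 4, 19, 0, tzinfo=timezone.utc))
print(log_stamp, trace_stamp, same_instant(log_stamp, trace_stamp))
```

Without the offset the two stamps would look five and a half hours apart; with it, they can be recognised as the same instant.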
[tickets] [opensaf:tickets] #1684 Incorrect error codes returned in IMM callbacks
- **Milestone**: 5.0.FC --> 5.1.FC

---

**[tickets:#1684] Incorrect error codes returned in IMM callbacks**

**Status:** unassigned
**Milestone:** 5.1.FC
**Created:** Mon Feb 29, 2016 09:56 AM UTC by elunlen
**Last Updated:** Mon Feb 29, 2016 09:56 AM UTC
**Owner:** nobody

In the object implementer in the director (imm.cc), the function ccb_object_create_cb() returns an invalid return code: it returns SA_AIS_ERR_INVALID_PARAM. This does not comply with the IMM AIS; the return code should be SA_AIS_ERR_BAD_OPERATION.

In general, the return codes for all IMM callbacks should be checked.

The impact of the problem is confusing error logs.
[tickets] [opensaf:tickets] #1673 amfa: Divide amf api functions in a thin C layer and use C++ for implementation
- **Milestone**: 5.0.FC --> 5.1.FC

---

**[tickets:#1673] amfa: Divide amf api functions in a thin C layer and use C++ for implementation**

**Status:** review
**Milestone:** 5.1.FC
**Created:** Fri Jan 29, 2016 01:33 PM UTC by Hans Nordebäck
**Last Updated:** Fri Jan 29, 2016 02:46 PM UTC
**Owner:** Hans Nordebäck

The amf agent part is implemented in C and shares C data structures with amfd and amfnd, e.g. libs/common/osaf. It would be beneficial if the amf agent could be split into two parts, with one thin C layer that passes the call to C++ for the implementation. Then it will be significantly easier to convert libs/common/osaf structures and functions to C++ and to remove e.g. the use of SaNameT and support long DNs. The C++ usage in the agent could be limited to the C++98 standard to ease acceptance for applications.

A patch has been sent out with this change; comments are appreciated.
[tickets] [opensaf:tickets] #1747 IMMND trying to start PBE process while stopping OpenSAF services
- **Milestone**: future --> 5.0.1

---

**[tickets:#1747] IMMND trying to start PBE process while stopping OpenSAF services**

**Status:** unassigned
**Milestone:** 5.0.1
**Created:** Mon Apr 11, 2016 10:30 AM UTC by Chani Srivastava
**Last Updated:** Sun Apr 24, 2016 08:36 PM UTC
**Owner:** nobody

Setup:
Changeset - 7436
Version - opensaf 5.0
1-PBE enabled

The issue is not always observed.

Apr 11 13:32:52 OSAF-SC1 opensafd: Stopping OpenSAF Services
Apr 11 13:32:52 OSAF-SC1 osafamfnd[29960]: NO Shutdown initiated
Apr 11 13:32:52 OSAF-SC1 osafamfnd[29960]: NO Terminating all AMF components
Apr 11 13:32:52 OSAF-SC1 osafimmpbed: NO IMM PBE received SIG_TERM, closing db handle
Apr 11 13:32:52 OSAF-SC1 osafimmpbed: IN IMM PBE process EXITING...
Apr 11 13:32:53 OSAF-SC1 osafimmnd[29899]: NO Implementer locally disconnected. Marking it as doomed 18 <545, 2010f> (OpenSafImmPBE)
Apr 11 13:32:53 OSAF-SC1 osafimmnd[29899]: NO Implementer disconnected 18 <545, 2010f> (OpenSafImmPBE)
Apr 11 13:32:53 OSAF-SC1 osafimmnd[29899]: WA Persistent back-end process has apparently died.
Apr 11 13:32:53 OSAF-SC1 osafimmnd[29899]: NO STARTING PBE process.
Apr 11 13:32:53 OSAF-SC1 osafimmnd[29899]: NO pbe-db-file-path:/home/chani/immPBE/imm.db VETERAN:1 B:0
Apr 11 13:32:53 OSAF-SC1 osafckptnd[30049]: exiting for shutdown
Apr 11 13:32:53 OSAF-SC1 osafsmfd[29976]: exiting for shutdown
Apr 11 13:32:53 OSAF-SC1 osaflckd[30057]: exiting for shutdown
Apr 11 13:32:53 OSAF-SC1 osafimmnd[29899]: NO Implementer locally disconnected. Marking it as doomed 2412 <321, 2010f> (safLckService)
Apr 11 13:32:53 OSAF-SC1 osafimmnd[29899]: NO Implementer disconnected 2412 <321, 2010f> (safLckService)
Apr 11 13:32:53 OSAF-SC1 osaflcknd[30032]: exiting for shutdown
Apr 11 13:32:53 OSAF-SC1 osafclmna[29860]: exiting for shutdown
Apr 11 13:32:53 OSAF-SC1 osafimmd[29888]: exiting for shutdown
Apr 11 13:32:53 OSAF-SC1 osaffmd[29878]: exiting for shutdown
Apr 11 13:32:53 OSAF-SC1 osafrded[29869]: exiting for shutdown
Apr 11 13:32:53 OSAF-SC1 osafevtd[30088]: exiting for shutdown
Apr 11 13:32:53 OSAF-SC1 osafimmnd[29899]: NO Implementer locally disconnected. Marking it as doomed 2413 <315, 2010f> (safEvtService)
Apr 11 13:32:53 OSAF-SC1 osafckptd[30097]: exiting for shutdown
Apr 11 13:32:53 OSAF-SC1 osafimmnd[29899]: NO Implementer locally disconnected. Marking it as doomed 2411 <330, 2010f> (safCheckPointService)
Apr 11 13:32:53 OSAF-SC1 osafimmnd[29899]: exiting for shutdown
Apr 11 13:32:53 OSAF-SC1 osafmsgd[30011]: exiting for shutdown
Apr 11 13:32:53 OSAF-SC1 osafmsgnd[29995]: exiting for shutdown
Apr 11 13:32:53 OSAF-SC1 osafsmfnd[29978]: exiting for shutdown
Apr 11 13:32:53 OSAF-SC1 osaflogd[29914]: exiting for shutdown
Apr 11 13:32:53 OSAF-SC1 osafntfimcnd[5780]: NO saImmOiDispatch() Fail SA_AIS_ERR_BAD_HANDLE (9)
Apr 11 13:32:53 OSAF-SC1 osafclmd[29940]: exiting for shutdown
Apr 11 13:32:53 OSAF-SC1 osafimmpbed: IN arg[0] == '/usr/lib64/opensaf/osafimmpbed'
Apr 11 13:32:53 OSAF-SC1 osafimmpbed: IN arg[1] == '--recover'
Apr 11 13:32:53 OSAF-SC1 osafimmpbed: IN arg[2] == '--pbe'
Apr 11 13:32:53 OSAF-SC1 osafimmpbed: IN arg[3] == '/home/chani/immPBE/imm.db'
Apr 11 13:32:53 OSAF-SC1 osafimmpbed: ER osafimmpbe is not started by osafimmnd
[tickets] [opensaf:tickets] #1721 AMF: Lock-in healthy SU set SU's Oper state as DISABLED
- **Milestone**: future --> 5.0.1

---

**[tickets:#1721] AMF: Lock-in healthy SU set SU's Oper state as DISABLED**

**Status:** review
**Milestone:** 5.0.1
**Created:** Wed Apr 06, 2016 01:08 AM UTC by Minh Hon Chau
**Last Updated:** Sun Apr 24, 2016 08:37 PM UTC
**Owner:** Minh Hon Chau

When an SU is locked-in, amfnd currently sets the SU's OperState to DISABLED. This does not seem to be the right setting, since the SU was in a *healthy* state. The problem is only exposed when the SC recovers from headless: amfnd then reports the locked-in SU with its OperState as DISABLED.
[tickets] [opensaf:tickets] #1793 osaf/amf: Handle case when config is for 3 SC, but 3rd node is started as payload
- **Type**: defect --> enhancement
- **Milestone**: future --> 5.1.FC

---

**[tickets:#1793] osaf/amf: Handle case when config is for 3 SC, but 3rd node is started as payload**

**Status:** unassigned
**Milestone:** 5.1.FC
**Created:** Thu Apr 28, 2016 08:06 PM UTC by Mathi Naickan
**Last Updated:** Thu Apr 28, 2016 08:14 PM UTC
**Owner:** nobody
[tickets] [opensaf:tickets] #1802 amf upgrade fails
- **status**: review --> needinfo
- **Milestone**: 5.0.GA --> 5.0.1
- **Comment**: This ticket needs to be updated with additional information on how to reproduce, traces etc. Please update with the relevant information.

---

**[tickets:#1802] amf upgrade fails**

**Status:** needinfo
**Milestone:** 5.0.1
**Created:** Mon May 02, 2016 05:44 PM UTC by Rafael
**Last Updated:** Tue May 03, 2016 04:53 PM UTC
**Owner:** Nagendra Kumar

When running an SMF upgrade of AMF the following happens:

SC-2-1 osafsmfd[4296]: NO STEP: Online installation of new software
(this step is successful)
SC-2-1 osafsmfd[4296]: NO STEP: Lock deactivation units
SC-2-1 osafamfd[4120]: NO Node is locked, no SI unassigned alarm will be sent
SC-2-1 osafsmfd[4296]: NO STEP: Terminate deactivation units
SC-2-1 osafamfd[4120]: WA avd_msg_sanity_chk: invalid msg id 219, msg type 8, from 2020f should be 218
...
(this repeats a number of times)
osafsmfd[4296]: NO Fail to invoke admin operation, rc=SA_AIS_ERR_TIMEOUT (5). dn=[safAmfNode=SC-2,safAmfCluster=myAmfCluster], opId=[3]
osafsmfd[4296]: NO Failed to call admin operation 3 on safAmfNode=SC-2,safAmfCluster=myAmfCluster

Is AMF handling an upgrade properly? It seems that the msg id is mismatched here.
[tickets] [opensaf:tickets] #1720 AMF: RT attribute values are read as dummy after headless
- **Milestone**: future --> 5.0.1

---

**[tickets:#1720] AMF: RT attribute values are read as dummy after headless**

**Status:** review
**Milestone:** 5.0.1
**Created:** Wed Apr 06, 2016 01:02 AM UTC by Minh Hon Chau
**Last Updated:** Sun Apr 24, 2016 08:38 PM UTC
**Owner:** Minh Hon Chau

After headless, some of the RTA values are retrieved as dummy values at the time AMFD initiates its role. The reason is that these RTAs are read before AMFD sets itself as implementer.
[tickets] [opensaf:tickets] #1742 amfd: reject admin ops during headless recovery
- **Milestone**: future --> 5.0.1

---

**[tickets:#1742] amfd: reject admin ops during headless recovery**

**Status:** accepted
**Milestone:** 5.0.1
**Created:** Fri Apr 08, 2016 01:48 AM UTC by Gary Lee
**Last Updated:** Sun Apr 24, 2016 08:36 PM UTC
**Owner:** Gary Lee

During the headless recovery period (waiting for node_ups from payloads / adjusting assignments), admin ops should be rejected.
[tickets] [opensaf:tickets] #1722 AMF: Downgrade severity of logging in case data_update from absent amfnd
- **Milestone**: future --> 5.0.1

---

**[tickets:#1722] AMF: Downgrade severity of logging in case data_update from absent amfnd**

**Status:** review
**Milestone:** 5.0.1
**Created:** Wed Apr 06, 2016 01:23 AM UTC by Minh Hon Chau
**Last Updated:** Sun Apr 24, 2016 08:36 PM UTC
**Owner:** Minh Hon Chau

While amfd is performing *sync* with all amfnd(s), component/SU errors (for instance a restart error) may occur during the sync period. The amfnd(s) send data_update msg(s) to increase the error counter. These data_update messages will be ignored at amfd since the amfnd is still considered absent due to the ongoing sync. The amfnd(s) will resend these update messages after the sync completes.
[tickets] [opensaf:tickets] #1723 amf: send extra state information from nd to d after headless state
- **Milestone**: future --> 5.0.1

---

**[tickets:#1723] amf: send extra state information from nd to d after headless state**

**Status:** review
**Milestone:** 5.0.1
**Created:** Wed Apr 06, 2016 01:52 AM UTC by Gary Lee
**Last Updated:** Sun Apr 24, 2016 08:35 PM UTC
**Owner:** Gary Lee

To fully recover after a headless state, the AMF director needs to know the current assignment state of each SUSI on the payloads. Currently, each SUSI is assumed to be in the assigned state, which is incorrect. The scope of this ticket is to make this information available at the director. Further changes are required to handle the various assignment states.
[tickets] [opensaf:tickets] #1784 Amfd asserts on clm locked controller after successfully taking active role as a part of failover
- **Milestone**: 5.0.RC2 --> 5.0.1 --- ** [tickets:#1784] Amfd asserts on clm locked controller after successfully taking active role as a part of failover** **Status:** unassigned **Milestone:** 5.0.1 **Created:** Tue Apr 26, 2016 11:47 AM UTC by Ritu Raj **Last Updated:** Thu Apr 28, 2016 10:11 AM UTC **Owner:** nobody **Attachments:** - [messages](https://sourceforge.net/p/opensaf/tickets/1784/attachment/messages) (3.2 MB; application/octet-stream) - [osafamfd](https://sourceforge.net/p/opensaf/tickets/1784/attachment/osafamfd) (7.4 MB; application/octet-stream) setup: Changeset- 7436 Version - opensaf 5.0 FC * Issue Observed : Amfd asserts on clm locked controller after successfully taking active role as a part of failover. * Steps To Reproduce: 1. OpenSAF running on 4 nodes, where SC-1 is Active , SC-2 Standby and PL-3 and PL-4 are payloads. 2. Performed CLM lock of stanby controller (SC-2), 3. Now, perform failover on active controller(SC-1) 4. Observed that amfd asserted on clm locked controller(SC-2) and cluster reset happened >SLOT-2:~ # Apr 26 14:56:06 SLOT-2 osafimmd[2199]: WA IMMD lost contact with >peer IMMD (NCSMDS_RED_DOWN) ... Apr 26 14:56:11 SLOT-2 osaffmd[2189]: NO Node Down event for node id 2010f: Apr 26 14:56:11 SLOT-2 osaffmd[2189]: NO Current role: STANDBY ... Apr 26 14:56:11 SLOT-2 osafrded[2180]: NO Peer down on node 0x2010f Apr 26 14:56:11 SLOT-2 osafimmd[2199]: WA IMMND DOWN on active controller 1 detected at standby immd!! 2. Possible failover ... 
Apr 26 14:56:11 SLOT-2 opensaf_reboot: Rebooting remote node in the absence of PLM is outside the scope of OpenSAF Apr 26 14:56:11 SLOT-2 osaffmd[2189]: NO Controller Failover: Setting role to ACTIVE Apr 26 14:56:11 SLOT-2 osafrded[2180]: NO RDE role set to ACTIVE Apr 26 14:56:11 SLOT-2 osafrded[2180]: NO Running '/usr/lib64/opensaf/opensaf_sc_active' with 0 argument(s) Apr 26 14:56:11 SLOT-2 osafimmd[2199]: NO ACTIVE request Apr 26 14:56:11 SLOT-2 osaflogd[2224]: NO ACTIVE request Apr 26 14:56:11 SLOT-2 osafntfd[2234]: NO ACTIVE request Apr 26 14:56:11 SLOT-2 osafclmd[2244]: NO ACTIVE request Apr 26 14:56:11 SLOT-2 osafamfd[2254]: NO FAILOVER StandBy --> Active Apr 26 14:56:11 SLOT-2 osafamfnd[2264]: NO AVD NEW_ACTIVE, adest:1 Apr 26 14:56:11 SLOT-2 osafimmd[2199]: NO ellect_coord invoke from rda_callback ACTIVE Apr 26 14:56:11 SLOT-2 osafimmd[2199]: NO New coord elected, resides at 2020f Apr 26 14:56:11 SLOT-2 osafimmnd[2210]: NO 2PBE configured, IMMSV_PBE_FILE_SUFFIX:.2020f (sync) Apr 26 14:56:11 SLOT-2 osafimmnd[2210]: NO This IMMND is now the NEW Coord Apr 26 14:56:11 SLOT-2 osafimmnd[2210]: NO SETTING COORD TO 1 CLOUD PROTO Apr 26 14:56:11 SLOT-2 osafimmnd[2210]: NO Implementer disconnected 16 <139, 2020f> (@safAmfService2020f) Apr 26 14:56:11 SLOT-2 osafimmnd[2210]: NO Implementer connected: 18 (safLogService) <126, 2020f> Apr 26 14:56:11 SLOT-2 osafimmnd[2210]: NO Implementer connected: 19 (safAmfService) <139, 2020f> Apr 26 14:56:11 SLOT-2 osafamfd[2254]: NO Node 'SC-1' left the cluster Apr 26 14:56:11 SLOT-2 osafamfd[2254]: NO FAILOVER StandBy --> Active DONE! Apr 26 14:56:11 SLOT-2 osafamfnd[2264]: NO Assigning 'safSi=SC-2N,safApp=OpenSAF' ACTIVE to 'safSu=SC-2,safSg=2N,safApp=OpenSAF' Apr 26 14:56:11 SLOT-2 osafntfimcnd[2419]: NO exiting on signal 15 .. 
Apr 26 14:56:11 SLOT-2 osafimmnd[2210]: NO Implementer connected: 27 (safSmfService) <337, 2020f>
Apr 26 14:56:11 SLOT-2 osafamfnd[2264]: NO Assigned 'safSi=SC-2N,safApp=OpenSAF' ACTIVE to 'safSu=SC-2,safSg=2N,safApp=OpenSAF'
Apr 26 14:56:11 SLOT-2 osafamfd[2254]: ER Wrong rootCauseEntity �H�
Apr 26 14:56:11 SLOT-2 osafamfd[2254]: clm.cc:312: clm_track_cb: Assertion '0' failed.
Apr 26 14:56:11 SLOT-2 osafamfnd[2264]: WA AMF director unexpectedly crashed
Apr 26 14:56:11 SLOT-2 osafamfnd[2264]: Rebooting OpenSAF NodeId = 131599 EE Name = , Reason: local AVD down(Adest) or both AVD down(Vdest) received, OwnNodeId = 131599, SupervisionTime = 60
Apr 26 14:56:11 SLOT-2 opensaf_reboot: Rebooting local node; timeout=60

* Syslog and amfd trace attached

Note: The issue is observed randomly
[tickets] [opensaf:tickets] #1800 AMF : Proxied should be brought down initially during NG lock-in admin op
- **Milestone**: 5.0.RC2 --> 5.0.1

---

** [tickets:#1800] AMF : Proxied should be brought down initially during NG lock-in admin op**

**Status:** unassigned
**Milestone:** 5.0.1
**Created:** Sat Apr 30, 2016 07:50 AM UTC by Srikanth R
**Last Updated:** Mon May 02, 2016 04:58 AM UTC
**Owner:** nobody

Changeset: 7436
Setup: 2N redundancy model with the proxy and proxied SUs hosted on the same node.

During a lock-instantiation operation on a node group, the proxied SU should be brought down first (i.e., the component termination callback should be sent for the proxied component), and only afterwards the proxy SU. In the current implementation, the proxy SU is brought down first; the later attempt to bring down the proxied SU then fails because there is no proxy left.

436 05:30:00 01/01/1970 NO safApp=safAmfService "Admin op "LOCK_INSTANTIATION" initiated for 'safAmfNodeGroup=SCs,safAmfCluster=myAmfCluster', invocation: 300647710721"
437 05:30:00 01/01/1970 NO safApp=safAmfService "safAmfNodeGroup=SCs,safAmfCluster=myAmfCluster AdmState LOCKED => LOCKED_INSTANTIATION"
438 05:30:00 01/01/1970 NO safApp=safAmfService "safAmfNode=SC-1,safAmfCluster=myAmfCluster AdmState LOCKED => LOCKED_INSTANTIATION"
439 05:30:00 01/01/1970 NO safApp=safAmfService "safAmfNode=SC-2,safAmfCluster=myAmfCluster AdmState LOCKED => LOCKED_INSTANTIATION"
440 05:30:00 01/01/1970 NO safApp=safAmfService "safComp=proxied,safSu=PROXIED_SU1_2N,safSg=PROXIED_SG_2N,safApp=PROXIED_2N ProxyStatus is now UNPROXIED"
441 05:30:00 01/01/1970 NO safApp=safAmfService "safSu=PROXY_SU1_2N,safSg=PROXY_SG_2N,safApp=PROXY_2N PresenceState TERMINATING => UNINSTANTIATED"
442 05:30:00 01/01/1970 NO safApp=safAmfService "safSu=PROXIED_SU1_2N,safSg=PROXIED_SG_2N,safApp=PROXIED_2N PresenceState TERMINATING => UNINSTANTIATED"
443 05:30:00 01/01/1970 NO safApp=safAmfService "safSu=PROXIED_SU1_2N,safSg=PROXIED_SG_2N,safApp=PROXIED_2N OperState ENABLED => DISABLED"
444 05:30:00 01/01/1970 NO safApp=safAmfService "Autorepair not done for 'safSu=PROXIED_SU1_2N,safSg=PROXIED_SG_2N,safApp=PROXIED_2N'"
445 05:30:00 01/01/1970 NO safApp=safAmfService "Admin op done for invocation: 300647710721, result 1"
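The ordering requested in #1800 (proxied SUs terminated before their proxy) can be illustrated with a small sketch. This is not AMF code: `SuToTerminate` and `termination_order` are hypothetical names invented here, and the sketch only assumes that each SU can be flagged as proxied or not.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for an SU scheduled for termination; not an OpenSAF type.
struct SuToTerminate {
  std::string name;
  bool is_proxied;  // true if the SU's components need a proxy to be terminated
};

// Returns the SUs in termination order: proxied SUs first, then all others.
// std::stable_partition keeps the relative order inside each group.
std::vector<SuToTerminate> termination_order(std::vector<SuToTerminate> sus) {
  std::stable_partition(sus.begin(), sus.end(),
                        [](const SuToTerminate& su) { return su.is_proxied; });
  return sus;
}
```

Calling `termination_order` on a list holding `PROXY_SU1_2N` followed by `PROXIED_SU1_2N` yields the proxied SU first, so the proxy is still present when the proxied component's termination callback has to be delivered.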
[tickets] [opensaf:tickets] #1799 AMF : csiName and csiFlags are not properly populated, during assignment removal ( proxy)
- **Milestone**: 5.0.RC2 --> 5.0.1

---

** [tickets:#1799] AMF : csiName and csiFlags are not properly populated, during assignment removal ( proxy)**

**Status:** assigned
**Milestone:** 5.0.1
**Created:** Sat Apr 30, 2016 06:17 AM UTC by Srikanth R
**Last Updated:** Tue May 03, 2016 06:05 AM UTC
**Owner:** Nagendra Kumar

Changeset: 7436
Setup: 2N redundancy model with both the proxy and the proxied component hosted on the same node.

* Initially the proxy and the proxied component are in the fully assigned state.

* Now perform a lock operation on the proxy SU. The quiesced callback and the CSI removal callback populate csiFlags as SA_AMF_CSI_TARGET_ALL and csiName as NULL. However, the proxy component has active assignments, which are to be removed according to the callback. The same happens when the lock operation is performed on the proxied SU.

The expectation is that, for a lock operation on either the proxy or the proxied SU, csiFlags is populated as SA_AMF_CSI_TARGET_ONE together with the corresponding CSI.
[tickets] [opensaf:tickets] #1788 cpsv: saCkptCheckpointWrite() returns SA_AIS_ERR_NOT_EXIST after headless state
- **Milestone**: 5.1.FC --> 5.0.1

---

** [tickets:#1788] cpsv: saCkptCheckpointWrite() returns SA_AIS_ERR_NOT_EXIST after headless state**

**Status:** accepted
**Milestone:** 5.0.1
**Created:** Thu Apr 28, 2016 02:20 AM UTC by Pham Hoang Nhat
**Last Updated:** Wed May 04, 2016 02:46 AM UTC
**Owner:** Pham Hoang Nhat

The problem happened in the following scenario:

1. The application calls saCkptCheckpointOpen() to create a collocated checkpoint on SC-2. The replica of the checkpoint on SC-2 is active.
2. The application calls saCkptCheckpointOpen() to open a collocated checkpoint on PL-5.
3. The application creates a section and accesses the checkpoint on PL-5.
4. Both SCs go down.
5. Both SCs come up again.
6. The application accesses the checkpoint with saCkptCheckpointWrite(). The error code SA_AIS_ERR_NOT_EXIST is returned.

The problem happens because the osafckptnd process ID on SC-2 is the same before and after the headless state, so its MDS destination is also the same. When SC-2 comes back up, there is a short window in which CPD has not yet assigned a new active replica. During that window the application sends its checkpoint access request to the CPND on SC-2, which no longer hosts the active replica, and that CPND returns SA_AIS_ERR_NOT_EXIST.
[tickets] [opensaf:tickets] #1805 log: Deadlock in log agent makes client hang
- **status**: accepted --> review

---

** [tickets:#1805] log: Deadlock in log agent makes client hang**

**Status:** review
**Milestone:** 5.0.GA
**Created:** Tue May 03, 2016 02:30 PM UTC by elunlen
**Last Updated:** Tue May 03, 2016 02:30 PM UTC
**Owner:** elunlen

The log agent uses the lga_cb.cb_lock mutex to protect data in the global lga structure. This mutex is locked in the lga_recover_one_client() function. In the client thread, an MDS API (lga_mds_msg_sync_send) is called while this mutex is locked. This API tries to lock the MDS-internal gl_mds_library_mutex, which may already be locked. Both mutexes are also used in the MDS thread.
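One common way to avoid the lock-order inversion described in #1805 is to not hold the agent lock across the blocking send. The sketch below shows that pattern only; it is not the log agent's code and not necessarily the fix under review. `AgentCb`, `recover_one_client` and the `sync_send` callback are hypothetical stand-ins for `lga_cb`, `lga_recover_one_client()` and `lga_mds_msg_sync_send`.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <string>

// Hypothetical stand-in for the agent control block; cb_lock plays the role
// of lga_cb.cb_lock.
struct AgentCb {
  std::mutex cb_lock;
  std::string client_name = "client-1";
  bool recovered = false;
};

// sync_send stands in for a blocking transport call that takes its own
// internal lock (an assumption made for this sketch).
bool recover_one_client(AgentCb& cb,
                        const std::function<bool(const std::string&)>& sync_send) {
  std::string request;
  {
    // Hold the agent lock only while copying the data the request needs.
    std::lock_guard<std::mutex> guard(cb.cb_lock);
    request = cb.client_name;
  }
  // The blocking call runs with cb_lock released, so a thread that already
  // holds the transport lock and then wants cb_lock can still make progress.
  const bool ok = sync_send(request);
  {
    // Re-acquire the agent lock to record the result.
    std::lock_guard<std::mutex> guard(cb.cb_lock);
    cb.recovered = ok;
  }
  return ok;
}
```

The cost of this design is that the shared state may change while the send is in flight, so anything read before the call has to be revalidated after the lock is taken again.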
[tickets] [opensaf:tickets] #1728 amfnd: message ID mismatch causing node reboot
changeset: 7591:8b7ff9e7ee50
branch: opensaf-5.0.x
parent: 7583:34aa67561642
user: Mathivanan N.P. mathi.naic...@oracle.com
date: Wed May 04 15:50:00 2016 +0530
summary: amfnd: revert changeset:7541:9cf09e277bd8 - ticket [#1728]

changeset: 7592:149ab1204a06
branch: opensaf-5.0.x
user: Mathivanan N.P. mathi.naic...@oracle.com
date: Wed May 04 15:51:12 2016 +0530
summary: amfnd: revert changeset:7507:fcd28ac3f6d2 - ticket [#1728]

changeset: 7593:946cc9884652
branch: opensaf-5.0.x
user: Mathivanan N.P. mathi.naic...@oracle.com
date: Wed May 04 15:51:57 2016 +0530
summary: amfnd: revert changeset:7466:71b86f3732ba - ticket [#1728]

changeset: 7594:19e4be9f0101
branch: opensaf-5.0.x
user: Mathivanan N.P. mathi.naic...@oracle.com
date: Wed May 04 15:52:33 2016 +0530
summary: amfnd: revert changeset:7412:f2b5abde4d71 - ticket [#517]

changeset: 7595:278911431815
branch: opensaf-5.0.x
user: Mathivanan N.P. mathi.naic...@oracle.com
date: Wed May 04 15:53:08 2016 +0530
summary: amfnd: revert changeset:7365:323fdb59a154 - ticket [#517]

changeset: 7596:2941ea4c6fb3
branch: opensaf-5.0.x
tag: tip
user: Mathivanan N.P. mathi.naic...@oracle.com
date: Wed May 04 15:55:01 2016 +0530
summary: amfnd: revert changeset:7356:4ee6e611100a - ticket [#517]

changeset: 7585:627c6844cb20
user: Mathivanan N.P. mathi.naic...@oracle.com
date: Wed May 04 15:36:22 2016 +0530
summary: amfnd: revert changeset:7541:9cf09e277bd8 - ticket [#1728]

changeset: 7586:911d3b83d551
user: Mathivanan N.P. mathi.naic...@oracle.com
date: Wed May 04 15:38:07 2016 +0530
summary: amfnd: revert changeset:7507:fcd28ac3f6d2 - ticket [#1728]

changeset: 7587:1479c419e520
user: Mathivanan N.P. mathi.naic...@oracle.com
date: Wed May 04 15:39:23 2016 +0530
summary: amfnd: revert changeset:7466:71b86f3732ba - ticket [#1728]

changeset: 7588:dcf5c3c090fa
user: Mathivanan N.P. mathi.naic...@oracle.com
date: Wed May 04 15:40:43 2016 +0530
summary: amfnd: revert changeset:7412:f2b5abde4d71 - ticket [#517]

changeset: 7589:ede9089137bc
user: Mathivanan N.P. mathi.naic...@oracle.com
date: Wed May 04 15:41:44 2016 +0530
summary: amfnd: revert changeset:7365:323fdb59a154 - ticket [#517]

changeset: 7590:62b34f2f4289
tag: tip
user: Mathivanan N.P. mathi.naic...@oracle.com
date: Wed May 04 15:43:50 2016 +0530
summary: amfnd: revert changeset:7356:4ee6e611100a - ticket [#517]

---

** [tickets:#1728] amfnd: message ID mismatch causing node reboot**

**Status:** unassigned
**Milestone:** 5.1.FC
**Created:** Wed Apr 06, 2016 08:39 AM UTC by Gary Lee
**Last Updated:** Wed May 04, 2016 09:51 AM UTC
**Owner:** Nagendra Kumar

"Message ID mismatch, rec %u, expected %u" can sometimes be seen on a node. This seems to be caused by the addition of the IMM reader thread.

In various message handlers, rcv_msg_id is recorded:

    avnd_msgid_assert(info->msg_id);
    cb->rcv_msg_id = info->msg_id;

But now the main thread and the IMM reader thread can potentially update this concurrently. The update of rcv_msg_id should possibly be done in a single thread, e.g. in avnd_mds_rcv() when a message is received from AMFD.
[tickets] [opensaf:tickets] #517 Amfnd: coredumps when calling immutil_saImmOmInitialize
changeset: 7591:8b7ff9e7ee50
branch: opensaf-5.0.x
parent: 7583:34aa67561642
user: Mathivanan N.P.
date: Wed May 04 15:50:00 2016 +0530
summary: amfnd: revert changeset:7541:9cf09e277bd8 - ticket [#1728]

changeset: 7592:149ab1204a06
branch: opensaf-5.0.x
user: Mathivanan N.P.
date: Wed May 04 15:51:12 2016 +0530
summary: amfnd: revert changeset:7507:fcd28ac3f6d2 - ticket [#1728]

changeset: 7593:946cc9884652
branch: opensaf-5.0.x
user: Mathivanan N.P.
date: Wed May 04 15:51:57 2016 +0530
summary: amfnd: revert changeset:7466:71b86f3732ba - ticket [#1728]

changeset: 7594:19e4be9f0101
branch: opensaf-5.0.x
user: Mathivanan N.P.
date: Wed May 04 15:52:33 2016 +0530
summary: amfnd: revert changeset:7412:f2b5abde4d71 - ticket [#517]

changeset: 7595:278911431815
branch: opensaf-5.0.x
user: Mathivanan N.P.
date: Wed May 04 15:53:08 2016 +0530
summary: amfnd: revert changeset:7365:323fdb59a154 - ticket [#517]

changeset: 7596:2941ea4c6fb3
branch: opensaf-5.0.x
tag: tip
user: Mathivanan N.P.
date: Wed May 04 15:55:01 2016 +0530
summary: amfnd: revert changeset:7356:4ee6e611100a - ticket [#517]

changeset: 7585:627c6844cb20
user: Mathivanan N.P.
date: Wed May 04 15:36:22 2016 +0530
summary: amfnd: revert changeset:7541:9cf09e277bd8 - ticket [#1728]

changeset: 7586:911d3b83d551
user: Mathivanan N.P.
date: Wed May 04 15:38:07 2016 +0530
summary: amfnd: revert changeset:7507:fcd28ac3f6d2 - ticket [#1728]

changeset: 7587:1479c419e520
user: Mathivanan N.P.
date: Wed May 04 15:39:23 2016 +0530
summary: amfnd: revert changeset:7466:71b86f3732ba - ticket [#1728]

changeset: 7588:dcf5c3c090fa
user: Mathivanan N.P.
date: Wed May 04 15:40:43 2016 +0530
summary: amfnd: revert changeset:7412:f2b5abde4d71 - ticket [#517]

changeset: 7589:ede9089137bc
user: Mathivanan N.P.
date: Wed May 04 15:41:44 2016 +0530
summary: amfnd: revert changeset:7365:323fdb59a154 - ticket [#517]

changeset: 7590:62b34f2f4289
tag: tip
user: Mathivanan N.P.
date: Wed May 04 15:43:50 2016 +0530
summary: amfnd: revert changeset:7356:4ee6e611100a - ticket [#517]

---

** [tickets:#517] Amfnd: coredumps when calling immutil_saImmOmInitialize**

**Status:** unassigned
**Milestone:** 5.1.FC
**Created:** Wed Jul 24, 2013 08:29 AM UTC by Hans Nordebäck
**Last Updated:** Wed May 04, 2016 10:18 AM UTC
**Owner:** Nagendra Kumar

avnd_comp_config_get_su calls immutil_saImmOmInitialize, but avnd does not set ImmutilWrapperProfile, so the default values are used and immutil calls abort on errors. There are two problems here: immutil should not abort, and immutil sleeps in its try-again loops. Avnd is event based; immutil is configurable, but error handling and try-again logic have to be managed by avnd.

    (gdb) bt
    #0 0x7ffac238cb35 in raise () from /lib64/libc.so.6
    #1 0x7ffac238e111 in abort () from /lib64/libc.so.6
    #2 0x004051e8 in defaultImmutilError (fmt=0x43fc70 "saImmOmInitialize FAILED, rc = %d") at ../../../../../osaf/tools/safimm/src/immutil.c:70
    #3 0x004065e4 in immutil_saImmOmInitialize (immHandle=0x7fff7098d380, immCallbacks=0x0, version=0x7fff7098d3a0) at ../../../../../osaf/tools/safimm/src/immutil.c:1126
    #4 0x00422551 in avnd_comp_config_get_su (su=0x66d8d0) at avnd_compdb.c:1743
    #5 0x00436560 in avnd_evt_avd_su_pres_evh (cb=0x6578c0, evt=) at avnd_susm.c:1236
    #6 0x0042ffe0 in avnd_evt_process (evt=) at avnd_proc.c:278
    #7 avnd_main_process () at avnd_proc.c:219
    #8 0x00408805 in main (argc=1, argv=0x7fff7098d578) at amfnd_main.c:71
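The point in #517 that try-again logic belongs in the event-based caller, not in a sleeping loop inside the helper, can be sketched as follows. This is an illustration under stated assumptions, not amfnd or immutil code: `Status`, `InitResult`, `RetryState` and `try_init_once` are hypothetical names, with `Status::kTryAgain` standing in for SA_AIS_ERR_TRY_AGAIN.

```cpp
#include <cassert>
#include <functional>

// Hypothetical status codes; kTryAgain stands in for SA_AIS_ERR_TRY_AGAIN.
enum class Status { kOk, kTryAgain, kError };

// What the event loop should do next.
enum class InitResult { kDone, kRetryLater, kGiveUp };

struct RetryState {
  int attempts = 0;
  int max_attempts = 5;  // arbitrary limit chosen for the sketch
};

// Makes exactly one attempt and never sleeps or aborts. On kRetryLater the
// caller is expected to re-arm a timer and call again from its event loop.
InitResult try_init_once(RetryState& state, const std::function<Status()>& init) {
  ++state.attempts;
  switch (init()) {
    case Status::kOk:
      return InitResult::kDone;
    case Status::kTryAgain:
      return state.attempts < state.max_attempts ? InitResult::kRetryLater
                                                 : InitResult::kGiveUp;
    case Status::kError:
      break;
  }
  return InitResult::kGiveUp;
}
```

Because each call returns immediately, the main thread keeps processing other events between attempts, and the decision on how to handle a final failure stays with the caller.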
[tickets] [opensaf:tickets] #517 Amfnd: coredumps when calling immutil_saImmOmInitialize
changeset: 7588:dcf5c3c090fa
user: Mathivanan N.P.
date: Wed May 04 15:40:43 2016 +0530
summary: amfnd: revert changeset:7412:f2b5abde4d71 - ticket [#517]

changeset: 7589:ede9089137bc
user: Mathivanan N.P.
date: Wed May 04 15:41:44 2016 +0530
summary: amfnd: revert changeset:7365:323fdb59a154 - ticket [#517]

changeset: 7590:62b34f2f4289
tag: tip
user: Mathivanan N.P.
date: Wed May 04 15:43:50 2016 +0530
summary: amfnd: revert changeset:7356:4ee6e611100a - ticket [#517]

---

** [tickets:#517] Amfnd: coredumps when calling immutil_saImmOmInitialize**

**Status:** unassigned
**Milestone:** 5.1.FC
**Created:** Wed Jul 24, 2013 08:29 AM UTC by Hans Nordebäck
**Last Updated:** Wed May 04, 2016 09:51 AM UTC
**Owner:** Nagendra Kumar

avnd_comp_config_get_su calls immutil_saImmOmInitialize, but avnd does not set ImmutilWrapperProfile, so the default values are used and immutil calls abort on errors. There are two problems here: immutil should not abort, and immutil sleeps in its try-again loops. Avnd is event based; immutil is configurable, but error handling and try-again logic have to be managed by avnd.

    (gdb) bt
    #0 0x7ffac238cb35 in raise () from /lib64/libc.so.6
    #1 0x7ffac238e111 in abort () from /lib64/libc.so.6
    #2 0x004051e8 in defaultImmutilError (fmt=0x43fc70 "saImmOmInitialize FAILED, rc = %d") at ../../../../../osaf/tools/safimm/src/immutil.c:70
    #3 0x004065e4 in immutil_saImmOmInitialize (immHandle=0x7fff7098d380, immCallbacks=0x0, version=0x7fff7098d3a0) at ../../../../../osaf/tools/safimm/src/immutil.c:1126
    #4 0x00422551 in avnd_comp_config_get_su (su=0x66d8d0) at avnd_compdb.c:1743
    #5 0x00436560 in avnd_evt_avd_su_pres_evh (cb=0x6578c0, evt=) at avnd_susm.c:1236
    #6 0x0042ffe0 in avnd_evt_process (evt=) at avnd_proc.c:278
    #7 avnd_main_process () at avnd_proc.c:219
    #8 0x00408805 in main (argc=1, argv=0x7fff7098d578) at amfnd_main.c:71
[tickets] [opensaf:tickets] #517 Amfnd: coredumps when calling immutil_saImmOmInitialize
- **Milestone**: future --> 5.1.FC
- **Comment**: Reverted the changes from 5.0.

---

** [tickets:#517] Amfnd: coredumps when calling immutil_saImmOmInitialize**

**Status:** unassigned
**Milestone:** 5.1.FC
**Created:** Wed Jul 24, 2013 08:29 AM UTC by Hans Nordebäck
**Last Updated:** Mon Mar 28, 2016 01:45 PM UTC
**Owner:** Nagendra Kumar

avnd_comp_config_get_su calls immutil_saImmOmInitialize, but avnd does not set ImmutilWrapperProfile, so the default values are used and immutil calls abort on errors. There are two problems here: immutil should not abort, and immutil sleeps in its try-again loops. Avnd is event based; immutil is configurable, but error handling and try-again logic have to be managed by avnd.

    (gdb) bt
    #0 0x7ffac238cb35 in raise () from /lib64/libc.so.6
    #1 0x7ffac238e111 in abort () from /lib64/libc.so.6
    #2 0x004051e8 in defaultImmutilError (fmt=0x43fc70 "saImmOmInitialize FAILED, rc = %d") at ../../../../../osaf/tools/safimm/src/immutil.c:70
    #3 0x004065e4 in immutil_saImmOmInitialize (immHandle=0x7fff7098d380, immCallbacks=0x0, version=0x7fff7098d3a0) at ../../../../../osaf/tools/safimm/src/immutil.c:1126
    #4 0x00422551 in avnd_comp_config_get_su (su=0x66d8d0) at avnd_compdb.c:1743
    #5 0x00436560 in avnd_evt_avd_su_pres_evh (cb=0x6578c0, evt=) at avnd_susm.c:1236
    #6 0x0042ffe0 in avnd_evt_process (evt=) at avnd_proc.c:278
    #7 avnd_main_process () at avnd_proc.c:219
    #8 0x00408805 in main (argc=1, argv=0x7fff7098d578) at amfnd_main.c:71
[tickets] [opensaf:tickets] #1728 amfnd: message ID mismatch causing node reboot
- **status**: fixed --> unassigned
- **Milestone**: 5.0.GA --> 5.1.FC
- **Comment**: Reverted the changes from 5.0.

---

** [tickets:#1728] amfnd: message ID mismatch causing node reboot**

**Status:** unassigned
**Milestone:** 5.1.FC
**Created:** Wed Apr 06, 2016 08:39 AM UTC by Gary Lee
**Last Updated:** Mon Apr 25, 2016 09:51 AM UTC
**Owner:** Nagendra Kumar

"Message ID mismatch, rec %u, expected %u" can sometimes be seen on a node. This seems to be caused by the addition of the IMM reader thread.

In various message handlers, rcv_msg_id is recorded:

    avnd_msgid_assert(info->msg_id);
    cb->rcv_msg_id = info->msg_id;

But now the main thread and the IMM reader thread can potentially update this concurrently. The update of rcv_msg_id should possibly be done in a single thread, e.g. in avnd_mds_rcv() when a message is received from AMFD.
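The race in #1728 comes from the check and the update of the received message ID being two separate steps that several threads can interleave. A minimal sketch of keeping them in one place is shown below. `MsgIdTracker` is a hypothetical class, not amfnd code. It assumes that a valid ID is exactly one greater than the last accepted one, and it uses a mutex to make check-and-update atomic, which is an alternative to the single-thread update the ticket suggests.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>

// Hypothetical tracker; in the ticket the counter is cb->rcv_msg_id.
class MsgIdTracker {
 public:
  // Accepts msg_id only if it is exactly one past the last accepted ID
  // (an assumption of this sketch). The check and the update happen under
  // one lock, so two threads cannot interleave between them.
  bool accept(uint32_t msg_id) {
    std::lock_guard<std::mutex> guard(lock_);
    if (msg_id != last_id_ + 1) {
      return false;  // corresponds to the "Message ID mismatch" case
    }
    last_id_ = msg_id;
    return true;
  }

  // Returns the last accepted ID.
  uint32_t last() const {
    std::lock_guard<std::mutex> guard(lock_);
    return last_id_;
  }

 private:
  mutable std::mutex lock_;
  uint32_t last_id_ = 0;
};
```

If `accept` is called only from the receive path, as the ticket proposes for `avnd_mds_rcv()`, the lock is uncontended and mainly documents the invariant.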
[tickets] [opensaf:tickets] #1789 log: not verify the logBufSize caused the node malfunctioned
- **status**: accepted --> review

---

** [tickets:#1789] log: not verify the logBufSize caused the node malfunctioned**

**Status:** review
**Milestone:** 4.6.2
**Created:** Thu Apr 28, 2016 11:06 AM UTC by Vu Minh Nguyen
**Last Updated:** Wed May 04, 2016 06:30 AM UTC
**Owner:** Vu Minh Nguyen

Normally, log clients pass to the `saLogWriteLogAsync()` LOG API a `logBufSize` value that is calculated from the data in `logBuf`.

But if an invalid `logBufSize` is accidentally passed to `saLogWriteLogAsync()`, such as a very large number that results from not using `strlen()` on `logBuf`, it causes a lot of trouble. E.g.:

1) The safLog is flooded with zeros and becomes very large (e.g. 4 GB)
2) A lot of RAM is consumed
3) A lot of CPU is consumed
4) Other problems
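The two sides of #1789, computing `logBufSize` from the text and rejecting implausible sizes, can be sketched as below. This is an illustration only: `LogBuffer` is a hypothetical stand-in for the size/pointer pair passed to the LOG API, and `make_log_buffer`, `log_buffer_size_ok` and the `max_size` limit are assumptions of the sketch, not OpenSAF functions or values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for a (size, data) log buffer pair.
struct LogBuffer {
  std::size_t size;
  const unsigned char* data;
};

// Client side: derive the size from the NUL-terminated text itself instead
// of passing an unrelated number.
LogBuffer make_log_buffer(const char* text) {
  return LogBuffer{std::strlen(text),
                   reinterpret_cast<const unsigned char*>(text)};
}

// Receiving side: reject sizes above a configured maximum before any memory
// is allocated or any file is written. max_size is chosen by the caller.
bool log_buffer_size_ok(const LogBuffer& buf, std::size_t max_size) {
  if (buf.data == nullptr) {
    return buf.size == 0;
  }
  return buf.size <= max_size;
}
```

A bound check like this cannot tell whether a size matches the real data, but it does keep a wildly wrong value from turning into a multi-gigabyte write.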