[tickets] [opensaf:tickets] #2296 imm: IMMND on payload crashes after SC absence
- **status**: accepted --> review

---

** [tickets:#2296] imm: IMMND on payload crashes after SC absence**

**Status:** review
**Milestone:** 5.0.2
**Created:** Thu Feb 09, 2017 08:44 AM UTC by Hung Nguyen
**Last Updated:** Thu Feb 09, 2017 08:44 AM UTC
**Owner:** Hung Nguyen
**Attachments:**

- [logs.tgz](https://sourceforge.net/p/opensaf/tickets/2296/attachment/logs.tgz) (5.2 MB; application/x-compressed)

Removal of the IMMND coordinator was introduced in [#1692]. Some cleanup actions are delayed until **immnd_proc_server()** is executed. If the cluster comes back from the headless state too quickly, **immnd_proc_server()** is not executed and IMMND crashes later.

~~~
2017-02-05 21:36:41 PL-5 osafimmnd[406]: NO Announce sync, epoch:28
2017-02-05 21:36:41 PL-5 osafimmnd[406]: NO SERVER STATE: IMM_SERVER_READY --> IMM_SERVER_SYNC_SERVER
2017-02-05 21:36:41 PL-5 osafimmnd[406]: NO NODE STATE-> IMM_NODE_R_AVAILABLE
2017-02-05 21:36:41 PL-5 osafimmloadd: NO Sync starting
2017-02-05 21:36:42 PL-5 osafdtmd[393]: NO Lost contact with 'SC-1'
2017-02-05 21:36:42 PL-5 osafimmnd[406]: WA Director Service in NOACTIVE state - fevs replies pending:16 fevs highest processed:13154
2017-02-05 21:36:43 PL-5 osafimmnd[406]: WA SC Absence IS allowed:900 IMMD service is DOWN
2017-02-05 21:36:43 PL-5 osafimmnd[406]: NO IMMD SERVICE IS DOWN, HYDRA IS CONFIGURED => UNREGISTERING IMMND form MDS
2017-02-05 21:36:43 PL-5 osafimmnd[406]: NO Removing client id:290002050f sv_id:26
2017-02-05 21:36:43 PL-5 osafimmnd[406]: NO Removing client id:14d0002050f sv_id:26
2017-02-05 21:36:43 PL-5 osafimmnd[406]: WA Postponing hard delete of admin owner with id:41 when imm is not writable state
2017-02-05 21:36:43 PL-5 osafimmnd[406]: NO Removing client id:1530002050f sv_id:27
2017-02-05 21:36:43 PL-5 osafimmnd[406]: NO Implementer disconnected 147 <339, 2050f> (OpenSafImmPBE)
2017-02-05 21:36:43 PL-5 osafimmnd[406]: NO Removing client id:1550002050f sv_id:26
2017-02-05 21:36:43 PL-5 osafimmnd[406]: NO Implementer disconnected 144 <0, 2010f(down)> (safLogService)
2017-02-05 21:36:43 PL-5 osafimmnd[406]: NO Implementer disconnected 145 <0, 2010f(down)> (@safLogService_appl)
2017-02-05 21:36:43 PL-5 osafimmnd[406]: NO Implementer disconnected 146 <0, 2010f(down)> (@OpenSafImmReplicatorA)
2017-02-05 21:36:43 PL-5 osafimmnd[406]: NO Implementer disconnected 143 <0, 2010f(down)> (safClmService)
2017-02-05 21:36:43 PL-5 osafimmnd[406]: NO Implementer disconnected 142 <0, 2010f(down)> (safAmfService)
2017-02-05 21:36:43 PL-5 osafimmnd[406]: NO Impl Discarded node 2010f
2017-02-05 21:36:43 PL-5 osafimmnd[406]: NO MDS unregisterede. sleeping ...
2017-02-05 21:36:43 PL-5 osafimmpbed: WA PBE lost contact with parent IMMND - Exiting
2017-02-05 21:36:44 PL-5 osafimmnd[406]: NO Sleep done registering IMMND with MDS
2017-02-05 21:36:44 PL-5 osafimmnd[406]: NO SUCCESS IN REGISTERING IMMND WITH MDS
2017-02-05 21:36:44 PL-5 osafimmnd[406]: NO MDS: mds_register_callback: dest 2050f01e8 already exist
2017-02-05 21:36:44 PL-5 osafimmnd[406]: WA IMMND - Client Node Get Failed for cli_hdl:1464583980303
2017-02-05 21:36:45 PL-5 osafdtmd[393]: NO Established contact with 'SC-1'
2017-02-05 21:36:49 PL-5 osafimmnd[406]: WA MDS Send Failed
2017-02-05 21:36:49 PL-5 osafimmnd[406]: WA Error code 2 returned for message type 17 - ignoring
2017-02-05 21:36:49 PL-5 osafimmnd[406]: NO IMMD service is UP ... ScAbsenseAllowed?:900 introduced?:2
2017-02-05 21:36:49 PL-5 osafimmnd[406]: NO Re-introduce-me highestProcessed:13154 highestReceived:13154
2017-02-05 21:36:49 PL-5 osafimmnd[406]: NO Epoch set to 29 in ImmModel
2017-02-05 21:36:49 PL-5 osafimmnd[406]: NO Re-introduce-me highestProcessed:13154 highestReceived:13154
2017-02-05 21:36:49 PL-5 osafimmnd[406]: NO ERR_BAD_HANDLE: admin owner id 42 does not exist
2017-02-05 21:36:49 PL-5 osafimmnd[406]: NO Implementer connected: 149 (OpenSafImmPBE) <0, 2040f>
2017-02-05 21:36:49 PL-5 osafimmnd[406]: NO Re-introduce-me highestProcessed:13157 highestReceived:13158
2017-02-05 21:36:49 PL-5 osafimmnd[406]: ER Node is in a state that cannot accept start of sync, will terminate
~~~

IMMND failed to revert to IMM_SERVER_READY/IMM_NODE_FULLY_AVAILABLE and crashed.

~~~
#0  0x7f23733bdc37 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
        resultvar = 0
        pid = 406
        selftid = 406
#1  0x7f23733c1028 in __GI_abort () at abort.c:89
        save_stage = 2
        act = {__sigaction_handler = {sa_handler = 0x152d0009, sa_sigaction = 0x152d0009}, sa_mask = {__val = {93865551367896, 30, 54, 139790248362720, 139790245522487, 17179869186, 139790248362720, 140726076478512, 0, 139790250985925, 54, 30, 54, 140726076478560, 139790245475049, 140726076478560}}, sa_flags = 0, sa_restorer = 0x2c774d2a0}
        sigs = {__val = {32, 0 }}
#2  0x555ec6cac677 in ImmModel::prepareForSync (this=0x555ec774db30, isJoining=false) at src/imm/immnd/ImmModel.cc:2637
~~~
[tickets] [opensaf:tickets] #2283 ckpt: cpnd_evt_proc_ckpt_open does not reply to ckpt agent
- **status**: review --> fixed
- **Comment**:

opensaf-5.0.x:

changeset: 8568:65fe8f2d1d07
branch: opensaf-5.0.x
parent: 8563:0e852f85dab8
user: Zoran Milinkovic
date: Tue Jan 31 13:51:50 2017 +0100
summary: ckpt: return SA_AIS_ERR_BAD_HANDLE if ckpt handle is not found in checkpoint open call [#2283]

---

opensaf-5.1.x:

changeset: 8569:dbb97b94a174
branch: opensaf-5.1.x
tag: tip
parent: 8564:50c827d26599
user: Zoran Milinkovic
date: Tue Jan 31 13:51:50 2017 +0100
summary: ckpt: return SA_AIS_ERR_BAD_HANDLE if ckpt handle is not found in checkpoint open call [#2283]

---

default(5.2):

changeset: 8567:fb6aea5fe1c9
user: Zoran Milinkovic
date: Tue Jan 31 13:51:50 2017 +0100
summary: ckpt: return SA_AIS_ERR_BAD_HANDLE if ckpt handle is not found in checkpoint open call [#2283]

---

** [tickets:#2283] ckpt: cpnd_evt_proc_ckpt_open does not reply to ckpt agent**

**Status:** fixed
**Milestone:** 5.0.2
**Created:** Tue Jan 31, 2017 12:41 PM UTC by Zoran Milinkovic
**Last Updated:** Tue Jan 31, 2017 01:05 PM UTC
**Owner:** Zoran Milinkovic

If a client node is not found in cpnd_evt_proc_ckpt_open(), the function does not reply to the agent. As a result, the checkpoint open call may hang for some time and then return SA_AIS_ERR_TIMEOUT.

According to the spec, if the checkpoint handle is invalid, the API function should return SA_AIS_ERR_BAD_HANDLE.

---

Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is subscribed to https://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change settings at https://sourceforge.net/p/opensaf/admin/tickets/options. Or, if this is a mailing list, you can unsubscribe from the mailing list.

--
Check out the vibrant tech community on one of the world's most engaging tech sites, SlashDot.org! http://sdm.link/slashdot
___
Opensaf-tickets mailing list
Opensaf-tickets@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets
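The behaviour described in #2283 (reply with a bad-handle error instead of returning silently) can be illustrated with a minimal sketch. This is not the committed OpenSAF change: every name prefixed with `demo_` or `DEMO_` is hypothetical, the enum is a local stand-in for the AIS error codes, and the reply record stands in for whatever message the node director sends back to the agent.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-ins for the AIS error codes named in the ticket (assumption:
 * the real definitions come from the SAF AIS headers, not from this file). */
typedef enum {
    DEMO_AIS_OK = 1,
    DEMO_AIS_ERR_TIMEOUT,
    DEMO_AIS_ERR_BAD_HANDLE
} demo_ais_error_t;

/* Hypothetical client table entry and reply record. */
typedef struct { unsigned long long handle; } demo_client_t;
typedef struct { int sent; demo_ais_error_t error; } demo_reply_t;

/* Linear lookup of a client by handle; returns NULL when it is unknown. */
const demo_client_t *demo_find_client(const demo_client_t *tbl, size_t n,
                                      unsigned long long handle)
{
    for (size_t i = 0; i < n; i++)
        if (tbl[i].handle == handle)
            return &tbl[i];
    return NULL;
}

/* Open handler sketch: every exit path fills in a reply, so the caller
 * is never left waiting until its own timeout expires. */
void demo_proc_ckpt_open(const demo_client_t *tbl, size_t n,
                         unsigned long long handle, demo_reply_t *reply)
{
    assert(reply != NULL);
    if (demo_find_client(tbl, n, handle) == NULL) {
        /* Unknown handle: answer immediately with a bad-handle error. */
        reply->error = DEMO_AIS_ERR_BAD_HANDLE;
        reply->sent = 1;
        return;
    }
    reply->error = DEMO_AIS_OK;
    reply->sent = 1;
}
```

The point of the sketch is only the control flow: the early-return branch sets the reply before returning, which is the step the ticket reports as missing.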
[tickets] [opensaf:tickets] #2298 build: Old application binaries may fail to load
---

** [tickets:#2298] build: Old application binaries may fail to load**

**Status:** accepted
**Milestone:** 5.2.FC
**Created:** Thu Feb 09, 2017 01:56 PM UTC by Anders Widell
**Last Updated:** Thu Feb 09, 2017 01:56 PM UTC
**Owner:** Anders Widell

An application binary linked with the OpenSAF AIS libraries may fail to load after upgrading to OpenSAF 5.2. The reason is that libopensaf_core.so.0 has moved from /usr/local/lib to /usr/local/lib/opensaf.

This problem occurs if both of the following conditions are met:

* The application was linked without the -Wl,--as-needed option (and the Linux distribution used to build the application binary does not enable this option by default)
* The directory /usr/local/lib/opensaf is not listed in LD_LIBRARY_PATH or /etc/ld.so.conf

The suggested fix is to:

* Move libopensaf_core back to /usr/local/lib
* Update the documentation to mention that -Wl,--as-needed must be used when linking with the OpenSAF libraries, to avoid similar problems in the future.
[tickets] [opensaf:tickets] #2297 mds: improve MDS logging
- **status**: accepted --> review
- **Comment**:

https://sourceforge.net/p/opensaf/mailman/message/35656997/

---

** [tickets:#2297] mds: improve MDS logging**

**Status:** review
**Milestone:** 5.2.FC
**Created:** Thu Feb 09, 2017 01:15 PM UTC by Zoran Milinkovic
**Last Updated:** Thu Feb 09, 2017 01:15 PM UTC
**Owner:** Zoran Milinkovic

Example:

Nov 17 18:11:44.259636 osafamfd[500] ERR |MDS_SND_RCV: Timeout or Error occured
Nov 17 18:11:44.259779 osafamfd[500] ERR |MDS_SND_RCV: Timeout occured on sndrsp message
Nov 17 18:11:44.259817 osafamfd[500] ERR |MDS_SND_RCV: Adest=<0x0002010f,439>

In the first log message, it is not obvious whether the MDS send failed because of a timeout or because of an error. If the MDS send fails because of an error, the second message still reports a timeout, which is incorrect.

The first message should distinguish between the timeout and error cases. In the error case, errno can be printed to ease debugging.

The second message should be changed from "Timeout occured." to "Timeout or error occured."
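As an illustration of the first proposal in #2297 (separate messages for timeout and error, with errno in the error case), here is a small sketch. It is not the patch under review: `demo_send_failure_t` and `demo_format_send_failure()` are hypothetical names, and the message texts are examples rather than the strings MDS actually logs.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical classification of a failed send-with-response call. */
typedef enum { DEMO_SEND_TIMEOUT, DEMO_SEND_ERROR } demo_send_failure_t;

/* Formats one log line that names the failure cause explicitly.
 * saved_errno is only used for DEMO_SEND_ERROR; the caller is expected to
 * capture errno right after the failing call. Returns snprintf()'s result. */
int demo_format_send_failure(char *buf, size_t len,
                             demo_send_failure_t cause, int saved_errno)
{
    assert(buf != NULL && len > 0);
    if (cause == DEMO_SEND_TIMEOUT)
        return snprintf(buf, len,
                        "MDS_SND_RCV: Timeout occurred on sndrsp message");
    return snprintf(buf, len,
                    "MDS_SND_RCV: Error occurred on sndrsp message, "
                    "errno=%d (%s)",
                    saved_errno, strerror(saved_errno));
}
```

Passing errno in as a parameter, instead of reading the global inside the formatter, avoids reporting a value that an intermediate library call has already overwritten.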
[tickets] [opensaf:tickets] #2290 mds: (TCP) Libraries cause high CPU load when opensaf service stops
- **status**: review --> fixed
- **Comment**:

changeset: 8566:7531f6abf2cf
tag: tip
user: Hans Nordeback
date: Thu Feb 09 13:03:30 2017 +0100
summary: mds: TCP Libraries cause high CPU load when opensaf service stops [#2290]

---

** [tickets:#2290] mds: (TCP) Libraries cause high CPU load when opensaf service stops**

**Status:** fixed
**Milestone:** 5.2.FC
**Created:** Tue Feb 07, 2017 11:03 AM UTC by Hung Nguyen
**Last Updated:** Wed Feb 08, 2017 12:34 PM UTC
**Owner:** Hans Nordebäck

When DBSRsock is closed and mdtm_process_poll_recv_data_tcp() returns

~~~
:::c
syslog(LOG_ERR, "MDTM:SOCKET recd_bytes :%zd, conn lost with dh server", recd_bytes);
close(tcp_cb->DBSRsock);
return;
~~~

the while() loop spins rapidly, because poll() returns **1** immediately and pfd[0].revents is **32 (POLLNVAL 0x020)**

~~~
:::c
pfd[0].fd = tcp_cb->DBSRsock;
pfd[1].fd = tcp_cb->tmr_fd;

while (1) {
    int pollres;

    pfd[0].events = POLLIN;
    pfd[1].events = POLLIN;

    pfd[0].revents = pfd[1].revents = 0;

    pollres = poll(pfd, 2, MDTM_TCP_POLL_TIMEOUT);
    ...
}
~~~

Steps to reproduce:

* run immcfg

~~~
root@SC-1:~# immcfg
>
~~~

* stop opensaf service

~~~
root@SC-1:~# service opensafd stop
~~~

* check the CPU
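The poll() behaviour behind #2290 can be reproduced in isolation. The sketch below is not the fix from changeset 8566 (which is not shown in this message); it only demonstrates two properties of POSIX poll(): a descriptor that has been closed is reported at once with POLLNVAL, so a loop that keeps polling it never blocks, whereas an entry whose fd is negative is ignored. The helper names `demo_poll_once()` and `demo_make_closed_fd()` are hypothetical.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Polls a single descriptor once and stores the resulting revents.
 * Returns poll()'s return value. */
int demo_poll_once(int fd, int timeout_ms, short *revents)
{
    struct pollfd pfd;

    assert(revents != NULL);
    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    int res = poll(&pfd, 1, timeout_ms);
    *revents = pfd.revents;
    return res;
}

/* Creates a pipe, closes both ends, and returns the now-stale number of
 * the read end (or -1 if pipe() fails). This mimics a loop that still
 * holds the number of a socket it has already closed. */
int demo_make_closed_fd(void)
{
    int fds[2];

    if (pipe(fds) != 0)
        return -1;
    close(fds[0]);
    close(fds[1]);
    return fds[0];
}
```

One general way to stop such a loop from spinning is to set the closed entry's `fd` to -1 (or leave the loop) right after `close()`; whether the committed change does this is not stated here.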
[tickets] [opensaf:tickets] #1749 amf: Incorrect ER in syslog
- **status**: accepted --> review

---

** [tickets:#1749] amf: Incorrect ER in syslog**

**Status:** review
**Milestone:** 5.2.FC
**Created:** Tue Apr 12, 2016 12:22 PM UTC by elunlen
**Last Updated:** Tue Feb 07, 2017 10:06 AM UTC
**Owner:** Nagendra Kumar

When requesting AMF to do an SI swap, the following message may appear in the syslog:

2016-04-11 17:35:37 SC-1 osafamfd[500]: ER safSi=SC-2N,safApp=OpenSAF SWAP failed - Cold sync in progress

This happens even when the response to the operation is SA_AIS_ERR_TRY_AGAIN or SA_AIS_ERR_BUSY. These responses are not errors and should not result in an ER message in the syslog.
[tickets] [opensaf:tickets] #2278 mds: Blocking send causes AMF health check time-out
Also, please share mds.log and complete syslog.

---

** [tickets:#2278] mds: Blocking send causes AMF health check time-out**

**Status:** assigned
**Milestone:** 5.1.1
**Created:** Thu Jan 26, 2017 09:49 AM UTC by Anders Widell
**Last Updated:** Thu Feb 09, 2017 08:31 AM UTC
**Owner:** A V Mahesh (AVM)

AMF health-check time-out is seen on SC-1 after restarting SC-2. The system is using OpenSAF 5.1.0 configured with TCP communication.

Syslog:

~~~
2017-01-20T18:29:04.405982+01:00 local0.err SC-1 osafamfnd[2820]: ER AMF director heart beat timeout, generating core for amfd
2017-01-20T18:29:05.408819+01:00 local0.crit SC-1 osafamfnd[2820]: Rebooting OpenSAF NodeId = 131343 EE Name = , Reason: AMF director heart beat timeout, OwnNodeId = 131343, SupervisionTime = 0
~~~

Back-trace of osafamfd:

~~~
0x7fa316cceb60 osaf_poll_no_timeout (osaf/libs/core/common/osaf_poll.c:33)
0x7fa316ccede5 osaf_poll (osaf/libs/core/common/osaf_poll.c:45)
0x7fa316ccee25 osaf_poll_one_fd (osaf/libs/core/common/osaf_poll.c:129)
0x7fa316cfab67 mds_mcm_time_wait (osaf/libs/core/common/include/osaf_utility.h:79)
0x7fa316cfae51 mds_subtn_tbl_add_disc_queue (osaf/libs/core/mds/mds_c_sndrcv.c:1808)
0x7fa316cfb03d mds_mcm_process_disc_queue_checks_redundant (osaf/libs/core/mds/mds_c_sndrcv.c:2338)
0x7fa316cfbcd1 mcm_pvt_red_snd_process_common (osaf/libs/core/mds/mds_c_sndrcv.c:2257)
0x7fa316cfd04d mcm_pvt_red_svc_snd (osaf/libs/core/mds/mds_c_sndrcv.c:2174)
0x7fa316cff8f9 mds_send (osaf/libs/core/mds/mds_c_sndrcv.c:736)
0x7fa316cf9068 ncsmds_api (osaf/libs/core/mds/mds_papi.c:191)
0x7fa316ce6f5f mbcsv_mds_send_msg (osaf/libs/core/mbcsv/mbcsv_mds.c:239)
0x7fa316cec440 mbcsv_send_ckpt_data_to_all_peers (osaf/libs/core/mbcsv/mbcsv_util.c:479)
0x7fa316ce56d7 mbcsv_process_snd_ckpt_request (osaf/libs/core/mbcsv/mbcsv_api.c:862)
0x40bfc0 avsv_send_ckpt_data(cl_cb_tag*, unsigned int, unsigned long, unsigned int, unsigned int) (osaf/services/saf/amf/amfd/chkop.cc:1062)
0x446649 avd_node_oper_state_set(AVD_AVND*, SaAmfOperationalStateT) (osaf/services/saf/amf/amfd/node.cc:505)
0x44040c avd_node_mark_absent(AVD_AVND*) (osaf/services/saf/amf/amfd/ndfsm.cc:1018)
0x4438ba avd_node_failover(AVD_AVND*) (osaf/services/saf/amf/amfd/ndproc.cc:1141)
~~~
[tickets] [opensaf:tickets] #2278 mds: Blocking send causes AMF health check time-out
Hi Anders,

Do you see this frequently? Or do you think it occurs only under specific test conditions or on a specific platform?

---

** [tickets:#2278] mds: Blocking send causes AMF health check time-out**

**Status:** assigned
**Milestone:** 5.1.1
**Created:** Thu Jan 26, 2017 09:49 AM UTC by Anders Widell
**Last Updated:** Thu Feb 02, 2017 03:33 AM UTC
**Owner:** A V Mahesh (AVM)