[tickets] [opensaf:tickets] #2361 AMFD: amfd crashed with healthCheckcallbackTimeout causing both controllers to reboot
- **status**: review --> fixed
- **Comment**:

changeset: 8736:c3c90b5fb832
branch: opensaf-5.0.x
parent: 8732:ea44141c05ee
user: Nagendra Kumar
date: Thu Mar 30 10:17:41 2017 +0530
summary: amfd: handle BAD_HANDLE return during config read [#2361]

changeset: 8737:f9a5a957c16a
branch: opensaf-5.1.x
parent: 8733:be2fd9824bc4
user: Nagendra Kumar
date: Thu Mar 30 10:18:05 2017 +0530
summary: amfd: handle BAD_HANDLE return during config read [#2361]

changeset: 8738:a10d52313ef5
tag: tip
parent: 8735:68a5e668f807
user: Nagendra Kumar
date: Thu Mar 30 10:18:25 2017 +0530
summary: amfd: handle BAD_HANDLE return during config read [#2361]

[staging:c3c90b]
[staging:f9a5a9]
[staging:a10d52]

---

** [tickets:#2361] AMFD: amfd crashed with healthCheckcallbackTimeout causing both controllers to reboot**

**Status:** fixed
**Milestone:** 5.0.2
**Created:** Fri Mar 10, 2017 09:08 AM UTC by Chani Srivastava
**Last Updated:** Tue Mar 14, 2017 10:42 AM UTC
**Owner:** Nagendra Kumar

**Environment details**
OS : Suse 64bit
Changeset : 8634 ( 5.2.FC)
Setup : 4 nodes ( 2 controllers and 2 payloads with 1PBE enabled )

**Step**
1. Bring up opensaf on four nodes and create a load of 1 lakh (100,000) objects
2. Imm test cases running on standby controller

SC-1 syslog
Mar 7 19:45:58 OSAF-SC1 osafamfnd[4720]: NO 'safComp=IMMD,safSu=SC-1,safSg=2N,safApp=OpenSAF' recovery action escalated from 'componentFailover' to 'suFailover'
Mar 7 19:45:58 OSAF-SC1 osafamfnd[4720]: NO 'safComp=IMMD,safSu=SC-1,safSg=2N,safApp=OpenSAF' faulted due to 'healthCheckcallbackTimeout' : Recovery is 'suFailover'
**Mar 7 19:45:58 OSAF-SC1 osafamfnd[4720]: ER safComp=IMMD,safSu=SC-1,safSg=2N,safApp=OpenSAF Faulted due to:healthCheckcallbackTimeout Recovery is:suFailover**
**Mar 7 19:45:58 OSAF-SC1 osafamfnd[4720]: Rebooting OpenSAF NodeId = 131343 EE Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 131343, SupervisionTime = 60**
Mar 7 19:45:58 OSAF-SC1 opensaf_reboot: Rebooting local node; timeout=60

SC-2 syslog
Mar 7 19:41:00 OSAF-SC2 osafamfd[4339]: ER Failed to read configuration, AMF will not start
Mar 7 19:41:00 OSAF-SC2 osafamfd[4339]: ER avd_imm_config_get FAILED
**Mar 7 19:41:00 OSAF-SC2 osafamfnd[4349]: ER AMFD has unexpectedly crashed. Rebooting node**
Mar 7 19:41:00 OSAF-SC2 osafamfnd[4349]: Rebooting OpenSAF NodeId = 131599 EE Name = , Reason: AMFD has unexpectedly crashed. Rebooting node, OwnNodeId = 131599, SupervisionTime = 60
Mar 7 19:41:00 OSAF-SC2 opensaf_reboot: Rebooting local node; timeout=60

The amfd, immnd and immd traces are shared separately because they are very large.

---

Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is subscribed to https://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change settings at https://sourceforge.net/p/opensaf/admin/tickets/options. Or, if this is a mailing list, you can unsubscribe from the mailing list.

--
Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Opensaf-tickets mailing list
Opensaf-tickets@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets
[tickets] [opensaf:tickets] #2100 Standby should not be rebooted, for SC absence configuration mismatch
- **Milestone**: 5.2.RC2 --> next

---

** [tickets:#2100] Standby should not be rebooted, for SC absence configuration mismatch**

**Status:** unassigned
**Milestone:** next
**Created:** Fri Oct 07, 2016 07:11 AM UTC by Srikanth R
**Last Updated:** Wed Mar 01, 2017 05:33 AM UTC
**Owner:** nobody

Changeset : 8190 5.1.GA

-> Initially, opensaf was brought up on SC-1 with the "SC ABSENCE" feature enabled in immd.conf.
-> On SC-2, the "SC ABSENCE" feature was not enabled in immd.conf. When opensafd was started on SC-2, the node rebooted.

Oct 7 17:58:27 SLES-SLOT2 osafimmd[3615]: ER SC absence allowed in not the same as on active IMMD. Active: 900, Standby: 0. Exiting.
Oct 7 17:58:27 SLES-SLOT2 osafamfnd[3676]: NO 'safComp=IMMD,safSu=SC-2,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'nodeFailfast'
Oct 7 17:58:27 SLES-SLOT2 osafamfnd[3676]: ER safComp=IMMD,safSu=SC-2,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery is:nodeFailfast
Oct 7 17:58:27 SLES-SLOT2 osafamfnd[3676]: Rebooting OpenSAF NodeId = 131599 EE Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 131599, SupervisionTime = 60

Here the user had configured the two controllers inconsistently, which caused the standby to reboot. Because opensafd is enabled in the runlevel as part of installation, the standby will keep rebooting until opensafd is stopped on SC-1.

Suggested behavior: on such a mismatch, opensafd should not start on the standby, rather than rebooting the node immediately. Also, cluster-level attributes such as IMMSV_SC_ABSENCE_ALLOWED could be moved to imm.xml, while node-level attributes such as trace enabling can be retained in the configuration files.
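The suggested behavior could be sketched roughly as follows: compare the SC absence value configured on the standby with the one reported by the active, and report a refusal to start instead of exiting in a way that triggers node failfast. This is only an illustration under stated assumptions. `StartDecision`, `CheckResult` and `check_sc_absence` are hypothetical names that do not come from the OpenSAF source; the values 900 and 0 are the ones shown in the log above.

```cpp
#include <cstdint>
#include <string>

// Hypothetical outcome of a standby start-up check (not an OpenSAF type).
enum class StartDecision { kStart, kRefuseStart };

struct CheckResult {
  StartDecision decision;
  std::string reason;  // empty when the configuration matches
};

// Compare the cluster-level "SC absence allowed" timeout (in seconds) seen
// on the active IMMD with the locally configured one. On a mismatch, report
// a refusal to start rather than exiting and provoking a node failfast.
CheckResult check_sc_absence(std::uint32_t active_secs,
                             std::uint32_t standby_secs) {
  if (active_secs == standby_secs) {
    return {StartDecision::kStart, ""};
  }
  return {StartDecision::kRefuseStart,
          "SC absence allowed mismatch. Active: " +
              std::to_string(active_secs) +
              ", Standby: " + std::to_string(standby_secs)};
}
```

A caller would invoke `check_sc_absence(900, 0)` for the case in this ticket and, on `kRefuseStart`, log the reason and leave opensafd stopped on the standby. How the standby would learn the active's value before starting is not covered by this sketch.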
[tickets] [opensaf:tickets] #2402 base: "hardening" use of lockfile in opensafd
---

** [tickets:#2402] base: "hardening" use of lockfile in opensafd**

**Status:** review
**Milestone:** 5.2.RC2
**Created:** Wed Mar 29, 2017 10:40 AM UTC by Hans Nordebäck
**Last Updated:** Wed Mar 29, 2017 10:40 AM UTC
**Owner:** Hans Nordebäck

"hardening" use of lockfile in opensafd
[tickets] [opensaf:tickets] #2289 opensafd (nid): coredump while standby starting
- **status**: unassigned --> duplicate
- **Comment**:

Duplicate of [#2294]

---

** [tickets:#2289] opensafd (nid): coredump while standby starting**

**Status:** duplicate
**Milestone:** 5.2.RC2
**Created:** Tue Feb 07, 2017 06:31 AM UTC by A V Mahesh (AVM)
**Last Updated:** Wed Mar 01, 2017 08:21 AM UTC
**Owner:** nobody

When the standby is restarted with TCP, opensafd dumps core.

(gdb) bt
#0 0x7f2f05cb0b55 in raise () from /lib64/libc.so.6
#1 0x7f2f05cb2131 in abort () from /lib64/libc.so.6
#2 0x7f2f06704955 in __gnu_cxx::__verbose_terminate_handler() () at ../../../../gcc-4.8.3/libstdc++-v3/libsupc++/vterminate.cc:95
#3 0x7f2f06702af6 in __cxxabiv1::__terminate(void (*)()) () at ../../../../gcc-4.8.3/libstdc++-v3/libsupc++/eh_terminate.cc:38
#4 0x7f2f06702b23 in std::terminate() () at ../../../../gcc-4.8.3/libstdc++-v3/libsupc++/eh_terminate.cc:48
#5 0x7f2f06702d42 in __cxa_throw () at ../../../../gcc-4.8.3/libstdc++-v3/libsupc++/eh_throw.cc:87
#6 0x7f2f0670322d in operator new(unsigned long) () at ../../../../gcc-4.8.3/libstdc++-v3/libsupc++/new_op.cc:56
#7 0x7f2f06761979 in std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator const&) () at /home/build/x86_64-unknown-linux-gnu/libstdc++-v3/include/ext/new_allocator.h:104
#8 0x7f2f0676256b in std::string::_Rep::_M_clone(std::allocator const&, unsigned long) () at /home/build/x86_64-unknown-linux-gnu/libstdc++-v3/include/bits/basic_string.tcc:629
#9 0x7f2f06762bec in std::basic_string::basic_string(std::string const&) () at /home/build/x86_64-unknown-linux-gnu/libstdc++-v3/include/bits/basic_string.h:229
#10 0x7f2f07262c39 in handle_data_request(pollfd*, std::string const&) () at /usr/include/c++/4.8.3/bits/basic_string.h:2405
#11 0x7f2f0726320f in svc_monitor_thread(void*) () at src/nid/nodeinit.cc:1539
#12 0x7f2f05ff97b6 in start_thread () from /lib64/libpthread.so.0
#13 0x7f2f05d559cd in clone () from /lib64/libc.so.6
#14 0x in ?? ()
(gdb) q

Feb 7 11:41:13 SC-2 opensafd: OpenSAF services successfully stopped
Feb 7 11:41:21 SC-2 opensafd: Starting OpenSAF Services(5.1.M0 - ) (Using TCP)
Feb 7 11:41:21 SC-2 osafdtmd[5329]: mkfifo already exists: /var/lib/opensaf/osafdtmd.fifo File exists
Feb 7 11:41:21 SC-2 osafdtmd[5329]: Started
Feb 7 11:41:21 SC-2 osaftransportd[5336]: Started
Feb 7 11:41:21 SC-2 osafclmna[5343]: Started
Feb 7 11:41:21 SC-2 osafrded[5352]: Started
Feb 7 11:41:22 SC-2 osaffmd[5361]: Started
Feb 7 11:41:22 SC-2 osaffmd[5361]: NO Remote fencing is disabled
Feb 7 11:41:22 SC-2 osafimmd[5371]: Started
Feb 7 11:41:22 SC-2 osafimmd[5371]: NO *** SC_ABSENCE_ALLOWED (Headless Hydra) is configured: 900 ***
Feb 7 11:41:22 SC-2 osafimmnd[5382]: Started
Feb 7 11:41:22 SC-2 osafimmnd[5382]: NO Persistent Back-End capability configured, Pbe file:imm.db (suffix may get added)
Feb 7 11:41:22 SC-2 opensafd[5318]: NO Monitoring of TRANSPORT started
Feb 7 11:41:22 SC-2 osafclmna[5343]: NO Starting to promote this node to a system controller
Feb 7 11:41:22 SC-2 osafrded[5352]: NO Requesting ACTIVE role
Feb 7 11:41:22 SC-2 osafrded[5352]: NO RDE role set to Undefined
Feb 7 11:41:22 SC-2 osafdtmd[5329]: NO Established contact with 'PL-3'
Feb 7 11:41:22 SC-2 osafdtmd[5329]: NO Established contact with 'SC-1'
Feb 7 11:41:22 SC-2 osafdtmd[5329]: NO Established contact with 'PL-4'
Feb 7 11:41:22 SC-2 osafrded[5352]: NO Peer up on node 0x2010f
Feb 7 11:41:22 SC-2 osafimmnd[5382]: NO IMMD service is UP ... ScAbsenseAllowed?:0 introduced?:0
Feb 7 11:41:22 SC-2 osafrded[5352]: NO Got peer info request from node 0x2010f with role ACTIVE
Feb 7 11:41:22 SC-2 osafrded[5352]: NO Got peer info response from node 0x2010f with role ACTIVE
Feb 7 11:41:22 SC-2 osafrded[5352]: NO RDE role set to QUIESCED
Feb 7 11:41:22 SC-2 osafrded[5352]: NO Giving up election against 0x2010f with role ACTIVE. My role is now QUIESCED
Feb 7 11:41:22 SC-2 osafclmna[5343]: NO safNode=SC-2,safCluster=myClmCluster Joined cluster, nodeid=2020f
Feb 7 11:41:22 SC-2 osafimmnd[5382]: NO Fevs count adjusted to 2835 preLoadPid: 0
Feb 7 11:41:22 SC-2 osafimmnd[5382]: NO SERVER STATE: IMM_SERVER_ANONYMOUS --> IMM_SERVER_CLUSTER_WAITING
Feb 7 11:41:22 SC-2 osafimmnd[5382]: NO SERVER STATE: IMM_SERVER_CLUSTER_WAITING --> IMM_SERVER_LOADING_PENDING
Feb 7 11:41:22 SC-2 osafimmnd[5382]: NO SERVER STATE: IMM_SERVER_LOADING_PENDING --> IMM_SERVER_SYNC_PENDING
Feb 7 11:41:22 SC-2 osafimmnd[5382]: NO NODE STATE-> IMM_NODE_ISOLATED
Feb 7 11:41:23 SC-2 osafimmnd[5382]: NO NODE STATE-> IMM_NODE_W_AVAILABLE
Feb 7 11:41:23 SC-2 osafimmnd[5382]: NO SERVER STATE: IMM_SERVER_SYNC_PENDING --> IMM_SERVER_SYNC_CLIENT
Feb 7 11:41:23 SC-2 osafimmnd[5382]:
[tickets] [opensaf:tickets] #2401 imm: Check for response when using MDS SNDRSP
---

** [tickets:#2401] imm: Check for response when using MDS SNDRSP**

**Status:** accepted
**Milestone:** 5.0.2
**Created:** Wed Mar 29, 2017 09:02 AM UTC by Hung Nguyen
**Last Updated:** Wed Mar 29, 2017 09:02 AM UTC
**Owner:** Hung Nguyen

Sometimes ncsmds_api() returns NCSCC_RC_SUCCESS even though NCSMDS_INFO.info.svc_send.info.sndrsp.o_rsp is NULL. The library may crash when that happens:

~~~
[New LWP 478]
[New LWP 480]
[New LWP 481]
[New LWP 482]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/local/lib/opensaf/osafamfd'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 strlen () at ../sysdeps/x86_64/strlen.S:106

Thread 1 (Thread 0x7f00cb1b5780 (LWP 478)):
#0 strlen () at ../sysdeps/x86_64/strlen.S:106
No locals.
#1 0x7f00ca2e8ef1 in osaf_extended_name_lend (value=0x0, name=0x7ffc65188f50) at src/base/osaf_extended_name.c:82
        length =
#2 0x7f00c909a166 in saImmOmSearchNext_2 (searchHandle=searchHandle@entry=1490679334504883525, objectName=objectName@entry=0x7ffc65188f50, attributes=attributes@entry=0x7ffc65188ea0) at src/imm/agent/imma_om_api.cc:7580
        objName = 0x0
        rc =
#3 0x7f00cab8a7dc in immutil_saImmOmSearchNext_2 (searchHandle=1490679334504883525, objectName=0x7ffc65188f50, attributes=0x7ffc65188ea0) at src/osaf/immutil/immutil.c:1817
        rc =
        nTries =
#4 0x5619eccab268 in avd_su_config_get (sg_name="safSg=AmfDemo,safApp=AmfDemo2", sg=sg@entry=0x5619ed8e5b40) at src/amf/amfd/su.cc:704
        searchHandle = 1490679334504883525
        su_name = "safSu=SU1,safSg=AmfDemo,safApp=AmfDemo2"
        className = 0x5619eccc1a33 "SaAmfSU"
        su =
        configAttributes = {0x5619ecccebde "saAmfSUType", 0x5619eccced2c "saAmfSURank", 0x5619eccc1913 "saAmfSUHostedByNode", 0x5619ecccebfd "saAmfSUHostNodeOrNodeGroup", 0x5619ecccec29 "saAmfSUFailover", 0x5619eccced11 "saAmfSUMaintenanceCampaign", 0x5619eccbb477 "saAmfSUAdminState", 0x0}
        t_ = {trace_leave_called = false, file_ = 0x0, function_ = 0x0}
        searchParam = {searchOneAttr = {attrName = 0x5619eccb998c "SaImmAttrClassName", attrValueType = SA_IMM_ATTR_SASTRINGT, attrValue = 0x7ffc65188ea8}}
        __FUNCTION__ = "avd_su_config_get"
        error = SA_AIS_OK
        rc =
        tmp_su_name = {_opaque = {0 }}
        attributes = 0x5619ed8e5c70
#5 0x5619ecc61711 in avd_sg_config_get (app_dn="safApp=AmfDemo2", app=app@entry=0x5619ed8abc40) at src/amf/amfd/sg.cc:470
        searchHandle = 1490679334503167364
        dn = {_opaque = {29, 24947, 21350, 15719, 27969, 17510, 28005, 11375, 24947, 16742, 28784, 16701, 26221, 25924, 28525, 50, 0 }}
        className = 0x5619eccc1a23 "SaAmfSG"
        configAttributes = {0x5619eccc84e6 "saAmfSGType", 0x5619eccc8516 "saAmfSGSuHostNodeGroup", 0x5619eccc84f2 "saAmfSGAutoRepair", 0x5619eccc8504 "saAmfSGAutoAdjust", 0x5619eccc857c "saAmfSGNumPrefActiveSUs", 0x5619eccc8594 "saAmfSGNumPrefStandbySUs", 0x5619eccc85ad "saAmfSGNumPrefInserviceSUs", 0x5619eccc85c8 "saAmfSGNumPrefAssignedSUs", 0x5619eccc85e2 "saAmfSGMaxActiveSIsperSU", 0x5619eccc85fb "saAmfSGMaxStandbySIsperSU", 0x5619eccc8615 "saAmfSGAutoAdjustProb", 0x5619eccc862b "saAmfSGCompRestartProb", 0x5619eccc8642 "saAmfSGCompRestartMax", 0x5619eccc8658 "saAmfSGSuRestartProb", 0x5619eccc866d "saAmfSGSuRestartMax", 0x5619eccc8313 "saAmfSGAdminState", 0x5619eccc833e "osafAmfSGFsmState", 0x0}
        t_ = {trace_leave_called = false, file_ = 0x0, function_ = 0x0}
        sg = 0x5619ed8e5b40
        searchParam = {searchOneAttr = {attrName = 0x5619eccb998c "SaImmAttrClassName", attrValueType = SA_IMM_ATTR_SASTRINGT, attrValue = 0x7ffc65189108}}
        __FUNCTION__ = "avd_sg_config_get"
        error = SA_AIS_OK
        rc =
        attributes = 0x5619ed8e4370
#6 0x5619ecbf8981 in avd_app_config_get () at src/amf/amfd/app.cc:460
        searchHandle = 1490679334315192083
        dn = {_opaque = {15, 24947, 16742, 28784, 16701, 26221, 25924, 28525, 50, 0 }}
        className = 0x5619eccb9938 "SaAmfApplication"
        configAttributes = {0x5619eccb987f "saAmfAppType", 0x5619eccb98cd "saAmfApplicationAdminState", 0x0}
        t_ = {trace_leave_called = false, file_ = 0x0, function_ = 0x0}
        searchParam = {searchOneAttr = {attrName = 0x5619eccb998c "SaImmAttrClassName", attrValueType = SA_IMM_ATTR_SASTRINGT, attrValue = 0x7ffc651893b8}}
        app = 0x5619ed8abc40
        __FUNCTION__ = "avd_app_config_get"
        error = SA_AIS_ERR_FAILED_OPERATION
        rc =
        attributes = 0x5619ed89cab0
#7 0x5619ecc332d5 in avd_imm_config_get () at src/amf/amfd/imm.cc:1631
        rc = 2
        t_ = {trace_leave_called = false, file_ = 0x0, function_ = 0x0}
        __FUNCTION__ = "avd_imm_config_get"
#8 0x5619ecc56b85 in avd_standby_role_initialization (cb=cb@entry=0x5619ecef1e60 <_control_block>) at
~~~
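The check named in the ticket title could look roughly like the sketch below: after a send-and-wait-for-response call reports success, the response pointer is validated before anything dereferences it. Everything here is a hypothetical stand-in. `RcSketch`, `ResponseSketch`, `SendRspResult` and `validate_sndrsp` are not OpenSAF names, the numeric return-code values are placeholders, and the real MDS structures are not reproduced.

```cpp
// Hypothetical stand-ins for the MDS return codes; the values are
// placeholders and the real definitions live in the OpenSAF headers.
enum RcSketch { kRcSuccess = 1, kRcFailure = 2 };

// Hypothetical response message carrying an object name.
struct ResponseSketch {
  const char* object_name;
};

// Hypothetical result of a send-and-wait-for-response call.
struct SendRspResult {
  RcSketch rc;
  ResponseSketch* rsp;  // may be null even when rc reports success
};

// Treat "success with no response" as a failure so that callers never
// dereference a null response.
RcSketch validate_sndrsp(const SendRspResult& result) {
  if (result.rc != kRcSuccess) return result.rc;
  if (result.rsp == nullptr) return kRcFailure;
  return kRcSuccess;
}
```

With a check of this kind, a caller in the position of saImmOmSearchNext_2 in the backtrace would see a failure code instead of passing a null object name on to strlen(). Which error code the IMM agent should actually return in that case is not stated in the ticket.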
[tickets] [opensaf:tickets] #2400 AMFD: Cached node_up message causes amfnd reboot after node joins cluster
- **status**: unassigned --> accepted
- **assigned_to**: Gary Lee

---

** [tickets:#2400] AMFD: Cached node_up message causes amfnd reboot after node joins cluster**

**Status:** accepted
**Milestone:** 5.1.1
**Created:** Wed Mar 29, 2017 06:05 AM UTC by Minh Hon Chau
**Last Updated:** Wed Mar 29, 2017 06:05 AM UTC
**Owner:** Gary Lee

SC Absence is enabled and both SCs are restarted. After every amfnd has sent node_up and joined the cluster, the cluster startup timer expires and amfd starts the application assignments. At this point a retransmitted node_up message, which may have been cached in the mailbox or arrived late, makes amfd order a node reboot:

Mar 20 15:04:46 SC-2 osafamfd[9576]: NO Receive message with event type:12, msg_type:31, from node:2040f, msg_id:0
Mar 20 15:04:46 SC-2 osafamfd[9576]: NO Receive message with event type:12, msg_type:31, from node:2030f, msg_id:0
Mar 20 15:04:46 SC-2 osafamfd[9576]: NO Receive message with event type:13, msg_type:32, from node:2040f, msg_id:0
Mar 20 15:04:46 SC-2 osafamfd[9576]: NO Receive message with event type:13, msg_type:32, from node:2030f, msg_id:0
Mar 20 15:04:46 SC-2 osafamfd[9576]: NO Received node_up_msg from all nodes
Mar 20 15:04:46 SC-2 osafamfd[9576]: NO Received node_up from 2030f: msg_id 1
Mar 20 15:04:46 SC-2 osafamfd[9576]: NO Enter restore headless cached RTAs from IMM
Mar 20 15:04:46 SC-2 osafamfd[9576]: NO Leave reading headless cached RTAs from IMM: SUCCESS
Mar 20 15:04:46 SC-2 osafamfd[9576]: NO Node 'SC-2' joined the cluster
Mar 20 15:04:49 SC-2 osafamfd[9576]: NO Received node_up from 2030f: msg_id 1
Mar 20 15:04:49 SC-2 osafamfd[9576]: NO Node 'PL-3' joined the cluster
Mar 20 15:04:49 SC-2 osafamfd[9576]: NO Received node_up from 2010f: msg_id 1
Mar 20 15:04:49 SC-2 osafamfd[9576]: NO Node 'SC-1' joined the cluster
Mar 20 15:05:00 SC-2 osafamfd[9576]: NO Cluster startup is done
Mar 20 15:05:18 SC-2 osafamfd[9576]: NO Received node_up from 2030f: msg_id 1
Mar 20 15:05:18 SC-2 osafamfd[9576]: WA Sending node reboot order to node:safAmfNode=PL-3,safAmfCluster=myAmfCluster, due to late node_up_msg after cluster startup timeout
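One possible way to tell a retransmitted node_up apart from a genuinely late one is sketched below. It is not the fix adopted for this ticket, which the report does not describe. `NodeUpTracker` and `NodeUpAction` are hypothetical names, and the rule that an already joined node's repeated node_up is simply ignored is an assumption made for illustration.

```cpp
#include <cstdint>
#include <set>

// Hypothetical actions amfd could take for an incoming node_up.
enum class NodeUpAction { kJoin, kIgnoreDuplicate, kRebootLateNode };

class NodeUpTracker {
 public:
  // Called when the cluster startup timer expires.
  void set_cluster_startup_done() { startup_done_ = true; }

  // Decide what to do with a node_up received from `node_id`.
  NodeUpAction on_node_up(std::uint32_t node_id) {
    if (joined_.count(node_id) != 0) {
      // Assumption: the node has already joined, so this is a
      // retransmitted or cached copy and can be dropped.
      return NodeUpAction::kIgnoreDuplicate;
    }
    if (startup_done_) {
      // Assumption: mirrors the reboot order seen in the log for a
      // node_up that arrives after the cluster startup timeout.
      return NodeUpAction::kRebootLateNode;
    }
    joined_.insert(node_id);
    return NodeUpAction::kJoin;
  }

 private:
  bool startup_done_ = false;
  std::set<std::uint32_t> joined_;
};
```

Applied to the log above, the node_up from 2030f at 15:04:49 would yield `kJoin`, and the repeated one at 15:05:18, after "Cluster startup is done", would yield `kIgnoreDuplicate` instead of a reboot order.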