[tickets] [opensaf:tickets] #2361 AMFD: amfd crashed with healthCheckcallbackTimeout causing both controllers to reboot

2017-03-29 Thread Nagendra Kumar
- **status**: review --> fixed
- **Comment**:

changeset:   8736:c3c90b5fb832
branch:  opensaf-5.0.x
parent:  8732:ea44141c05ee
user:Nagendra Kumar
date:Thu Mar 30 10:17:41 2017 +0530
summary: amfd: handle BAD_HANDLE return during config read [#2361]

changeset:   8737:f9a5a957c16a
branch:  opensaf-5.1.x
parent:  8733:be2fd9824bc4
user:Nagendra Kumar
date:Thu Mar 30 10:18:05 2017 +0530
summary: amfd: handle BAD_HANDLE return during config read [#2361]

changeset:   8738:a10d52313ef5
tag: tip
parent:  8735:68a5e668f807
user:Nagendra Kumar
date:Thu Mar 30 10:18:25 2017 +0530
summary: amfd: handle BAD_HANDLE return during config read [#2361]

[staging:c3c90b]
[staging:f9a5a9]
[staging:a10d52]
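
The changeset summaries above only say that amfd now handles a BAD_HANDLE return during the configuration read; the patch itself is not quoted here. As a minimal sketch of one plausible approach (not the actual fix), a read that fails with a stale handle can be retried after re-initializing the handle. `Rc`, `ReadConfigWithRetry`, and the two callbacks are hypothetical names for illustration and are not OpenSAF APIs.

```cpp
#include <cassert>
#include <functional>

// Hypothetical stand-ins for AIS return codes; not the values from saAis.h.
enum class Rc { kOk, kBadHandle, kFailed };

// Calls `read_config`; if it reports a stale handle, calls `reinit` to get a
// fresh handle and reads again, for at most `max_attempts` reads in total.
// Any result other than kBadHandle is returned to the caller unchanged.
Rc ReadConfigWithRetry(const std::function<Rc()>& read_config,
                       const std::function<Rc()>& reinit,
                       int max_attempts) {
  assert(max_attempts > 0);
  Rc rc = Rc::kFailed;
  for (int attempt = 0; attempt < max_attempts; ++attempt) {
    rc = read_config();
    if (rc != Rc::kBadHandle) return rc;          // success or non-retryable error
    if (reinit() != Rc::kOk) return Rc::kFailed;  // could not obtain a new handle
  }
  return rc;  // still kBadHandle after the last attempt
}
```

With a bounded retry like this, a transient BAD_HANDLE would no longer lead straight to "Failed to read configuration, AMF will not start"; whether the real change retries or recovers in another way is an assumption.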




---

** [tickets:#2361] AMFD: amfd crashed with healthCheckcallbackTimeout causing 
both controllers to reboot**

**Status:** fixed
**Milestone:** 5.0.2
**Created:** Fri Mar 10, 2017 09:08 AM UTC by Chani Srivastava
**Last Updated:** Tue Mar 14, 2017 10:42 AM UTC
**Owner:** Nagendra Kumar


**Environment details**

OS : Suse 64bit
Changeset : 8634 ( 5.2.FC)
Setup : 4 nodes ( 2 controllers and 2 payloads with 1PBE enabled )

**Steps**

1. Bring up OpenSAF on four nodes and create a load of 100,000 (1 lakh) objects
2. Run IMM test cases on the standby controller


SC-1 syslog

Mar  7 19:45:58 OSAF-SC1 osafamfnd[4720]: NO 
'safComp=IMMD,safSu=SC-1,safSg=2N,safApp=OpenSAF' recovery action escalated 
from 'componentFailover' to 'suFailover'
Mar  7 19:45:58 OSAF-SC1 osafamfnd[4720]: NO 
'safComp=IMMD,safSu=SC-1,safSg=2N,safApp=OpenSAF' faulted due to 
'healthCheckcallbackTimeout' : Recovery is 'suFailover'
**Mar  7 19:45:58 OSAF-SC1 osafamfnd[4720]: ER 
safComp=IMMD,safSu=SC-1,safSg=2N,safApp=OpenSAF Faulted due 
to:healthCheckcallbackTimeout Recovery is:suFailover
Mar  7 19:45:58 OSAF-SC1 osafamfnd[4720]: Rebooting OpenSAF NodeId = 131343 EE 
Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 
131343, SupervisionTime = 60**
Mar  7 19:45:58 OSAF-SC1 opensaf_reboot: Rebooting local node; timeout=60


SC-2 syslog

Mar  7 19:41:00 OSAF-SC2 osafamfd[4339]: ER Failed to read configuration, AMF 
will not start
Mar  7 19:41:00 OSAF-SC2 osafamfd[4339]: ER avd_imm_config_get FAILED
**Mar  7 19:41:00 OSAF-SC2 osafamfnd[4349]: ER AMFD has unexpectedly crashed. 
Rebooting node**
Mar  7 19:41:00 OSAF-SC2 osafamfnd[4349]: Rebooting OpenSAF NodeId = 131599 EE 
Name = , Reason: AMFD has unexpectedly crashed. Rebooting node, OwnNodeId = 
131599, SupervisionTime = 60
Mar  7 19:41:00 OSAF-SC2 opensaf_reboot: Rebooting local node; timeout=60


The amfd, immnd and immd traces are shared separately because they are very large.



---

Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is 
subscribed to https://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change settings at 
https://sourceforge.net/p/opensaf/admin/tickets/options.  Or, if this is a 
mailing list, you can unsubscribe from the mailing list.--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot___
Opensaf-tickets mailing list
Opensaf-tickets@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets


[tickets] [opensaf:tickets] #2100 Standby should not be rebooted, for SC absence configuration mismatch

2017-03-29 Thread Praveen
- **Milestone**: 5.2.RC2 --> next



---

** [tickets:#2100]  Standby should not be rebooted, for  SC absence 
configuration mismatch**

**Status:** unassigned
**Milestone:** next
**Created:** Fri Oct 07, 2016 07:11 AM UTC by Srikanth R
**Last Updated:** Wed Mar 01, 2017 05:33 AM UTC
**Owner:** nobody


Changeset : 8190 5.1.GA

-> Initially, OpenSAF was brought up on SC-1 with the "SC ABSENCE" feature enabled in 
immd.conf.

-> On SC-2, the "SC ABSENCE" feature was not enabled in immd.conf. When opensafd was 
started on SC-2, the node rebooted.

Oct  7 17:58:27 SLES-SLOT2 osafimmd[3615]: ER SC absence allowed in not the 
same as on active IMMD. Active: 900, Standby: 0. Exiting.
Oct  7 17:58:27 SLES-SLOT2 osafamfnd[3676]: NO 
'safComp=IMMD,safSu=SC-2,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : 
Recovery is 'nodeFailfast'
Oct  7 17:58:27 SLES-SLOT2 osafamfnd[3676]: ER 
safComp=IMMD,safSu=SC-2,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery 
is:nodeFailfast
Oct  7 17:58:27 SLES-SLOT2 osafamfnd[3676]: Rebooting OpenSAF NodeId = 131599 
EE Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 
131599, SupervisionTime = 60

Here the user configured the two controllers inconsistently, which caused the 
standby to reboot. Because opensafd is enabled in the runlevel as part of 
installation, the standby will keep rebooting until opensafd is stopped on SC-1.

Suggested behavior:

On such a mismatch, opensafd should refuse to start on the standby instead of 
rebooting the node immediately.

Also, cluster-level attributes such as IMMSV_SC_ABSENCE_ALLOWED could be moved 
to imm.xml. Node-level attributes, such as trace enabling, can stay in the 
configuration files.
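
The suggested behavior comes down to comparing the SC absence value on the standby with the one on the active and refusing to start when they differ. A minimal sketch of that decision, assuming the two values are already known; `StartupDecision` and `CheckScAbsenceAllowed` are hypothetical names and not part of immd.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical outcome of the startup check on the standby controller.
enum class StartupDecision { kContinueStartup, kRefuseToStart };

// Compares the SC absence setting of the active IMMD with the local one.
// A mismatch leads to a refusal to start rather than a node reboot.
StartupDecision CheckScAbsenceAllowed(uint32_t active_value,
                                      uint32_t standby_value) {
  return active_value == standby_value ? StartupDecision::kContinueStartup
                                       : StartupDecision::kRefuseToStart;
}

// Values taken from the log above: "Active: 900, Standby: 0".
void ExampleFromLog() {
  assert(CheckScAbsenceAllowed(900, 0) == StartupDecision::kRefuseToStart);
  assert(CheckScAbsenceAllowed(900, 900) == StartupDecision::kContinueStartup);
}
```

How a refusal would be reported to the operator (syslog entry, exit code) is left open here.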


---



[tickets] [opensaf:tickets] #2402 base: "hardening" use of lockfile in opensafd

2017-03-29 Thread Hans Nordebäck



---

** [tickets:#2402] base: "hardening" use of lockfile in opensafd**

**Status:** review
**Milestone:** 5.2.RC2
**Created:** Wed Mar 29, 2017 10:40 AM UTC by Hans Nordebäck
**Last Updated:** Wed Mar 29, 2017 10:40 AM UTC
**Owner:** Hans Nordebäck


"hardening" use of lockfile in opensafd


---



[tickets] [opensaf:tickets] #2289 opensafd (nid): coredump while standby starting

2017-03-29 Thread Anders Widell
- **status**: unassigned --> duplicate
- **Comment**:

Duplicate of [#2294]



---

** [tickets:#2289] opensafd (nid): coredump while standby starting**

**Status:** duplicate
**Milestone:** 5.2.RC2
**Created:** Tue Feb 07, 2017 06:31 AM UTC by A V Mahesh (AVM)
**Last Updated:** Wed Mar 01, 2017 08:21 AM UTC
**Owner:** nobody


Restarting the standby with TCP transport causes opensafd to dump core.


(gdb) bt
#0  0x7f2f05cb0b55 in raise () from /lib64/libc.so.6
#1  0x7f2f05cb2131 in abort () from /lib64/libc.so.6
#2  0x7f2f06704955 in __gnu_cxx::__verbose_terminate_handler() () at ../../../../gcc-4.8.3/libstdc++-v3/libsupc++/vterminate.cc:95
#3  0x7f2f06702af6 in __cxxabiv1::__terminate(void (*)()) () at ../../../../gcc-4.8.3/libstdc++-v3/libsupc++/eh_terminate.cc:38
#4  0x7f2f06702b23 in std::terminate() () at ../../../../gcc-4.8.3/libstdc++-v3/libsupc++/eh_terminate.cc:48
#5  0x7f2f06702d42 in __cxa_throw () at ../../../../gcc-4.8.3/libstdc++-v3/libsupc++/eh_throw.cc:87
#6  0x7f2f0670322d in operator new(unsigned long) () at ../../../../gcc-4.8.3/libstdc++-v3/libsupc++/new_op.cc:56
#7  0x7f2f06761979 in std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator const&) () at /home/build/x86_64-unknown-linux-gnu/libstdc++-v3/include/ext/new_allocator.h:104
#8  0x7f2f0676256b in std::string::_Rep::_M_clone(std::allocator const&, unsigned long) () at /home/build/x86_64-unknown-linux-gnu/libstdc++-v3/include/bits/basic_string.tcc:629
#9  0x7f2f06762bec in std::basic_string::basic_string(std::string const&) () at /home/build/x86_64-unknown-linux-gnu/libstdc++-v3/include/bits/basic_string.h:229
#10 0x7f2f07262c39 in handle_data_request(pollfd*, std::string const&) () at /usr/include/c++/4.8.3/bits/basic_string.h:2405
#11 0x7f2f0726320f in svc_monitor_thread(void*) () at src/nid/nodeinit.cc:1539
#12 0x7f2f05ff97b6 in start_thread () from /lib64/libpthread.so.0
#13 0x7f2f05d559cd in clone () from /lib64/libc.so.6
#14 0x in ?? ()
(gdb) q



Feb  7 11:41:13 SC-2 opensafd: OpenSAF services successfully stopped
Feb  7 11:41:21 SC-2 opensafd: Starting OpenSAF Services(5.1.M0 - ) (Using TCP)
Feb  7 11:41:21 SC-2 osafdtmd[5329]: mkfifo already exists: 
/var/lib/opensaf/osafdtmd.fifo File exists
Feb  7 11:41:21 SC-2 osafdtmd[5329]: Started
Feb  7 11:41:21 SC-2 osaftransportd[5336]: Started
Feb  7 11:41:21 SC-2 osafclmna[5343]: Started
Feb  7 11:41:21 SC-2 osafrded[5352]: Started
Feb  7 11:41:22 SC-2 osaffmd[5361]: Started
Feb  7 11:41:22 SC-2 osaffmd[5361]: NO Remote fencing is disabled
Feb  7 11:41:22 SC-2 osafimmd[5371]: Started
Feb  7 11:41:22 SC-2 osafimmd[5371]: NO *** SC_ABSENCE_ALLOWED (Headless 
Hydra) is configured: 900 ***
Feb  7 11:41:22 SC-2 osafimmnd[5382]: Started
Feb  7 11:41:22 SC-2 osafimmnd[5382]: NO Persistent Back-End capability 
configured, Pbe file:imm.db (suffix may get added)
Feb  7 11:41:22 SC-2 opensafd[5318]: NO Monitoring of TRANSPORT started
Feb  7 11:41:22 SC-2 osafclmna[5343]: NO Starting to promote this node to a 
system controller
Feb  7 11:41:22 SC-2 osafrded[5352]: NO Requesting ACTIVE role
Feb  7 11:41:22 SC-2 osafrded[5352]: NO RDE role set to Undefined
Feb  7 11:41:22 SC-2 osafdtmd[5329]: NO Established contact with 'PL-3'
Feb  7 11:41:22 SC-2 osafdtmd[5329]: NO Established contact with 'SC-1'
Feb  7 11:41:22 SC-2 osafdtmd[5329]: NO Established contact with 'PL-4'
Feb  7 11:41:22 SC-2 osafrded[5352]: NO Peer up on node 0x2010f
Feb  7 11:41:22 SC-2 osafimmnd[5382]: NO IMMD service is UP ... 
ScAbsenseAllowed?:0 introduced?:0
Feb  7 11:41:22 SC-2 osafrded[5352]: NO Got peer info request from node 0x2010f 
with role ACTIVE
Feb  7 11:41:22 SC-2 osafrded[5352]: NO Got peer info response from node 
0x2010f with role ACTIVE
Feb  7 11:41:22 SC-2 osafrded[5352]: NO RDE role set to QUIESCED
Feb  7 11:41:22 SC-2 osafrded[5352]: NO Giving up election against 0x2010f with 
role ACTIVE. My role is now QUIESCED
Feb  7 11:41:22 SC-2 osafclmna[5343]: NO safNode=SC-2,safCluster=myClmCluster 
Joined cluster, nodeid=2020f
Feb  7 11:41:22 SC-2 osafimmnd[5382]: NO Fevs count adjusted to 2835 
preLoadPid: 0
Feb  7 11:41:22 SC-2 osafimmnd[5382]: NO SERVER STATE: IMM_SERVER_ANONYMOUS --> 
IMM_SERVER_CLUSTER_WAITING
Feb  7 11:41:22 SC-2 osafimmnd[5382]: NO SERVER STATE: 
IMM_SERVER_CLUSTER_WAITING --> IMM_SERVER_LOADING_PENDING
Feb  7 11:41:22 SC-2 osafimmnd[5382]: NO SERVER STATE: 
IMM_SERVER_LOADING_PENDING --> IMM_SERVER_SYNC_PENDING
Feb  7 11:41:22 SC-2 osafimmnd[5382]: NO NODE STATE-> IMM_NODE_ISOLATED
Feb  7 11:41:23 SC-2 osafimmnd[5382]: NO NODE STATE-> IMM_NODE_W_AVAILABLE
Feb  7 11:41:23 SC-2 osafimmnd[5382]: NO SERVER STATE: IMM_SERVER_SYNC_PENDING 
--> IMM_SERVER_SYNC_CLIENT
Feb  7 11:41:23 SC-2 osafimmnd[5382]: 

[tickets] [opensaf:tickets] #2401 imm: Check for response when using MDS SNDRSP

2017-03-29 Thread Hung Nguyen



---

** [tickets:#2401] imm: Check for response when using MDS SNDRSP**

**Status:** accepted
**Milestone:** 5.0.2
**Created:** Wed Mar 29, 2017 09:02 AM UTC by Hung Nguyen
**Last Updated:** Wed Mar 29, 2017 09:02 AM UTC
**Owner:** Hung Nguyen


Sometimes ncsmds_api() returns NCSCC_RC_SUCCESS even though 
NCSMDS_INFO.info.svc_send.info.sndrsp.o_rsp is NULL.

The library may crash when that happens.
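
The check asked for in the title can be sketched as a thin wrapper that turns "success without a response" into a failure before any caller touches the response. This is an illustration under stated assumptions: `SendRspInfo`, `SendFn`, `kRcSuccess`, `kRcFailure`, and `SendAndCheckResponse` are simplified, hypothetical stand-ins and not the real MDS types or constants.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical return codes standing in for NCSCC_RC_SUCCESS / NCSCC_RC_FAILURE.
constexpr uint32_t kRcSuccess = 1;
constexpr uint32_t kRcFailure = 2;

// Simplified stand-in for the send-and-wait-for-response part of the request;
// only the output response pointer matters for this sketch.
struct SendRspInfo {
  void* o_rsp = nullptr;
};

// Hypothetical signature of the underlying send call.
using SendFn = uint32_t (*)(SendRspInfo*);

// Performs the send and reports failure when the call claims success but
// delivered no response, so callers never dereference a null o_rsp.
uint32_t SendAndCheckResponse(SendFn send, SendRspInfo* info) {
  assert(send != nullptr && info != nullptr);
  const uint32_t rc = send(info);
  if (rc == kRcSuccess && info->o_rsp == nullptr) {
    return kRcFailure;
  }
  return rc;
}
```

Centralizing the check in one wrapper is a design choice of this sketch; checking o_rsp at each call site would work equally well.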

~~~
[New LWP 478]
[New LWP 480]
[New LWP 481]
[New LWP 482]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/local/lib/opensaf/osafamfd'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  strlen () at ../sysdeps/x86_64/strlen.S:106

Thread 1 (Thread 0x7f00cb1b5780 (LWP 478)):
#0  strlen () at ../sysdeps/x86_64/strlen.S:106
No locals.
#1  0x7f00ca2e8ef1 in osaf_extended_name_lend (value=0x0, 
name=0x7ffc65188f50) at src/base/osaf_extended_name.c:82
length = 
#2  0x7f00c909a166 in saImmOmSearchNext_2 
(searchHandle=searchHandle@entry=1490679334504883525, 
objectName=objectName@entry=0x7ffc65188f50, 
attributes=attributes@entry=0x7ffc65188ea0) at src/imm/agent/imma_om_api.cc:7580
objName = 0x0
rc = 
#3  0x7f00cab8a7dc in immutil_saImmOmSearchNext_2 
(searchHandle=1490679334504883525, objectName=0x7ffc65188f50, 
attributes=0x7ffc65188ea0) at src/osaf/immutil/immutil.c:1817
rc = 
nTries = 
#4  0x5619eccab268 in avd_su_config_get 
(sg_name="safSg=AmfDemo,safApp=AmfDemo2", sg=sg@entry=0x5619ed8e5b40) at 
src/amf/amfd/su.cc:704
searchHandle = 1490679334504883525
su_name = "safSu=SU1,safSg=AmfDemo,safApp=AmfDemo2"
className = 0x5619eccc1a33 "SaAmfSU"
su = 
configAttributes = {0x5619ecccebde "saAmfSUType", 0x5619eccced2c 
"saAmfSURank", 0x5619eccc1913 "saAmfSUHostedByNode", 0x5619ecccebfd 
"saAmfSUHostNodeOrNodeGroup", 0x5619ecccec29 "saAmfSUFailover", 0x5619eccced11 
"saAmfSUMaintenanceCampaign", 0x5619eccbb477 "saAmfSUAdminState", 0x0}
t_ = {trace_leave_called = false, file_ = 0x0, function_ = 0x0}
searchParam = {searchOneAttr = {attrName = 0x5619eccb998c 
"SaImmAttrClassName", attrValueType = SA_IMM_ATTR_SASTRINGT, attrValue = 
0x7ffc65188ea8}}
__FUNCTION__ = "avd_su_config_get"
error = SA_AIS_OK
rc = 
tmp_su_name = {_opaque = {0 }}
attributes = 0x5619ed8e5c70
#5  0x5619ecc61711 in avd_sg_config_get (app_dn="safApp=AmfDemo2", 
app=app@entry=0x5619ed8abc40) at src/amf/amfd/sg.cc:470
searchHandle = 1490679334503167364
dn = {_opaque = {29, 24947, 21350, 15719, 27969, 17510, 28005, 11375, 
24947, 16742, 28784, 16701, 26221, 25924, 28525, 50, 0 }}
className = 0x5619eccc1a23 "SaAmfSG"
configAttributes = {0x5619eccc84e6 "saAmfSGType", 0x5619eccc8516 
"saAmfSGSuHostNodeGroup", 0x5619eccc84f2 "saAmfSGAutoRepair", 0x5619eccc8504 
"saAmfSGAutoAdjust", 0x5619eccc857c "saAmfSGNumPrefActiveSUs", 0x5619eccc8594 
"saAmfSGNumPrefStandbySUs", 0x5619eccc85ad "saAmfSGNumPrefInserviceSUs", 
0x5619eccc85c8 "saAmfSGNumPrefAssignedSUs", 0x5619eccc85e2 
"saAmfSGMaxActiveSIsperSU", 0x5619eccc85fb "saAmfSGMaxStandbySIsperSU", 
0x5619eccc8615 "saAmfSGAutoAdjustProb", 0x5619eccc862b 
"saAmfSGCompRestartProb", 0x5619eccc8642 "saAmfSGCompRestartMax", 
0x5619eccc8658 "saAmfSGSuRestartProb", 0x5619eccc866d "saAmfSGSuRestartMax", 
0x5619eccc8313 "saAmfSGAdminState", 0x5619eccc833e "osafAmfSGFsmState", 0x0}
t_ = {trace_leave_called = false, file_ = 0x0, function_ = 0x0}
sg = 0x5619ed8e5b40
searchParam = {searchOneAttr = {attrName = 0x5619eccb998c 
"SaImmAttrClassName", attrValueType = SA_IMM_ATTR_SASTRINGT, attrValue = 
0x7ffc65189108}}
__FUNCTION__ = "avd_sg_config_get"
error = SA_AIS_OK
rc = 
attributes = 0x5619ed8e4370
#6  0x5619ecbf8981 in avd_app_config_get () at src/amf/amfd/app.cc:460
searchHandle = 1490679334315192083
dn = {_opaque = {15, 24947, 16742, 28784, 16701, 26221, 25924, 28525, 
50, 0 }}
className = 0x5619eccb9938 "SaAmfApplication"
configAttributes = {0x5619eccb987f "saAmfAppType", 0x5619eccb98cd 
"saAmfApplicationAdminState", 0x0}
t_ = {trace_leave_called = false, file_ = 0x0, function_ = 0x0}
searchParam = {searchOneAttr = {attrName = 0x5619eccb998c 
"SaImmAttrClassName", attrValueType = SA_IMM_ATTR_SASTRINGT, attrValue = 
0x7ffc651893b8}}
app = 0x5619ed8abc40
__FUNCTION__ = "avd_app_config_get"
error = SA_AIS_ERR_FAILED_OPERATION
rc = 
attributes = 0x5619ed89cab0
#7  0x5619ecc332d5 in avd_imm_config_get () at src/amf/amfd/imm.cc:1631
rc = 2
t_ = {trace_leave_called = false, file_ = 0x0, function_ = 0x0}
__FUNCTION__ = "avd_imm_config_get"
#8  0x5619ecc56b85 in avd_standby_role_initialization 
(cb=cb@entry=0x5619ecef1e60 <_control_block>) at 

[tickets] [opensaf:tickets] #2400 AMFD: Cached node_up message causes amfnd reboot after node joins cluster

2017-03-29 Thread Gary Lee
- **status**: unassigned --> accepted
- **assigned_to**: Gary Lee



---

** [tickets:#2400] AMFD: Cached node_up message causes amfnd reboot after node 
joins cluster**

**Status:** accepted
**Milestone:** 5.1.1
**Created:** Wed Mar 29, 2017 06:05 AM UTC by Minh Hon Chau
**Last Updated:** Wed Mar 29, 2017 06:05 AM UTC
**Owner:** Gary Lee


SC Absence is enabled and both SCs are restarted. After every amfnd has sent 
node_up and joined the cluster, the cluster startup timer expires and amfd 
starts application assignments. If a retransmitted node_up message (cached in 
the mailbox, or simply arriving late) is processed at this point, amfd orders a 
node reboot.

Mar 20 15:04:46 SC-2 osafamfd[9576]: NO Receive message with event type:12, 
msg_type:31, from node:2040f, msg_id:0
Mar 20 15:04:46 SC-2 osafamfd[9576]: NO Receive message with event type:12, 
msg_type:31, from node:2030f, msg_id:0
Mar 20 15:04:46 SC-2 osafamfd[9576]: NO Receive message with event type:13, 
msg_type:32, from node:2040f, msg_id:0
Mar 20 15:04:46 SC-2 osafamfd[9576]: NO Receive message with event type:13, 
msg_type:32, from node:2030f, msg_id:0
Mar 20 15:04:46 SC-2 osafamfd[9576]: NO Received node_up_msg from all nodes
Mar 20 15:04:46 SC-2 osafamfd[9576]: NO Received node_up from 2030f: msg_id 1

Mar 20 15:04:46 SC-2 osafamfd[9576]: NO Enter restore headless cached RTAs from 
IMM
Mar 20 15:04:46 SC-2 osafamfd[9576]: NO Leave reading headless cached RTAs from 
IMM: SUCCESS
Mar 20 15:04:46 SC-2 osafamfd[9576]: NO Node 'SC-2' joined the cluster

Mar 20 15:04:49 SC-2 osafamfd[9576]: NO Received node_up from 2030f: msg_id 1
Mar 20 15:04:49 SC-2 osafamfd[9576]: NO Node 'PL-3' joined the cluster
Mar 20 15:04:49 SC-2 osafamfd[9576]: NO Received node_up from 2010f: msg_id 1
Mar 20 15:04:49 SC-2 osafamfd[9576]: NO Node 'SC-1' joined the cluster

Mar 20 15:05:00 SC-2 osafamfd[9576]: NO Cluster startup is done

Mar 20 15:05:18 SC-2 osafamfd[9576]: NO Received node_up from 2030f: msg_id 1
Mar 20 15:05:18 SC-2 osafamfd[9576]: WA Sending node reboot order to 
node:safAmfNode=PL-3,safAmfCluster=myAmfCluster, due to late node_up_msg after 
cluster startup timeout
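
In the log above, node 2030f sends node_up with msg_id 1 again at 15:05:18, after PL-3 had already joined at 15:04:49. One way to avoid the reboot order would be to recognize such a repeat and drop it. The sketch below only illustrates that idea and is not the fix chosen for this ticket; `JoinedNodes` and its methods are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Remembers, per node, the msg_id of the node_up that made the node join.
class JoinedNodes {
 public:
  void MarkJoined(uint32_t node_id, uint32_t msg_id) { joined_[node_id] = msg_id; }

  // Forget a node that left, so that a node_up after a real restart is handled.
  void MarkLeft(uint32_t node_id) { joined_.erase(node_id); }

  // True if this node already joined with the same msg_id, meaning the
  // message is a retransmission or a stale copy left in the mailbox.
  bool IsRetransmittedNodeUp(uint32_t node_id, uint32_t msg_id) const {
    const auto it = joined_.find(node_id);
    return it != joined_.end() && it->second == msg_id;
  }

 private:
  std::map<uint32_t, uint32_t> joined_;
};

// Replays the PL-3 (node 2030f) sequence from the log above.
void ExampleFromLog() {
  JoinedNodes nodes;
  assert(!nodes.IsRetransmittedNodeUp(0x2030f, 1));  // 15:04:49: not joined yet
  nodes.MarkJoined(0x2030f, 1);                      // "Node 'PL-3' joined the cluster"
  assert(nodes.IsRetransmittedNodeUp(0x2030f, 1));   // 15:05:18: repeat, can be dropped
}
```

Matching on msg_id alone assumes a restarted amfnd is reported as having left before it sends node_up again; without that, a genuine restart could be mistaken for a retransmission.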



---



[tickets] [opensaf:tickets] #2400 AMFD: Cached node_up message causes amfnd reboot after node joins cluster

2017-03-29 Thread Minh Hon Chau



---

** [tickets:#2400] AMFD: Cached node_up message causes amfnd reboot after node 
joins cluster**

**Status:** unassigned
**Milestone:** 5.1.1
**Created:** Wed Mar 29, 2017 06:05 AM UTC by Minh Hon Chau
**Last Updated:** Wed Mar 29, 2017 06:05 AM UTC
**Owner:** nobody


SC Absence is enabled and both SCs are restarted. After every amfnd has sent 
node_up and joined the cluster, the cluster startup timer expires and amfd 
starts application assignments. If a retransmitted node_up message (cached in 
the mailbox, or simply arriving late) is processed at this point, amfd orders a 
node reboot.

Mar 20 15:04:46 SC-2 osafamfd[9576]: NO Receive message with event type:12, 
msg_type:31, from node:2040f, msg_id:0
Mar 20 15:04:46 SC-2 osafamfd[9576]: NO Receive message with event type:12, 
msg_type:31, from node:2030f, msg_id:0
Mar 20 15:04:46 SC-2 osafamfd[9576]: NO Receive message with event type:13, 
msg_type:32, from node:2040f, msg_id:0
Mar 20 15:04:46 SC-2 osafamfd[9576]: NO Receive message with event type:13, 
msg_type:32, from node:2030f, msg_id:0
Mar 20 15:04:46 SC-2 osafamfd[9576]: NO Received node_up_msg from all nodes
Mar 20 15:04:46 SC-2 osafamfd[9576]: NO Received node_up from 2030f: msg_id 1

Mar 20 15:04:46 SC-2 osafamfd[9576]: NO Enter restore headless cached RTAs from 
IMM
Mar 20 15:04:46 SC-2 osafamfd[9576]: NO Leave reading headless cached RTAs from 
IMM: SUCCESS
Mar 20 15:04:46 SC-2 osafamfd[9576]: NO Node 'SC-2' joined the cluster

Mar 20 15:04:49 SC-2 osafamfd[9576]: NO Received node_up from 2030f: msg_id 1
Mar 20 15:04:49 SC-2 osafamfd[9576]: NO Node 'PL-3' joined the cluster
Mar 20 15:04:49 SC-2 osafamfd[9576]: NO Received node_up from 2010f: msg_id 1
Mar 20 15:04:49 SC-2 osafamfd[9576]: NO Node 'SC-1' joined the cluster

Mar 20 15:05:00 SC-2 osafamfd[9576]: NO Cluster startup is done

Mar 20 15:05:18 SC-2 osafamfd[9576]: NO Received node_up from 2030f: msg_id 1
Mar 20 15:05:18 SC-2 osafamfd[9576]: WA Sending node reboot order to 
node:safAmfNode=PL-3,safAmfCluster=myAmfCluster, due to late node_up_msg after 
cluster startup timeout



---
