- **assigned_to**: A V Mahesh (AVM) --> nobody
- **Blocker**: --> False
---
**[tickets:#722] payloads did not go for reboot when both the controllers rebooted**
**Status:** assigned
**Milestone:** future
**Created:** Thu Jan 16, 2014 07:36 AM UTC by Sirisha Alla
**Last Updated:** Tue Sep 20, 2016 06:04 PM UTC
**Owner:** nobody
**Attachments:**
- [payloadnoreboot.tar.bz2](https://sourceforge.net/p/opensaf/tickets/722/attachment/payloadnoreboot.tar.bz2) (765.1 kB; application/x-bzip)
The issue is seen on changeset 4733 plus the CLM patches corresponding to the changesets for #220. Continuous failovers are happening while some API invocations of the IMM application are ongoing. The IMMD has asserted on the new active controller, which is reported in ticket #721.
When both controllers got rebooted, the payloads did not get rebooted; instead, the OpenSAF services on them are still up and running. CLM shows that both payloads are not part of the cluster. When the payloads are restarted manually, they rejoin the cluster.
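For reference, the symptom can be confirmed from a controller with the standard immlist tool (a minimal check; the node DN matches the immlist output further below):

    # saClmNodeIsMember should be 1 for a cluster member; for PL-3 it comes back <Empty>
    immlist safNode=PL-3,safCluster=myClmCluster | grep saClmNodeIsMember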
PL-3 syslog:
Jan 15 18:23:09 SLES-64BIT-SLOT3 osafimmnd[3550]: NO implementer for class 'testMA_verifyObjApplNoResponseModCallback_101' is released => class extent is UNSAFE
Jan 15 18:23:59 SLES-64BIT-SLOT3 logger: Invoking failover from invoke_failover.sh
Jan 15 18:24:01 SLES-64BIT-SLOT3 osafimmnd[3550]: WA DISCARD DUPLICATE FEVS message:92993
Jan 15 18:24:01 SLES-64BIT-SLOT3 osafimmnd[3550]: WA Error code 2 returned for message type 57 - ignoring
Jan 15 18:24:01 SLES-64BIT-SLOT3 osafimmnd[3550]: WA DISCARD DUPLICATE FEVS message:92994
Jan 15 18:24:01 SLES-64BIT-SLOT3 osafimmnd[3550]: WA Error code 2 returned for message type 57 - ignoring
Jan 15 18:24:01 SLES-64BIT-SLOT3 osafimmnd[3550]: WA Director Service in NOACTIVE state - fevs replies pending:1 fevs highest processed:92994
Jan 15 18:24:01 SLES-64BIT-SLOT3 osafimmnd[3550]: NO No IMMD service => cluster restart
Jan 15 18:24:01 SLES-64BIT-SLOT3 osafamfnd[3572]: NO 'safComp=IMMND,safSu=PL-3,safSg=NoRed,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'componentRestart'
Jan 15 18:24:01 SLES-64BIT-SLOT3 osafimmnd[6827]: Started
Jan 15 18:24:01 SLES-64BIT-SLOT3 osafimmnd[6827]: NO Persistent Back-End capability configured, Pbe file:imm.db (suffix may get added)
Jan 15 18:24:07 SLES-64BIT-SLOT3 kernel: [ 6343.176901] TIPC: Resetting link <1.1.3:eth0-1.1.2:eth0>, peer not responding
Jan 15 18:24:07 SLES-64BIT-SLOT3 kernel: [ 6343.176911] TIPC: Lost link <1.1.3:eth0-1.1.2:eth0> on network plane A
Jan 15 18:24:07 SLES-64BIT-SLOT3 kernel: [ 6343.176918] TIPC: Lost contact with <1.1.2>
Jan 15 18:24:07 SLES-64BIT-SLOT3 kernel: [ 6343.256091] TIPC: Resetting link <1.1.3:eth0-1.1.1:eth0>, peer not responding
Jan 15 18:24:07 SLES-64BIT-SLOT3 kernel: [ 6343.256100] TIPC: Lost link <1.1.3:eth0-1.1.1:eth0> on network plane A
Jan 15 18:24:07 SLES-64BIT-SLOT3 kernel: [ 6343.256106] TIPC: Lost contact with <1.1.1>
Jan 15 18:24:25 SLES-64BIT-SLOT3 kernel: [ 6361.425537] TIPC: Established link <1.1.3:eth0-1.1.2:eth0> on network plane A
Jan 15 18:24:27 SLES-64BIT-SLOT3 osafimmnd[6827]: NO SERVER STATE: IMM_SERVER_ANONYMOUS --> IMM_SERVER_CLUSTER_WAITING
Jan 15 18:24:27 SLES-64BIT-SLOT3 osafimmnd[6827]: NO SERVER STATE: IMM_SERVER_CLUSTER_WAITING --> IMM_SERVER_LOADING_PENDING
Jan 15 18:24:27 SLES-64BIT-SLOT3 osafimmnd[6827]: NO SERVER STATE: IMM_SERVER_LOADING_PENDING --> IMM_SERVER_LOADING_CLIENT
Jan 15 18:24:29 SLES-64BIT-SLOT3 osafimmnd[6827]: NO ERR_BAD_HANDLE: Admin owner 1 does not exist
Jan 15 18:24:36 SLES-64BIT-SLOT3 kernel: [ 6372.473240] TIPC: Established link <1.1.3:eth0-1.1.1:eth0> on network plane A
Jan 15 18:24:39 SLES-64BIT-SLOT3 osafimmnd[6827]: NO ERR_BAD_HANDLE: Admin owner 2 does not exist
Jan 15 18:24:39 SLES-64BIT-SLOT3 osafimmnd[6827]: NO NODE STATE-> IMM_NODE_LOADING
Jan 15 18:24:45 SLES-64BIT-SLOT3 osafimmnd[6827]: WA Number of objects in IMM is:5000
Jan 15 18:24:46 SLES-64BIT-SLOT3 osafimmnd[6827]: WA Number of objects in IMM is:6000
Jan 15 18:24:47 SLES-64BIT-SLOT3 osafimmnd[6827]: WA Number of objects in IMM is:7000
Jan 15 18:24:48 SLES-64BIT-SLOT3 osafimmnd[6827]: WA Number of objects in IMM is:8000
Jan 15 18:24:49 SLES-64BIT-SLOT3 osafimmnd[6827]: WA Number of objects in IMM is:9000
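Note the sequence above: osafimmnd[3550] concludes "No IMMD service => cluster restart", but the AMF recovery that follows is only 'componentRestart', so a new osafimmnd (PID 6827) is started in place and re-loads the IMM; the payload node itself never reboots. A quick way to spot the pattern on a payload (a grep sketch over the same two log lines, assuming the SLES default syslog location):

    # A cluster-restart request followed only by a component-level recovery
    grep -E "No IMMD service => cluster restart|Recovery is 'componentRestart'" /var/log/messages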
After both controllers came up, the status is as follows (note that the CLM runtime attributes for PL-3, such as saClmNodeIsMember, are <Empty>):
SLES-64BIT-SLOT1:~ # immlist safNode=PL-3,safCluster=myClmCluster
Name                          Type         Value(s)
========================================================================
safNode                       SA_STRING_T  safNode=PL-3
saClmNodeLockCallbackTimeout  SA_TIME_T    50000000000 (0xba43b7400, Thu Jan 1 05:30:50 1970)
saClmNodeIsMember             SA_UINT32_T  <Empty>
saClmNodeInitialViewNumber    SA_UINT64_T  <Empty>
saClmNodeID                   SA_UINT32_T  <Empty>
saClmNodeEE                   SA_NAME_T    <Empty>
saClmNodeDisableReboot        SA_UINT32_T  0 (0x0)
saClmNodeCurrAddressFamily    SA_UINT32_T  <Empty>
saClmNodeCurrAddress          SA_STRING_T  <Empty>
saClmNodeBootTimeStamp        SA_TIME_T    <Empty>
saClmNodeAdminState           SA_UINT32_T  1 (0x1)
saClmNodeAddressFamily        SA_UINT32_T  <Empty>
saClmNodeAddress              SA_STRING_T  <Empty>
SaImmAttrImplementerName      SA_STRING_T  safClmService
SaImmAttrClassName            SA_STRING_T  SaClmNode
SaImmAttrAdminOwnerName       SA_STRING_T  IMMLOADER
SLES-64BIT-SLOT1:~ # immlist safAmfNode=PL-3,safAmfCluster=myAmfCluster
Name                                     Type         Value(s)
========================================================================
safAmfNode                               SA_STRING_T  safAmfNode=PL-3
saAmfNodeSuFailoverMax                   SA_UINT32_T  2 (0x2)
saAmfNodeSuFailOverProb                  SA_TIME_T    1200000000000 (0x1176592e000, Thu Jan 1 05:50:00 1970)
saAmfNodeOperState                       SA_UINT32_T  2 (0x2)
saAmfNodeFailfastOnTerminationFailure    SA_UINT32_T  0 (0x0)
saAmfNodeFailfastOnInstantiationFailure  SA_UINT32_T  0 (0x0)
saAmfNodeClmNode                         SA_NAME_T    safNode=PL-3,safCluster=myClmCluster (36)
saAmfNodeCapacity                        SA_STRING_T  <Empty>
saAmfNodeAutoRepair                      SA_UINT32_T  1 (0x1)
saAmfNodeAdminState                      SA_UINT32_T  1 (0x1)
SaImmAttrImplementerName                 SA_STRING_T  safAmfService
SaImmAttrClassName                       SA_STRING_T  SaAmfNode
SaImmAttrAdminOwnerName                  SA_STRING_T  IMMLOADER
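The AMF view agrees: saAmfNodeOperState is 2, i.e. SA_AMF_OPERATIONAL_DISABLED, while the node-level processes are still running. A one-line check (same immlist tool as above):

    # saAmfNodeOperState: 1 = enabled, 2 = disabled
    immlist safAmfNode=PL-3,safAmfCluster=myAmfCluster | grep saAmfNodeOperState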
SLES-64BIT-SLOT3:/opt/goahead/tetware/opensaffire # /etc/init.d/opensafd status
safSISU=safSu=SC-1\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed2,safApp=OpenSAF
saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=SC-2\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed1,safApp=OpenSAF
saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=SC-1\,safSg=2N\,safApp=OpenSAF,safSi=SC-2N,safApp=OpenSAF
saAmfSISUHAState=STANDBY(2)
safSISU=safSu=SC-2\,safSg=2N\,safApp=OpenSAF,safSi=SC-2N,safApp=OpenSAF
saAmfSISUHAState=ACTIVE(1)
SLES-64BIT-SLOT3:/opt/goahead/tetware/opensaffire # cat /etc/opensaf/node_name
PL-3
SLES-64BIT-SLOT3:/opt/goahead/tetware/opensaffire # ps -ef | grep saf
root      3538     1  0 17:16 ?        00:00:00 /bin/sh /usr/lib64/opensaf/clc-cli/osaf-transport-monitor
root      3563     1  0 17:16 ?        00:00:00 /usr/lib64/opensaf/osafclmna --tracemask=0xffffffff
root      3572     1  0 17:16 ?        00:00:00 /usr/lib64/opensaf/osafamfnd --tracemask=0xffffffff
root      3582     1  0 17:16 ?        00:00:00 /usr/lib64/opensaf/osafsmfnd
root      3591     1  0 17:16 ?        00:00:00 /usr/lib64/opensaf/osafmsgnd
root      3608     1  0 17:16 ?        00:00:00 /usr/lib64/opensaf/osaflcknd
root      3617     1  0 17:16 ?        00:00:00 /usr/lib64/opensaf/osafckptnd
root      3626     1  0 17:16 ?        00:00:00 /usr/lib64/opensaf/osafamfwd
root      6827     1  1 18:24 ?        00:00:13 /usr/lib64/opensaf/osafimmnd --tracemask=0xffffffff
root      7490  3073  0 18:42 pts/0    00:00:00 grep saf
The same is observed on PL-4. The AMF traces are attached.
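As noted in the description, manually restarting OpenSAF on the stranded payloads is enough for them to rejoin; a sketch of the workaround, using the same init script queried above:

    # Manual workaround on each stranded payload (PL-3, PL-4)
    /etc/init.d/opensafd restart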