- **assigned_to**: Nagendra Kumar


---

**[tickets:#498] OpenSAF startup issue on a CLM locked node**

**Status:** unassigned
**Created:** Wed Jul 10, 2013 09:28 AM UTC by Sirisha Alla
**Last Updated:** Wed Jul 10, 2013 09:28 AM UTC
**Owner:** Nagendra Kumar

The issue is seen using changeset 4325 on 4-node SLES VMs. IMM PBE is enabled 
and loaded with 25k objects.

A lock is done on CLM node safNode=PL-3,safCluster=myClmCluster. After the CLM 
node lock, opensafd is restarted or PL-3 is rebooted. The following is observed:

1) OpenSAF fails to come up after reboot/restart of opensaf.

Jul  9 17:23:04 SLES-64BIT-SLOT3 kernel: [ 1402.648451] TIPC: Established link <1.1.3:eth0-1.1.1:eth0> on network plane A
Jul  9 17:23:04 SLES-64BIT-SLOT3 kernel: [ 1402.648481] TIPC: Established link <1.1.3:eth0-1.1.4:eth1> on network plane A
Jul  9 17:23:04 SLES-64BIT-SLOT3 kernel: [ 1402.648511] TIPC: Established link <1.1.3:eth0-1.1.2:eth1> on network plane A
Jul  9 17:23:04 SLES-64BIT-SLOT3 osafimmnd[3331]: Started
Jul  9 17:23:04 SLES-64BIT-SLOT3 osafimmnd[3331]: NO Persistent Back-End capability configured, Pbe file:imm.db
.....
Jul  9 17:23:24 SLES-64BIT-SLOT3 osafclmna[3344]: Started
Jul  9 17:23:25 SLES-64BIT-SLOT3 osafclmna[3344]: NO safNode=PL-3,safCluster=myClmCluster Joined cluster, nodeid=2030f
Jul  9 17:23:25 SLES-64BIT-SLOT3 osafamfnd[3353]: Started
Jul  9 17:23:25 SLES-64BIT-SLOT3 osafimmnd[3331]: NO Implementer connected: 44 (MsgQueueService132111) <0, 2040f>
Jul  9 17:39:55 SLES-64BIT-SLOT3 opensafd[3305]: ER Timed-out for response from AMFND
Jul  9 17:39:55 SLES-64BIT-SLOT3 opensafd[3305]: ER
Jul  9 17:39:55 SLES-64BIT-SLOT3 opensafd[3305]: ER Going for recovery
Jul  9 17:39:55 SLES-64BIT-SLOT3 osafamfnd[3353]: NO Shutdown initiated
Jul  9 17:39:55 SLES-64BIT-SLOT3 osafamfnd[3353]: NO Terminating all AMF components
Jul  9 17:39:55 SLES-64BIT-SLOT3 osafamfnd[3353]: NO No component to terminate, exiting
Jul  9 17:39:55 SLES-64BIT-SLOT3 kernel: [ 2413.387709] TIPC: Disabling bearer <eth:eth0>
Jul  9 17:39:55 SLES-64BIT-SLOT3 kernel: [ 2413.387715] TIPC: Lost link <1.1.3:eth0-1.1.1:eth0> on network plane A
Jul  9 17:39:55 SLES-64BIT-SLOT3 kernel: [ 2413.387718] TIPC: Lost contact with <1.1.1>
Jul  9 17:39:55 SLES-64BIT-SLOT3 kernel: [ 2413.387735] TIPC: Lost link <1.1.3:eth0-1.1.4:eth1> on network plane A
Jul  9 17:39:55 SLES-64BIT-SLOT3 kernel: [ 2413.387737] TIPC: Lost contact with <1.1.4>
Jul  9 17:39:55 SLES-64BIT-SLOT3 kernel: [ 2413.387742] TIPC: Lost link <1.1.3:eth0-1.1.2:eth1> on network plane A
Jul  9 17:39:55 SLES-64BIT-SLOT3 kernel: [ 2413.387743] TIPC: Lost contact with <1.1.2>
Jul  9 17:39:55 SLES-64BIT-SLOT3 kernel: [ 2413.387799] TIPC: Left network mode
Jul  9 17:39:55 SLES-64BIT-SLOT3 kernel: [ 2413.387837] NET: Unregistered protocol family 30
Jul  9 17:39:55 SLES-64BIT-SLOT3 kernel: [ 2413.387842] TIPC: Deactivated
Jul  9 17:39:55 SLES-64BIT-SLOT3 opensafd: Starting OpenSAF failed
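
As a rough check of how long opensafd waited in the syslog above, the gap between the osafamfnd "Started" line and the "Timed-out for response from AMFND" line can be computed from the timestamps. This is only a sketch: the helper name `syslog_time` is hypothetical, and the year 2013 is an assumption taken from the ticket date, since syslog lines carry no year.

```python
from datetime import datetime

def syslog_time(line, year=2013):
    """Parse the leading 'Mon DD HH:MM:SS' syslog timestamp of a line.

    The year is not present in syslog; 2013 is assumed from the ticket date.
    """
    month, day, clock = line.split()[:3]
    return datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S")

# Two lines copied from the scenario 1 syslog.
started = "Jul  9 17:23:25 SLES-64BIT-SLOT3 osafamfnd[3353]: Started"
timed_out = ("Jul  9 17:39:55 SLES-64BIT-SLOT3 opensafd[3305]: "
             "ER Timed-out for response from AMFND")

gap = syslog_time(timed_out) - syslog_time(started)
print(gap)  # 0:16:30
```

So amfnd was given about 16.5 minutes before opensafd gave up and went for recovery.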

Here AMFND has timed out while coming up. The following is observed in the 
amfnd traces of PL-3:

Jul  9 17:23:25.052461 osafamfnd [3353:avnd_clm.c:0208] TR Node has left the cluster 'safNode=PL-3,safCluster=myClmCluster', avnd_cb->first_time_up 1,notifItem->clusterNode.nodeId 131855, avnd_cb->node_info.nodeId 131855
Jul  9 17:23:25.052469 osafamfnd [3353:avnd_clm.c:0251] << clm_track_cb
Jul  9 17:23:25.052475 osafamfnd [3353:clma_util.c:0468] << clma_hdl_cbk_rec_prc
Jul  9 17:23:25.052480 osafamfnd [3353:clma_util.c:0653] >> clma_msg_destroy
Jul  9 17:23:25.052486 osafamfnd [3353:clma_util.c:0677] << clma_msg_destroy
Jul  9 17:23:25.052495 osafamfnd [3353:clma_util.c:0543] << clma_hdl_cbk_dispatch_all
Jul  9 17:23:25.052502 osafamfnd [3353:clma_util.c:0622] << clma_hdl_cbk_dispatch
Jul  9 17:23:25.052507 osafamfnd [3353:clma_api.c:0774] << saClmDispatch
Jul  9 17:39:55.144868 osafamfnd [3353:avnd_proc.c:0257] >> avnd_evt_process
Jul  9 17:39:55.144928 osafamfnd [3353:avnd_proc.c:0272] TR Evt type:51
Jul  9 17:39:55.144949 osafamfnd [3353:avnd_term.c:0108] >> avnd_evt_last_step_term_evh
Jul  9 17:39:55.145000 osafamfnd [3353:avnd_term.c:0112] NO Shutdown initiated
Jul  9 17:39:55.145016 osafamfnd [3353:avnd_term.c:0063] >> avnd_last_step_clean
Jul  9 17:39:55.145037 osafamfnd [3353:avnd_term.c:0065] NO Terminating all AMF components
Jul  9 17:39:55.145061 osafamfnd [3353:avnd_term.c:0088] NO No component to terminate, exiting
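
Note that the trace prints the node id in decimal (131855) while the syslog prints it in hexadecimal (nodeid=2030f). A small Python check, with illustrative variable names that are not OpenSAF identifiers, suggests they are the same node, i.e. the "Node has left the cluster" callback refers to PL-3 itself:

```python
# Values copied from the logs above; the variable names are illustrative only.
syslog_nodeid = "2030f"   # from "Joined cluster, nodeid=2030f" (hexadecimal)
trace_nodeid = 131855     # from the amfnd trace, clusterNode.nodeId (decimal)

# Both notations should refer to the same CLM node (PL-3).
same_node = int(syslog_nodeid, 16) == trace_nodeid
print(same_node, hex(trace_nodeid))  # True 0x2030f
```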

2) opensafd does not indicate the success or failure of OpenSAF startup 
(missing logging?)

Jul 10 12:46:30 SLES-64BIT-SLOT3 kernel: [71208.445001] TIPC: Established link <1.1.3:eth0-1.1.4:eth1> on network plane A
Jul 10 12:46:31 SLES-64BIT-SLOT3 kernel: [71209.204837] TIPC: Established link <1.1.3:eth0-1.1.1:eth0> on network plane A
Jul 10 12:46:31 SLES-64BIT-SLOT3 kernel: [71209.204889] TIPC: Established link <1.1.3:eth0-1.1.2:eth1> on network plane A
Jul 10 12:46:31 SLES-64BIT-SLOT3 osafimmnd[17486]: NO SERVER STATE: IMM_SERVER_ANONYMOUS --> IMM_SERVER_CLUSTER_WAITING
Jul 10 12:46:31 SLES-64BIT-SLOT3 osafimmnd[17486]: NO SERVER STATE: IMM_SERVER_CLUSTER_WAITING --> IMM_SERVER_LOADING_PENDING
Jul 10 12:46:31 SLES-64BIT-SLOT3 osafimmnd[17486]: NO SERVER STATE: IMM_SERVER_LOADING_PENDING --> IMM_SERVER_SYNC_PENDING
Jul 10 12:46:31 SLES-64BIT-SLOT3 osafimmnd[17486]: NO NODE STATE-> IMM_NODE_ISOLATED
Jul 10 12:46:32 SLES-64BIT-SLOT3 osafimmnd[17486]: NO NODE STATE-> IMM_NODE_W_AVAILABLE
Jul 10 12:46:32 SLES-64BIT-SLOT3 osafimmnd[17486]: NO SERVER STATE: IMM_SERVER_SYNC_PENDING --> IMM_SERVER_SYNC_CLIENT
Jul 10 12:46:45 SLES-64BIT-SLOT3 osafimmnd[17486]: NO NODE STATE-> IMM_NODE_FULLY_AVAILABLE 2144
Jul 10 12:46:45 SLES-64BIT-SLOT3 osafimmnd[17486]: NO RepositoryInitModeT is SA_IMM_KEEP_REPOSITORY
Jul 10 12:46:45 SLES-64BIT-SLOT3 osafimmnd[17486]: NO Epoch set to 84 in ImmModel
Jul 10 12:46:45 SLES-64BIT-SLOT3 osafimmnd[17486]: NO SERVER STATE: IMM_SERVER_SYNC_CLIENT --> IMM SERVER READY
Jul 10 12:46:45 SLES-64BIT-SLOT3 osafclmna[17499]: Started
Jul 10 12:46:45 SLES-64BIT-SLOT3 osafclmna[17499]: NO safNode=PL-3,safCluster=myClmCluster Joined cluster, nodeid=2030f
Jul 10 12:46:45 SLES-64BIT-SLOT3 osafamfnd[17508]: Started
Jul 10 12:46:49 SLES-64BIT-SLOT3 osafimmnd[17486]: NO Implementer connected: 52 (MsgQueueService131855) <0, 2010f>
Jul 10 12:46:49 SLES-64BIT-SLOT3 osafimmnd[17486]: NO Implementer disconnected 52 <0, 2010f> (MsgQueueService131855)

After this message there is no "Opensaf successfully started" or "OpenSAF 
startup failed" message in the syslog.
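
To tell the two scenarios apart when going through many reproduction attempts, a saved syslog can be scanned for an opensafd result line. The sketch below is a hypothetical helper, not part of OpenSAF: the failure text is taken from the scenario 1 syslog ("Starting OpenSAF failed"), while the success text "successfully started" is only an assumption based on the message quoted above and may not match what opensafd really logs.

```python
def startup_result(lines):
    """Return 'failed', 'started', or None if opensafd logged no result.

    The failure marker comes from the scenario 1 syslog; the success
    marker is an assumed wording, not a confirmed opensafd message.
    """
    for line in lines:
        if " opensafd" not in line:
            continue  # only consider lines logged by opensafd
        lower = line.lower()
        if "starting opensaf failed" in lower:
            return "failed"
        if "successfully started" in lower:  # assumed wording
            return "started"
    return None

scenario1 = ["Jul  9 17:39:55 SLES-64BIT-SLOT3 opensafd: Starting OpenSAF failed"]
scenario2 = ["Jul 10 12:46:45 SLES-64BIT-SLOT3 osafamfnd[17508]: Started"]

print(startup_result(scenario1))  # failed
print(startup_result(scenario2))  # None
```

A result of `None` corresponds to scenario 2, where opensafd reports neither outcome.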

After some time (approx. 1 hr 40 mins), when the CLM unlock is done, the 
middleware SU components are brought up successfully.

Jul 10 14:23:13 SLES-64BIT-SLOT3 osafamfnd[17508]: NO 'safSu=PL-3,safSg=NoRed,safApp=OpenSAF' Presence State UNINSTANTIATED => INSTANTIATING
Jul 10 14:23:13 SLES-64BIT-SLOT3 osafmsgnd[20002]: Started
Jul 10 14:23:13 SLES-64BIT-SLOT3 osafimmnd[17486]: NO Implementer connected: 53 (MsgQueueService131855) <43, 2030f>
Jul 10 14:23:13 SLES-64BIT-SLOT3 osaflcknd[20018]: Started
Jul 10 14:23:13 SLES-64BIT-SLOT3 osafsmfnd[20027]: Started
Jul 10 14:23:13 SLES-64BIT-SLOT3 osafckptnd[20036]: Started
Jul 10 14:23:13 SLES-64BIT-SLOT3 osafamfwd[20045]: Started
Jul 10 14:23:13 SLES-64BIT-SLOT3 osafamfnd[17508]: NO 'safSu=PL-3,safSg=NoRed,safApp=OpenSAF' Presence State INSTANTIATING => INSTANTIATED
Jul 10 14:23:13 SLES-64BIT-SLOT3 osafamfnd[17508]: NO Assigning 'safSi=NoRed4,safApp=OpenSAF' ACTIVE to 'safSu=PL-3,safSg=NoRed,safApp=OpenSAF'
Jul 10 14:23:13 SLES-64BIT-SLOT3 osafamfnd[17508]: NO Assigned 'safSi=NoRed4,safApp=OpenSAF' ACTIVE to 'safSu=PL-3,safSg=NoRed,safApp=OpenSAF'


While scenario 1 is seen quite frequently, scenario 2 is difficult to 
reproduce; I have observed it once in 20 tries. I think scenario 2 should be 
the expected behavior, with opensafd indicating that "opensaf started 
successfully".

Attached syslog and amfnd traces on PL-3.
