SC-1 is active

1. page allocation failure on SC-1


Apr  6 07:55:33 OEL_M-SLOT-1 osafimmd[6916]: NO Node 2030f request sync 
sync-pid:3259 epoch:0
Apr  6 07:55:34 OEL_M-SLOT-1 osafimmnd[3728]: NO Announce sync, epoch:81
Apr  6 07:55:34 OEL_M-SLOT-1 osafimmnd[3728]: NO SERVER STATE: IMM_SERVER_READY 
--> IMM_SERVER_SYNC_SERVER
Apr  6 07:55:34 OEL_M-SLOT-1 osafimmnd[3728]: NO NODE STATE-> 
IMM_NODE_R_AVAILABLE
Apr  6 07:55:35 OEL_M-SLOT-1 kernel: osafimmnd: page allocation failure. 
order:4, mode:0x20
Apr  6 07:55:35 OEL_M-SLOT-1 kernel: Pid: 3728, comm: osafimmnd Not tainted 
2.6.32.67 #1
Apr  6 07:55:35 OEL_M-SLOT-1 kernel: Call Trace:
Apr  6 07:55:35 OEL_M-SLOT-1 kernel: [<ffffffff81106adc>] 
__alloc_pages_nodemask+0x57c/0x6c0
Apr  6 07:55:35 OEL_M-SLOT-1 kernel: [<ffffffff8113f3c2>] 
kmem_getpages+0x62/0x170
Apr  6 07:55:35 OEL_M-SLOT-1 kernel: [<ffffffff8113ff7a>] 
fallback_alloc+0x19a/0x240
Apr  6 07:55:35 OEL_M-SLOT-1 kernel: [<ffffffff8113fa1b>] ? 
cache_grow+0x2bb/0x2f0
Apr  6 07:55:35 OEL_M-SLOT-1 kernel: [<ffffffff8113fd25>] 
____cache_alloc_node+0x95/0x150
Apr  6 07:55:35 OEL_M-SLOT-1 kernel: [<ffffffff813e0f8a>] ? 
__alloc_skb+0x7a/0x180
Apr  6 07:55:35 OEL_M-SLOT-1 kernel: [<ffffffff81140b5f>] 
kmem_cache_alloc_node_notrace+0x6f/0x130
Apr  6 07:55:35 OEL_M-SLOT-1 kernel: [<ffffffff81140d9b>] 
__kmalloc_node+0x7b/0x100
Apr  6 07:55:35 OEL_M-SLOT-1 kernel: [<ffffffff813e0f8a>] __alloc_skb+0x7a/0x180
Apr  6 07:55:35 OEL_M-SLOT-1 kernel: [<ffffffffa04d7bdb>] buf_acquire+0x2b/0x60 
[tipc]
Apr  6 07:55:35 OEL_M-SLOT-1 kernel: [<ffffffffa04de077>] 
tipc_msg_build+0x97/0x210 [tipc]

2. sync failed:

Apr  6 07:55:35 OEL_M-SLOT-1 osafmsgd[7099]: ER Sending the message to the 
specified destination with error 6
Apr  6 07:55:35 OEL_M-SLOT-1 osafmsgd[7099]: ER ERR_FAILED_OPERATION: Couldn't 
Send ASAPi Name Resolution Response Message
Apr  6 07:55:35 OEL_M-SLOT-1 osafimmnd[3728]: WA MDS Send Failed to 
service:IMMD rc:2
Apr  6 07:55:35 OEL_M-SLOT-1 osafimmnd[3728]: ER Problem in sending asyncronous 
FEVS message over MDS - dropping message!
Apr  6 07:55:35 OEL_M-SLOT-1 osafimmloadd: ER immsv_finalize_sync failed!
Apr  6 07:55:35 OEL_M-SLOT-1 osafimmloadd: ER Sync ending ABNORMALLY

3. sending the message through TIPC is failed.

Apr  6  7:42:59.850380 mqsv_a[10430] ERR  |MDS_SND_RCV: Timeout or Error occured
Apr  6  7:42:59.850465 mqsv_a[10430] ERR  |MDS_SND_RCV: Timeout occured on 
sndrsp message
Apr  6  7:42:59.850500 mqsv_a[10430] ERR  |MDS_SND_RCV: Adest=<0x00000000,10>

There is no logging in mds


Apr  6  7:43:21.873375 mqsv_a[10430] ERR  |MDS_SND_RCV: Timeout or Error occured
Apr  6  7:43:21.873530 mqsv_a[10430] ERR  |MDS_SND_RCV: Timeout occured on 
sndrsp message
Apr  6  7:43:21.873588 mqsv_a[10430] ERR  |MDS_SND_RCV: Adest=<0x00000000,10>
Apr  6 10:19:08.276857 immlist[13976] NOTIFY  |BEGIN MDS LOGGING| 
PID=<immlist[13976]> | ARCHW=a|64bit=1
Apr  6 10:19:08.283659 immlist[13976] NOTIFY  |MDTM: install_tipc : svc_id = 
IMMA_OM(26), vdest=65535
Apr  6 10:19:08.283815 immlist[13976] NOTIFY  |MDTM: svc up event for svc_id = 
IMMND(25), subscri. by svc_id = IMMA_OM(26) pwe_id=1 Adest = 
<rem_node[3]:dest_tipc_id_ref[3854262298]>


There seems to be problem with system resources, and the system is unable to 
send messages through transport.



---

** [tickets:#1735] imm: IMMND doesn't handle error when sending finalize-sync 
message over fevs**

**Status:** review
**Milestone:** 4.6.2
**Created:** Thu Apr 07, 2016 04:54 AM UTC by Madhurika Koppula
**Last Updated:** Thu Apr 14, 2016 11:46 AM UTC
**Owner:** Hung Nguyen
**Attachments:**

- 
[immnd.tgz](https://sourceforge.net/p/opensaf/tickets/1735/attachment/immnd.tgz)
 (41.2 MB; application/octet-stream)


Setup:
Changeset- 7436
Version - opensaf 5.0
4 nodes configured with single PBE

Issue observed: 

1)It is observed after the scenario mentioned in ticket #1733

2) After reboot, Immnd failed to come up stating that NODE STATE-> 
IMM_NODE_ISOLATED and could not respawn properly.

3) PL-4 node didnot join the cluster after the multiple starts until cluster 
reset.


Following is the timestamp of payload PL-4:

Apr  6 10:52:52 OEL_M-SLOT-4 osafimmnd[1646]: Started
Apr  6 10:52:52 OEL_M-SLOT-4 osafimmnd[1646]: NO Persistent Back-End capability 
configured, Pbe file:imm.db (suffix may get added)
Apr  6 10:52:52 OEL_M-SLOT-4 osafimmnd[1646]: NO IMMD service is UP ... 
ScAbsenseAllowed?:0 introduced?:0
Apr  6 10:52:52 OEL_M-SLOT-4 osafimmnd[1646]: NO SERVER STATE: 
IMM_SERVER_ANONYMOUS --> IMM_SERVER_CLUSTER_WAITING
Apr  6 10:52:52 OEL_M-SLOT-4 osafimmnd[1646]: NO SETTING COORD TO 0 CLOUD PROTO
Apr  6 10:52:52 OEL_M-SLOT-4 osafimmnd[1646]: NO SERVER STATE: 
IMM_SERVER_CLUSTER_WAITING --> IMM_SERVER_LOADING_PENDING
Apr  6 10:52:52 OEL_M-SLOT-4 osafimmnd[1646]: NO SERVER STATE: 
IMM_SERVER_LOADING_PENDING --> IMM_SERVER_SYNC_PENDING
Apr  6 10:52:52 OEL_M-SLOT-4 osafimmnd[1646]: NO NODE STATE-> IMM_NODE_ISOLATED


Apr  6 11:00:52 OEL_M-SLOT-4 opensafd[1616]: ER Timed-out for response from 
IMMND
Apr  6 11:00:52 OEL_M-SLOT-4 opensafd[1616]: ER
Apr  6 11:00:52 OEL_M-SLOT-4 opensafd[1616]: ER Going for recovery
Apr  6 11:00:52 OEL_M-SLOT-4 opensafd[1616]: ER Trying To RESPAWN 
/usr/lib64/opensaf/clc-cli/osaf-immnd attempt #1
Apr  6 11:00:52 OEL_M-SLOT-4 opensafd[1616]: ER Sending SIGABRT to IMMND, 
pid=1646, (origin parent pid=1636)
Apr  6 11:01:07 OEL_M-SLOT-4 osafimmnd[1808]: Started
Apr  6 11:01:07 OEL_M-SLOT-4 osafimmnd[1808]: NO Persistent Back-End capability 
configured, Pbe file:imm.db (suffix may get added)
Apr  6 11:01:07 OEL_M-SLOT-4 osafimmnd[1808]: NO IMMD service is UP ... 
ScAbsenseAllowed?:0 introduced?:0
Apr  6 11:01:07 OEL_M-SLOT-4 osafimmnd[1808]: NO Fevs count adjusted to 51312 
preLoadPid: 0
Apr  6 11:01:07 OEL_M-SLOT-4 osafimmnd[1808]: NO SERVER STATE: 
IMM_SERVER_ANONYMOUS --> IMM_SERVER_CLUSTER_WAITING
Apr  6 11:01:07 OEL_M-SLOT-4 osafimmnd[1808]: NO SETTING COORD TO 0 CLOUD PROTO
Apr  6 11:01:07 OEL_M-SLOT-4 osafimmnd[1808]: NO SERVER STATE: 
IMM_SERVER_CLUSTER_WAITING --> IMM_SERVER_LOADING_PENDING
Apr  6 11:01:07 OEL_M-SLOT-4 osafimmnd[1808]: NO SERVER STATE: 
IMM_SERVER_LOADING_PENDING --> IMM_SERVER_SYNC_PENDING
Apr  6 11:01:07 OEL_M-SLOT-4 osafimmnd[1808]: NO NODE STATE-> IMM_NODE_ISOLATED


Apr  6 11:09:07 OEL_M-SLOT-4 opensafd[1616]: ER Timed-out for response from 
IMMND
Apr  6 11:09:07 OEL_M-SLOT-4 opensafd[1616]: ER Could Not RESPAWN IMMND
Apr  6 11:09:07 OEL_M-SLOT-4 opensafd[1616]: ER
Apr  6 11:09:07 OEL_M-SLOT-4 opensafd[1616]: ER Trying To RESPAWN 
/usr/lib64/opensaf/clc-cli/osaf-immnd attempt #2
Apr  6 11:09:07 OEL_M-SLOT-4 opensafd[1616]: ER Sending SIGABRT to IMMND, 
pid=1808, (origin parent pid=1802)
Apr  6 11:09:22 OEL_M-SLOT-4 osafimmnd[1976]: Started
Apr  6 11:09:22 OEL_M-SLOT-4 osafimmnd[1976]: NO Persistent Back-End capability 
configured, Pbe file:imm.db (suffix may get added)
Apr  6 11:09:22 OEL_M-SLOT-4 osafimmnd[1976]: NO IMMD service is UP ... 
ScAbsenseAllowed?:0 introduced?:0
Apr  6 11:09:22 OEL_M-SLOT-4 osafimmnd[1976]: NO SERVER STATE: 
IMM_SERVER_ANONYMOUS --> IMM_SERVER_CLUSTER_WAITING
Apr  6 11:09:22 OEL_M-SLOT-4 osafimmnd[1976]: NO SETTING COORD TO 0 CLOUD PROTO
Apr  6 11:09:22 OEL_M-SLOT-4 osafimmnd[1976]: NO SERVER STATE: 
IMM_SERVER_CLUSTER_WAITING --> IMM_SERVER_LOADING_PENDING
Apr  6 11:09:22 OEL_M-SLOT-4 osafimmnd[1976]: NO SERVER STATE: 
IMM_SERVER_LOADING_PENDING --> IMM_SERVER_SYNC_PENDING
Apr  6 11:09:22 OEL_M-SLOT-4 osafimmnd[1976]: NO NODE STATE-> IMM_NODE_ISOLATED


Apr  6 11:10:05 OEL_M-SLOT-4 kernel: imklog 5.8.10, log source = /proc/kmsg 
started.
Apr  6 11:10:05 OEL_M-SLOT-4 rsyslogd: [origin software="rsyslogd" 
swVersion="5.8.10" x-pid="1217" x-info="http://www.rsyslog.com";] start
Apr  6 11:10:05 OEL_M-SLOT-4 kernel: Initializing cgroup subsys cpuset
Apr  6 11:10:05 OEL_M-SLOT-4 kernel: Initializing cgroup subsys cpu
Apr  6 11:10:05 OEL_M-SLOT-4 kernel: Linux version 2.6.32.67 (root@SLOT-1) (gcc 
version 4.4.7 20120313 (Red Hat 4.4.7-3) (GCC) ) #1 SMP Wed Jul 29 11:31:39 I



-----

When the sync server fails to send sync-finalize message, it keeps waiting for 
the sync-finalize message to come over fevs.

Apr  6  7:55:35.719179 osafimmnd [3728:immnd_proc.c:0840] TR Coord: Sync done, 
but waiting for confirmed finalizeSync. Out queue:0
Apr  6  7:55:35.819531 osafimmnd [3728:immnd_proc.c:0840] TR Coord: Sync done, 
but waiting for confirmed finalizeSync. Out queue:0
Apr  6  7:55:35.919867 osafimmnd [3728:immnd_proc.c:0840] TR Coord: Sync done, 
but waiting for confirmed finalizeSync. Out queue:0
Apr  6  7:55:36.020219 osafimmnd [3728:immnd_proc.c:0840] TR Coord: Sync done, 
but waiting for confirmed finalizeSync. Out queue:0

So it's stuck in IMM_SERVER_SYNC_SERVER state and can't handle new sync request 
from newly joined nodes. That's why PL-4 couldn't join the cluster. The 
coordinator can only announce sync when it's in IMM_SERVER_READY state.


---

Sent from sourceforge.net because [email protected] is 
subscribed to https://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change settings at 
https://sourceforge.net/p/opensaf/admin/tickets/options.  Or, if this is a 
mailing list, you can unsubscribe from the mailing list.
------------------------------------------------------------------------------
Find and fix application performance issues faster with Applications Manager
Applications Manager provides deep performance insights into multiple tiers of
your business applications. It resolves application problems quickly and
reduces your MTTR. Get your free trial!
https://ad.doubleclick.net/ddm/clk/302982198;130105516;z
_______________________________________________
Opensaf-tickets mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets

Reply via email to