[tickets] [opensaf:tickets] #2296 imm: IMMND on payload crashes after SC absence

2017-02-09 Thread Hung Nguyen
- **status**: accepted --> review



---

** [tickets:#2296] imm: IMMND on payload crashes after SC absence**

**Status:** review
**Milestone:** 5.0.2
**Created:** Thu Feb 09, 2017 08:44 AM UTC by Hung Nguyen
**Last Updated:** Thu Feb 09, 2017 08:44 AM UTC
**Owner:** Hung Nguyen
**Attachments:**

- [logs.tgz](https://sourceforge.net/p/opensaf/tickets/2296/attachment/logs.tgz) (5.2 MB; application/x-compressed)


Removal of IMMND coordinator was introduced in [#1692].
Some cleanup actions are delayed until **immnd_proc_server()** is executed.

If the cluster comes back from headless too quickly, **immnd_proc_server()** 
will not be executed and IMMND will crash later.

~~~
2017-02-05 21:36:41 PL-5 osafimmnd[406]: NO Announce sync, epoch:28
2017-02-05 21:36:41 PL-5 osafimmnd[406]: NO SERVER STATE: IMM_SERVER_READY --> 
IMM_SERVER_SYNC_SERVER
2017-02-05 21:36:41 PL-5 osafimmnd[406]: NO NODE STATE-> IMM_NODE_R_AVAILABLE
2017-02-05 21:36:41 PL-5 osafimmloadd: NO Sync starting
2017-02-05 21:36:42 PL-5 osafdtmd[393]: NO Lost contact with 'SC-1'
2017-02-05 21:36:42 PL-5 osafimmnd[406]: WA Director Service in NOACTIVE state 
- fevs replies pending:16 fevs highest processed:13154
2017-02-05 21:36:43 PL-5 osafimmnd[406]: WA SC Absence IS allowed:900 IMMD 
service is DOWN
2017-02-05 21:36:43 PL-5 osafimmnd[406]: NO IMMD SERVICE IS DOWN, HYDRA IS 
CONFIGURED => UNREGISTERING IMMND form MDS
2017-02-05 21:36:43 PL-5 osafimmnd[406]: NO Removing client id:290002050f 
sv_id:26
2017-02-05 21:36:43 PL-5 osafimmnd[406]: NO Removing client id:14d0002050f 
sv_id:26
2017-02-05 21:36:43 PL-5 osafimmnd[406]: WA Postponing hard delete of admin 
owner with id:41 when imm is not writable state
2017-02-05 21:36:43 PL-5 osafimmnd[406]: NO Removing client id:1530002050f 
sv_id:27
2017-02-05 21:36:43 PL-5 osafimmnd[406]: NO Implementer disconnected 147 <339, 
2050f> (OpenSafImmPBE)
2017-02-05 21:36:43 PL-5 osafimmnd[406]: NO Removing client id:1550002050f 
sv_id:26
2017-02-05 21:36:43 PL-5 osafimmnd[406]: NO Implementer disconnected 144 <0, 
2010f(down)> (safLogService)
2017-02-05 21:36:43 PL-5 osafimmnd[406]: NO Implementer disconnected 145 <0, 
2010f(down)> (@safLogService_appl)
2017-02-05 21:36:43 PL-5 osafimmnd[406]: NO Implementer disconnected 146 <0, 
2010f(down)> (@OpenSafImmReplicatorA)
2017-02-05 21:36:43 PL-5 osafimmnd[406]: NO Implementer disconnected 143 <0, 
2010f(down)> (safClmService)
2017-02-05 21:36:43 PL-5 osafimmnd[406]: NO Implementer disconnected 142 <0, 
2010f(down)> (safAmfService)
2017-02-05 21:36:43 PL-5 osafimmnd[406]: NO Impl Discarded node 2010f
2017-02-05 21:36:43 PL-5 osafimmnd[406]: NO MDS unregisterede. sleeping ...
2017-02-05 21:36:43 PL-5 osafimmpbed: WA PBE lost contact with parent IMMND - 
Exiting
2017-02-05 21:36:44 PL-5 osafimmnd[406]: NO Sleep done registering IMMND with 
MDS
2017-02-05 21:36:44 PL-5 osafimmnd[406]: NO SUCCESS IN REGISTERING IMMND WITH 
MDS
2017-02-05 21:36:44 PL-5 osafimmnd[406]: NO MDS: mds_register_callback: dest 
2050f01e8 already exist
2017-02-05 21:36:44 PL-5 osafimmnd[406]: WA IMMND - Client Node Get Failed for 
cli_hdl:1464583980303
2017-02-05 21:36:45 PL-5 osafdtmd[393]: NO Established contact with 'SC-1'
2017-02-05 21:36:49 PL-5 osafimmnd[406]: WA MDS Send Failed
2017-02-05 21:36:49 PL-5 osafimmnd[406]: WA Error code 2 returned for message 
type 17 - ignoring
2017-02-05 21:36:49 PL-5 osafimmnd[406]: NO IMMD service is UP ... 
ScAbsenseAllowed?:900 introduced?:2
2017-02-05 21:36:49 PL-5 osafimmnd[406]: NO Re-introduce-me 
highestProcessed:13154 highestReceived:13154
2017-02-05 21:36:49 PL-5 osafimmnd[406]: NO Epoch set to 29 in ImmModel
2017-02-05 21:36:49 PL-5 osafimmnd[406]: NO Re-introduce-me 
highestProcessed:13154 highestReceived:13154
2017-02-05 21:36:49 PL-5 osafimmnd[406]: NO ERR_BAD_HANDLE: admin owner id 42 
does not exist
2017-02-05 21:36:49 PL-5 osafimmnd[406]: NO Implementer connected: 149 
(OpenSafImmPBE) <0, 2040f>
2017-02-05 21:36:49 PL-5 osafimmnd[406]: NO Re-introduce-me 
highestProcessed:13157 highestReceived:13158
2017-02-05 21:36:49 PL-5 osafimmnd[406]: ER Node is in a state that cannot 
accept start of sync, will terminate
~~~

IMMND failed to revert to IMM_SERVER_READY/IMM_NODE_FULLY_AVAILABLE and 
crashed.

~~~
#0  0x7f23733bdc37 in __GI_raise (sig=sig@entry=6) at 
../nptl/sysdeps/unix/sysv/linux/raise.c:56
resultvar = 0
pid = 406
selftid = 406
#1  0x7f23733c1028 in __GI_abort () at abort.c:89
save_stage = 2
act = {__sigaction_handler = {sa_handler = 0x152d0009, sa_sigaction 
= 0x152d0009}, sa_mask = {__val = {93865551367896, 30, 54, 139790248362720, 
139790245522487, 17179869186, 139790248362720, 140726076478512, 0, 
139790250985925, 54, 30, 54, 140726076478560, 139790245475049, 
140726076478560}}, sa_flags = 0, sa_restorer = 0x2c774d2a0}
sigs = {__val = {32, 0 }}
#2  0x555ec6cac677 in ImmModel::prepareForSync (this=0x555ec774db30, 
isJoining=false) at 

[tickets] [opensaf:tickets] #2283 ckpt: cpnd_evt_proc_ckpt_open does not reply to ckpt agent

2017-02-09 Thread Zoran Milinkovic
- **status**: review --> fixed
- **Comment**:

opensaf-5.0.x:

changeset:   8568:65fe8f2d1d07
branch:      opensaf-5.0.x
parent:      8563:0e852f85dab8
user:        Zoran Milinkovic
date:        Tue Jan 31 13:51:50 2017 +0100
summary:     ckpt: return SA_AIS_ERR_BAD_HANDLE if ckpt handle is not found in checkpoint open call [#2283]

-

opensaf-5.1.x:

changeset:   8569:dbb97b94a174
branch:      opensaf-5.1.x
tag:         tip
parent:      8564:50c827d26599
user:        Zoran Milinkovic
date:        Tue Jan 31 13:51:50 2017 +0100
summary:     ckpt: return SA_AIS_ERR_BAD_HANDLE if ckpt handle is not found in checkpoint open call [#2283]

-

default(5.2):

changeset:   8567:fb6aea5fe1c9
user:        Zoran Milinkovic
date:        Tue Jan 31 13:51:50 2017 +0100
summary:     ckpt: return SA_AIS_ERR_BAD_HANDLE if ckpt handle is not found in checkpoint open call [#2283]



---

** [tickets:#2283] ckpt: cpnd_evt_proc_ckpt_open does not reply to ckpt agent**

**Status:** fixed
**Milestone:** 5.0.2
**Created:** Tue Jan 31, 2017 12:41 PM UTC by Zoran Milinkovic
**Last Updated:** Tue Jan 31, 2017 01:05 PM UTC
**Owner:** Zoran Milinkovic


If the client node is not found in cpnd_evt_proc_ckpt_open(), the function does 
not reply to the agent.
As a result, the checkpoint open call may hang until it times out and returns 
SA_AIS_ERR_TIMEOUT.

According to the spec, if the checkpoint handle is invalid, the API function 
should reply with SA_AIS_ERR_BAD_HANDLE.
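
The behaviour described above can be sketched as a handler in which every exit path sends exactly one reply. This is only an illustration under stated assumptions: all `demo_*` names and types are hypothetical stand-ins, not the real CPND structures or AIS definitions, and the actual changeset may be structured differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real CPND types and AIS codes are not reproduced here. */
typedef enum { DEMO_OK, DEMO_ERR_BAD_HANDLE } demo_rc_t;

typedef struct { unsigned long handle; } demo_client_t;

typedef struct {
    demo_client_t *clients;
    size_t count;
} demo_cb_t;

/* Look up a client node by handle; returns NULL when the handle is unknown. */
static demo_client_t *demo_client_find(demo_cb_t *cb, unsigned long handle)
{
    for (size_t i = 0; i < cb->count; i++)
        if (cb->clients[i].handle == handle)
            return &cb->clients[i];
    return NULL;
}

/* Stand-in for sending the response back to the agent. */
static void demo_send_reply(demo_rc_t rc, demo_rc_t *reply_out, int *replies_sent)
{
    *reply_out = rc;
    (*replies_sent)++;
}

/* Open handler: the lookup failure falls through to the common reply path. */
static void demo_ckpt_open(demo_cb_t *cb, unsigned long handle,
                           demo_rc_t *reply_out, int *replies_sent)
{
    assert(cb != NULL && reply_out != NULL && replies_sent != NULL);
    demo_rc_t rc = DEMO_OK;

    if (demo_client_find(cb, handle) == NULL) {
        /* Per the ticket, this is the path that used to leave the agent waiting. */
        rc = DEMO_ERR_BAD_HANDLE;
        goto reply;
    }
    /* ... the actual open processing would go here ... */
reply:
    demo_send_reply(rc, reply_out, replies_sent);
}
```

Routing the failure through the shared `reply` label keeps the "one request, one response" rule visible in a single place, so the agent gets an immediate error instead of waiting for its own timeout.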


---

Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is 
subscribed to https://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change settings at 
https://sourceforge.net/p/opensaf/admin/tickets/options.  Or, if this is a 
mailing list, you can unsubscribe from the mailing list.

--
Check out the vibrant tech community on one of the world's most
engaging tech sites, SlashDot.org! http://sdm.link/slashdot
___
Opensaf-tickets mailing list
Opensaf-tickets@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets


[tickets] [opensaf:tickets] #2298 build: Old application binaries may fail to load

2017-02-09 Thread Anders Widell



---

** [tickets:#2298] build: Old application binaries may fail to load**

**Status:** accepted
**Milestone:** 5.2.FC
**Created:** Thu Feb 09, 2017 01:56 PM UTC by Anders Widell
**Last Updated:** Thu Feb 09, 2017 01:56 PM UTC
**Owner:** Anders Widell


An application binary linked with the OpenSAF AIS libraries may fail to load 
after upgrading to OpenSAF 5.2. The reason is that libopensaf_core.so.0 has 
moved from /usr/local/lib to /usr/local/lib/opensaf. This problem can happen if 
the following two conditions are met:

* The application was linked without using the -Wl,--as-needed option (and this 
option is not enabled by default by the Linux distribution used when building 
the application binary)
* The directory /usr/local/lib/opensaf is not listed in LD_LIBRARY_PATH or 
/etc/ld.so.conf

The suggested fix is to:

* Move libopensaf_core back to /usr/local/lib
* Update the documentation to mention that -Wl,--as-needed must be used when 
linking with the OpenSAF libraries, to avoid similar problems in the future.




[tickets] [opensaf:tickets] #2297 mds: improve MDS logging

2017-02-09 Thread Zoran Milinkovic
- **status**: accepted --> review
- **Comment**:

https://sourceforge.net/p/opensaf/mailman/message/35656997/



---

** [tickets:#2297] mds: improve MDS logging**

**Status:** review
**Milestone:** 5.2.FC
**Created:** Thu Feb 09, 2017 01:15 PM UTC by Zoran Milinkovic
**Last Updated:** Thu Feb 09, 2017 01:15 PM UTC
**Owner:** Zoran Milinkovic


Example:
Nov 17 18:11:44.259636 osafamfd[500] ERR |MDS_SND_RCV: Timeout or Error occured
Nov 17 18:11:44.259779 osafamfd[500] ERR |MDS_SND_RCV: Timeout occured on 
sndrsp message
Nov 17 18:11:44.259817 osafamfd[500] ERR |MDS_SND_RCV: Adest=<0x0002010f,439>

In the first log, it's not obvious whether the MDS send failed because of a 
timeout or because of an error.
If the MDS send fails because of an error, the second message still reports 
a timeout, which is not correct.

The first message should distinguish between the timeout and error cases. In 
the error case, errno can be printed to ease debugging.

The second message should be changed from "Timeout occured." to "Timeout or 
error occured."
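
One way to separate the two cases is sketched below, assuming the wait is built on a poll()-style call where 0 means timeout and a negative value means error. The `demo_*` names and the exact message wording are hypothetical and are not taken from the MDS sources.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical outcome of a blocking send-and-wait; not an MDS type. */
typedef enum { DEMO_WAIT_OK, DEMO_WAIT_TIMEOUT, DEMO_WAIT_ERROR } demo_wait_t;

/* Classify a poll()-style return value: 0 is a timeout, negative is an error. */
static demo_wait_t demo_classify(int poll_rc)
{
    if (poll_rc > 0)
        return DEMO_WAIT_OK;
    return poll_rc == 0 ? DEMO_WAIT_TIMEOUT : DEMO_WAIT_ERROR;
}

/* Format a log line that names the actual cause; returns snprintf()'s result. */
static int demo_format_log(char *buf, size_t len, demo_wait_t w, int saved_errno)
{
    assert(buf != NULL && len > 0);
    switch (w) {
    case DEMO_WAIT_TIMEOUT:
        return snprintf(buf, len,
                        "MDS_SND_RCV: Timeout occurred on sndrsp message");
    case DEMO_WAIT_ERROR:
        /* errno is passed in by the caller so later calls cannot clobber it. */
        return snprintf(buf, len,
                        "MDS_SND_RCV: Error occurred on sndrsp message, errno=%d (%s)",
                        saved_errno, strerror(saved_errno));
    default:
        buf[0] = '\0';
        return 0;
    }
}
```

The caller would save `errno` immediately after the failing call and pass it in, since any intermediate library call may overwrite it before the log line is written.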




[tickets] [opensaf:tickets] #2297 mds: improve MDS logging

2017-02-09 Thread Zoran Milinkovic



---

** [tickets:#2297] mds: improve MDS logging**

**Status:** accepted
**Milestone:** 5.2.FC
**Created:** Thu Feb 09, 2017 01:15 PM UTC by Zoran Milinkovic
**Last Updated:** Thu Feb 09, 2017 01:15 PM UTC
**Owner:** Zoran Milinkovic


Example:
Nov 17 18:11:44.259636 osafamfd[500] ERR |MDS_SND_RCV: Timeout or Error occured
Nov 17 18:11:44.259779 osafamfd[500] ERR |MDS_SND_RCV: Timeout occured on 
sndrsp message
Nov 17 18:11:44.259817 osafamfd[500] ERR |MDS_SND_RCV: Adest=<0x0002010f,439>

In the first log, it's not obvious whether the MDS send failed because of a 
timeout or because of an error.
If the MDS send fails because of an error, the second message still reports 
a timeout, which is not correct.

The first message should distinguish between the timeout and error cases. In 
the error case, errno can be printed to ease debugging.

The second message should be changed from "Timeout occured." to "Timeout or 
error occured."




[tickets] [opensaf:tickets] #2290 mds: (TCP) Libraries cause high CPU load when opensaf service stops

2017-02-09 Thread Hans Nordebäck
- **status**: review --> fixed
- **Comment**:

changeset:   8566:7531f6abf2cf
tag:         tip
user:        Hans Nordeback
date:        Thu Feb 09 13:03:30 2017 +0100
summary:     mds: TCP Libraries cause high CPU load when opensaf service stops [#2290]





---

** [tickets:#2290] mds: (TCP) Libraries cause high CPU load when opensaf service stops**

**Status:** fixed
**Milestone:** 5.2.FC
**Created:** Tue Feb 07, 2017 11:03 AM UTC by Hung Nguyen
**Last Updated:** Wed Feb 08, 2017 12:34 PM UTC
**Owner:** Hans Nordebäck


When DBSRsock is closed and control returns from mdtm_process_poll_recv_data_tcp()

~~~
:::c
syslog(LOG_ERR, "MDTM:SOCKET recd_bytes :%zd, conn lost with dh server", 
recd_bytes);
close(tcp_cb->DBSRsock);
return;
~~~

the while() loop spins rapidly because poll() returns **1** and pfd[0].revents 
is **32 (POLLNVAL, 0x020)**

~~~
:::c
pfd[0].fd = tcp_cb->DBSRsock;
pfd[1].fd = tcp_cb->tmr_fd;

while (1) {
    int pollres;

    pfd[0].events = POLLIN;
    pfd[1].events = POLLIN;

    pfd[0].revents = pfd[1].revents = 0;

    pollres = poll(pfd, 2, MDTM_TCP_POLL_TIMEOUT);

    ...
}
~~~
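
The effect can be reproduced in isolation, and one common remedy is to set the descriptor to -1 after closing it, because poll() ignores negative descriptors. The sketch below uses hypothetical `demo_*` helpers and is not necessarily what the changeset for this ticket does.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Poll one descriptor for input with a short timeout; revents is returned via out. */
static int demo_poll_once(int fd, short *revents_out)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int rc = poll(&pfd, 1, 10 /* ms */);
    *revents_out = pfd.revents;
    return rc;
}

/* Close a descriptor and mark it unused, so that later poll() calls skip it. */
static void demo_close_and_disarm(int *fd)
{
    assert(fd != NULL);
    if (*fd >= 0) {
        close(*fd);
        *fd = -1;
    }
}
```

Polling a stale descriptor number returns immediately with POLLNVAL set, which is what turns the loop into a busy loop. After `demo_close_and_disarm()`, the same poll call simply waits for its timeout. In a loop like the one above, the caller would also need to refresh `pfd[0].fd` from the disarmed value, or leave the loop.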



Steps to reproduce:

* run immcfg
~~~
root@SC-1:~# immcfg
>
~~~
* stop opensaf service
~~~
root@SC-1:~# service opensafd stop
~~~
* check the CPU load








[tickets] [opensaf:tickets] #1749 amf: Incorrect ER in syslog

2017-02-09 Thread Nagendra Kumar
- **status**: accepted --> review



---

** [tickets:#1749] amf: Incorrect ER in syslog**

**Status:** review
**Milestone:** 5.2.FC
**Created:** Tue Apr 12, 2016 12:22 PM UTC by elunlen
**Last Updated:** Tue Feb 07, 2017 10:06 AM UTC
**Owner:** Nagendra Kumar


When requesting AMF to do an SI swap, the following message may appear in the 
syslog:
2016-04-11 17:35:37 SC-1 osafamfd[500]: ER safSi=SC-2N,safApp=OpenSAF SWAP 
failed - Cold sync in progress
This happens even when the response to the operation is SA_AIS_ERR_TRY_AGAIN 
or SA_AIS_ERR_BUSY.
These responses are not errors and should not result in an ER message in the 
syslog.




[tickets] [opensaf:tickets] #2278 mds: Blocking send causes AMF health check time-out

2017-02-09 Thread Nagendra Kumar
Also, please share mds.log and the complete syslog.


---

** [tickets:#2278] mds: Blocking send causes AMF health check time-out**

**Status:** assigned
**Milestone:** 5.1.1
**Created:** Thu Jan 26, 2017 09:49 AM UTC by Anders Widell
**Last Updated:** Thu Feb 09, 2017 08:31 AM UTC
**Owner:** A V Mahesh (AVM)


AMF health-check time-out is seen on SC-1 after restarting SC-2. The system is 
using OpenSAF 5.1.0 configured with TCP communication.

Syslog:

~~~
2017-01-20T18:29:04.405982+01:00 local0.err SC-1 osafamfnd[2820]: ER AMF 
director heart beat timeout, generating core for amfd
2017-01-20T18:29:05.408819+01:00 local0.crit SC-1 osafamfnd[2820]: Rebooting 
OpenSAF NodeId = 131343 EE Name = , Reason: AMF director heart beat timeout, 
OwnNodeId = 131343, SupervisionTime = 0
~~~

Back-trace of osafamfd:

~~~
0x7fa316cceb60 osaf_poll_no_timeout (osaf/libs/core/common/osaf_poll.c:33)
0x7fa316ccede5 osaf_poll (osaf/libs/core/common/osaf_poll.c:45)
0x7fa316ccee25 osaf_poll_one_fd (osaf/libs/core/common/osaf_poll.c:129)
0x7fa316cfab67 mds_mcm_time_wait 
(osaf/libs/core/common/include/osaf_utility.h:79)
0x7fa316cfae51 mds_subtn_tbl_add_disc_queue 
(osaf/libs/core/mds/mds_c_sndrcv.c:1808)
0x7fa316cfb03d mds_mcm_process_disc_queue_checks_redundant 
(osaf/libs/core/mds/mds_c_sndrcv.c:2338)
0x7fa316cfbcd1 mcm_pvt_red_snd_process_common 
(osaf/libs/core/mds/mds_c_sndrcv.c:2257)
0x7fa316cfd04d mcm_pvt_red_svc_snd (osaf/libs/core/mds/mds_c_sndrcv.c:2174)
0x7fa316cff8f9 mds_send (osaf/libs/core/mds/mds_c_sndrcv.c:736)
0x7fa316cf9068 ncsmds_api (osaf/libs/core/mds/mds_papi.c:191)
0x7fa316ce6f5f mbcsv_mds_send_msg (osaf/libs/core/mbcsv/mbcsv_mds.c:239)
0x7fa316cec440 mbcsv_send_ckpt_data_to_all_peers 
(osaf/libs/core/mbcsv/mbcsv_util.c:479)
0x7fa316ce56d7 mbcsv_process_snd_ckpt_request 
(osaf/libs/core/mbcsv/mbcsv_api.c:862)
0x40bfc0 avsv_send_ckpt_data(cl_cb_tag*, unsigned int, unsigned long, unsigned 
int, unsigned int) (osaf/services/saf/amf/amfd/chkop.cc:1062)
0x446649 avd_node_oper_state_set(AVD_AVND*, SaAmfOperationalStateT) 
(osaf/services/saf/amf/amfd/node.cc:505)
0x44040c avd_node_mark_absent(AVD_AVND*) 
(osaf/services/saf/amf/amfd/ndfsm.cc:1018)
0x4438ba avd_node_failover(AVD_AVND*) 
(osaf/services/saf/amf/amfd/ndproc.cc:1141)

~~~




[tickets] [opensaf:tickets] #2296 imm: IMMND on payload crashes after SC absence

2017-02-09 Thread Hung Nguyen



---

** [tickets:#2296] imm: IMMND on payload crashes after SC absence**

**Status:** accepted
**Milestone:** 5.0.2
**Created:** Thu Feb 09, 2017 08:44 AM UTC by Hung Nguyen
**Last Updated:** Thu Feb 09, 2017 08:44 AM UTC
**Owner:** Hung Nguyen
**Attachments:**

- [logs.tgz](https://sourceforge.net/p/opensaf/tickets/2296/attachment/logs.tgz) (5.2 MB; application/x-compressed)


Removal of IMMND coordinator was introduced in [#1692].
Some cleanup actions are delayed until **immnd_proc_server()** is executed.

If the cluster comes back from headless too quickly, **immnd_proc_server()** 
will not be executed and IMMND will crash later.

~~~
2017-02-05 21:36:41 PL-5 osafimmnd[406]: NO Announce sync, epoch:28
2017-02-05 21:36:41 PL-5 osafimmnd[406]: NO SERVER STATE: IMM_SERVER_READY --> 
IMM_SERVER_SYNC_SERVER
2017-02-05 21:36:41 PL-5 osafimmnd[406]: NO NODE STATE-> IMM_NODE_R_AVAILABLE
2017-02-05 21:36:41 PL-5 osafimmloadd: NO Sync starting
2017-02-05 21:36:42 PL-5 osafdtmd[393]: NO Lost contact with 'SC-1'
2017-02-05 21:36:42 PL-5 osafimmnd[406]: WA Director Service in NOACTIVE state 
- fevs replies pending:16 fevs highest processed:13154
2017-02-05 21:36:43 PL-5 osafimmnd[406]: WA SC Absence IS allowed:900 IMMD 
service is DOWN
2017-02-05 21:36:43 PL-5 osafimmnd[406]: NO IMMD SERVICE IS DOWN, HYDRA IS 
CONFIGURED => UNREGISTERING IMMND form MDS
2017-02-05 21:36:43 PL-5 osafimmnd[406]: NO Removing client id:290002050f 
sv_id:26
2017-02-05 21:36:43 PL-5 osafimmnd[406]: NO Removing client id:14d0002050f 
sv_id:26
2017-02-05 21:36:43 PL-5 osafimmnd[406]: WA Postponing hard delete of admin 
owner with id:41 when imm is not writable state
2017-02-05 21:36:43 PL-5 osafimmnd[406]: NO Removing client id:1530002050f 
sv_id:27
2017-02-05 21:36:43 PL-5 osafimmnd[406]: NO Implementer disconnected 147 <339, 
2050f> (OpenSafImmPBE)
2017-02-05 21:36:43 PL-5 osafimmnd[406]: NO Removing client id:1550002050f 
sv_id:26
2017-02-05 21:36:43 PL-5 osafimmnd[406]: NO Implementer disconnected 144 <0, 
2010f(down)> (safLogService)
2017-02-05 21:36:43 PL-5 osafimmnd[406]: NO Implementer disconnected 145 <0, 
2010f(down)> (@safLogService_appl)
2017-02-05 21:36:43 PL-5 osafimmnd[406]: NO Implementer disconnected 146 <0, 
2010f(down)> (@OpenSafImmReplicatorA)
2017-02-05 21:36:43 PL-5 osafimmnd[406]: NO Implementer disconnected 143 <0, 
2010f(down)> (safClmService)
2017-02-05 21:36:43 PL-5 osafimmnd[406]: NO Implementer disconnected 142 <0, 
2010f(down)> (safAmfService)
2017-02-05 21:36:43 PL-5 osafimmnd[406]: NO Impl Discarded node 2010f
2017-02-05 21:36:43 PL-5 osafimmnd[406]: NO MDS unregisterede. sleeping ...
2017-02-05 21:36:43 PL-5 osafimmpbed: WA PBE lost contact with parent IMMND - 
Exiting
2017-02-05 21:36:44 PL-5 osafimmnd[406]: NO Sleep done registering IMMND with 
MDS
2017-02-05 21:36:44 PL-5 osafimmnd[406]: NO SUCCESS IN REGISTERING IMMND WITH 
MDS
2017-02-05 21:36:44 PL-5 osafimmnd[406]: NO MDS: mds_register_callback: dest 
2050f01e8 already exist
2017-02-05 21:36:44 PL-5 osafimmnd[406]: WA IMMND - Client Node Get Failed for 
cli_hdl:1464583980303
2017-02-05 21:36:45 PL-5 osafdtmd[393]: NO Established contact with 'SC-1'
2017-02-05 21:36:49 PL-5 osafimmnd[406]: WA MDS Send Failed
2017-02-05 21:36:49 PL-5 osafimmnd[406]: WA Error code 2 returned for message 
type 17 - ignoring
2017-02-05 21:36:49 PL-5 osafimmnd[406]: NO IMMD service is UP ... 
ScAbsenseAllowed?:900 introduced?:2
2017-02-05 21:36:49 PL-5 osafimmnd[406]: NO Re-introduce-me 
highestProcessed:13154 highestReceived:13154
2017-02-05 21:36:49 PL-5 osafimmnd[406]: NO Epoch set to 29 in ImmModel
2017-02-05 21:36:49 PL-5 osafimmnd[406]: NO Re-introduce-me 
highestProcessed:13154 highestReceived:13154
2017-02-05 21:36:49 PL-5 osafimmnd[406]: NO ERR_BAD_HANDLE: admin owner id 42 
does not exist
2017-02-05 21:36:49 PL-5 osafimmnd[406]: NO Implementer connected: 149 
(OpenSafImmPBE) <0, 2040f>
2017-02-05 21:36:49 PL-5 osafimmnd[406]: NO Re-introduce-me 
highestProcessed:13157 highestReceived:13158
2017-02-05 21:36:49 PL-5 osafimmnd[406]: ER Node is in a state that cannot 
accept start of sync, will terminate
~~~

IMMND failed to revert to IMM_SERVER_READY/IMM_NODE_FULLY_AVAILABLE and 
crashed.

~~~
#0  0x7f23733bdc37 in __GI_raise (sig=sig@entry=6) at 
../nptl/sysdeps/unix/sysv/linux/raise.c:56
resultvar = 0
pid = 406
selftid = 406
#1  0x7f23733c1028 in __GI_abort () at abort.c:89
save_stage = 2
act = {__sigaction_handler = {sa_handler = 0x152d0009, sa_sigaction 
= 0x152d0009}, sa_mask = {__val = {93865551367896, 30, 54, 139790248362720, 
139790245522487, 17179869186, 139790248362720, 140726076478512, 0, 
139790250985925, 54, 30, 54, 140726076478560, 139790245475049, 
140726076478560}}, sa_flags = 0, sa_restorer = 0x2c774d2a0}
sigs = {__val = {32, 0 }}
#2  0x555ec6cac677 in ImmModel::prepareForSync (this=0x555ec774db30, 
isJoining=false) at src/imm/immnd/ImmModel.cc:2637

[tickets] [opensaf:tickets] #2278 mds: Blocking send causes AMF health check time-out

2017-02-09 Thread Nagendra Kumar
Hi Anders,
Do you see this frequently? Or does it occur only under specific test 
conditions or on a specific platform?


---

** [tickets:#2278] mds: Blocking send causes AMF health check time-out**

**Status:** assigned
**Milestone:** 5.1.1
**Created:** Thu Jan 26, 2017 09:49 AM UTC by Anders Widell
**Last Updated:** Thu Feb 02, 2017 03:33 AM UTC
**Owner:** A V Mahesh (AVM)


AMF health-check time-out is seen on SC-1 after restarting SC-2. The system is 
using OpenSAF 5.1.0 configured with TCP communication.

Syslog:

~~~
2017-01-20T18:29:04.405982+01:00 local0.err SC-1 osafamfnd[2820]: ER AMF 
director heart beat timeout, generating core for amfd
2017-01-20T18:29:05.408819+01:00 local0.crit SC-1 osafamfnd[2820]: Rebooting 
OpenSAF NodeId = 131343 EE Name = , Reason: AMF director heart beat timeout, 
OwnNodeId = 131343, SupervisionTime = 0
~~~

Back-trace of osafamfd:

~~~
0x7fa316cceb60 osaf_poll_no_timeout (osaf/libs/core/common/osaf_poll.c:33)
0x7fa316ccede5 osaf_poll (osaf/libs/core/common/osaf_poll.c:45)
0x7fa316ccee25 osaf_poll_one_fd (osaf/libs/core/common/osaf_poll.c:129)
0x7fa316cfab67 mds_mcm_time_wait 
(osaf/libs/core/common/include/osaf_utility.h:79)
0x7fa316cfae51 mds_subtn_tbl_add_disc_queue 
(osaf/libs/core/mds/mds_c_sndrcv.c:1808)
0x7fa316cfb03d mds_mcm_process_disc_queue_checks_redundant 
(osaf/libs/core/mds/mds_c_sndrcv.c:2338)
0x7fa316cfbcd1 mcm_pvt_red_snd_process_common 
(osaf/libs/core/mds/mds_c_sndrcv.c:2257)
0x7fa316cfd04d mcm_pvt_red_svc_snd (osaf/libs/core/mds/mds_c_sndrcv.c:2174)
0x7fa316cff8f9 mds_send (osaf/libs/core/mds/mds_c_sndrcv.c:736)
0x7fa316cf9068 ncsmds_api (osaf/libs/core/mds/mds_papi.c:191)
0x7fa316ce6f5f mbcsv_mds_send_msg (osaf/libs/core/mbcsv/mbcsv_mds.c:239)
0x7fa316cec440 mbcsv_send_ckpt_data_to_all_peers 
(osaf/libs/core/mbcsv/mbcsv_util.c:479)
0x7fa316ce56d7 mbcsv_process_snd_ckpt_request 
(osaf/libs/core/mbcsv/mbcsv_api.c:862)
0x40bfc0 avsv_send_ckpt_data(cl_cb_tag*, unsigned int, unsigned long, unsigned 
int, unsigned int) (osaf/services/saf/amf/amfd/chkop.cc:1062)
0x446649 avd_node_oper_state_set(AVD_AVND*, SaAmfOperationalStateT) 
(osaf/services/saf/amf/amfd/node.cc:505)
0x44040c avd_node_mark_absent(AVD_AVND*) 
(osaf/services/saf/amf/amfd/ndfsm.cc:1018)
0x4438ba avd_node_failover(AVD_AVND*) 
(osaf/services/saf/amf/amfd/ndproc.cc:1141)

~~~

