- **status**: review --> fixed
- **Comment**:

default (5.2) [staging:21aab7]
changeset:   8593:21aab7e03190
user:        Hung Nguyen <hung.d.ngu...@dektech.com.au>
date:        Tue Feb 21 14:46:41 2017 +0700
summary:     imm: Fix problems with removing coordinator role when cluster goes headless [#2296]

opensaf-5.1.x [staging:15aceb]
changeset:   8594:15aceb2ce9dd
user:        Hung Nguyen <hung.d.ngu...@dektech.com.au>
date:        Tue Feb 21 14:49:28 2017 +0700
summary:     imm: Fix problems with removing coordinator role when cluster goes headless [#2296]

opensaf-5.0.x [staging:78b886]
changeset:   8595:78b886a029c4
user:        Hung Nguyen <hung.d.ngu...@dektech.com.au>
date:        Tue Feb 21 14:49:28 2017 +0700
summary:     imm: Fix problems with removing coordinator role when cluster goes headless [#2296]




---

**[tickets:#2296] imm: IMMND on payload crashes after SC absence**

**Status:** fixed
**Milestone:** 5.0.2
**Created:** Thu Feb 09, 2017 08:44 AM UTC by Hung Nguyen
**Last Updated:** Fri Feb 10, 2017 07:27 AM UTC
**Owner:** Hung Nguyen
**Attachments:**

- [logs.tgz](https://sourceforge.net/p/opensaf/tickets/2296/attachment/logs.tgz) (5.2 MB; application/x-compressed)


Removal of the IMMND coordinator role was introduced in [#1692].
Some of the cleanup actions are deferred until **immnd_proc_server()** is executed.

If the cluster comes back from the headless state too quickly, **immnd_proc_server()** 
does not get to run and IMMND crashes later.

~~~
2017-02-05 21:36:41 PL-5 osafimmnd[406]: NO Announce sync, epoch:28
2017-02-05 21:36:41 PL-5 osafimmnd[406]: NO SERVER STATE: IMM_SERVER_READY --> IMM_SERVER_SYNC_SERVER
2017-02-05 21:36:41 PL-5 osafimmnd[406]: NO NODE STATE-> IMM_NODE_R_AVAILABLE
2017-02-05 21:36:41 PL-5 osafimmloadd: NO Sync starting
2017-02-05 21:36:42 PL-5 osafdtmd[393]: NO Lost contact with 'SC-1'
2017-02-05 21:36:42 PL-5 osafimmnd[406]: WA Director Service in NOACTIVE state - fevs replies pending:16 fevs highest processed:13154
2017-02-05 21:36:43 PL-5 osafimmnd[406]: WA SC Absence IS allowed:900 IMMD service is DOWN
2017-02-05 21:36:43 PL-5 osafimmnd[406]: NO IMMD SERVICE IS DOWN, HYDRA IS CONFIGURED => UNREGISTERING IMMND form MDS
2017-02-05 21:36:43 PL-5 osafimmnd[406]: NO Removing client id:290002050f sv_id:26
2017-02-05 21:36:43 PL-5 osafimmnd[406]: NO Removing client id:14d0002050f sv_id:26
2017-02-05 21:36:43 PL-5 osafimmnd[406]: WA Postponing hard delete of admin owner with id:41 when imm is not writable state
2017-02-05 21:36:43 PL-5 osafimmnd[406]: NO Removing client id:1530002050f sv_id:27
2017-02-05 21:36:43 PL-5 osafimmnd[406]: NO Implementer disconnected 147 <339, 2050f> (OpenSafImmPBE)
2017-02-05 21:36:43 PL-5 osafimmnd[406]: NO Removing client id:1550002050f sv_id:26
2017-02-05 21:36:43 PL-5 osafimmnd[406]: NO Implementer disconnected 144 <0, 2010f(down)> (safLogService)
2017-02-05 21:36:43 PL-5 osafimmnd[406]: NO Implementer disconnected 145 <0, 2010f(down)> (@safLogService_appl)
2017-02-05 21:36:43 PL-5 osafimmnd[406]: NO Implementer disconnected 146 <0, 2010f(down)> (@OpenSafImmReplicatorA)
2017-02-05 21:36:43 PL-5 osafimmnd[406]: NO Implementer disconnected 143 <0, 2010f(down)> (safClmService)
2017-02-05 21:36:43 PL-5 osafimmnd[406]: NO Implementer disconnected 142 <0, 2010f(down)> (safAmfService)
2017-02-05 21:36:43 PL-5 osafimmnd[406]: NO Impl Discarded node 2010f
2017-02-05 21:36:43 PL-5 osafimmnd[406]: NO MDS unregisterede. sleeping ...
2017-02-05 21:36:43 PL-5 osafimmpbed: WA PBE lost contact with parent IMMND - Exiting
2017-02-05 21:36:44 PL-5 osafimmnd[406]: NO Sleep done registering IMMND with MDS
2017-02-05 21:36:44 PL-5 osafimmnd[406]: NO SUCCESS IN REGISTERING IMMND WITH MDS
2017-02-05 21:36:44 PL-5 osafimmnd[406]: NO MDS: mds_register_callback: dest 2050f000001e8 already exist
2017-02-05 21:36:44 PL-5 osafimmnd[406]: WA IMMND - Client Node Get Failed for cli_hdl:1464583980303
2017-02-05 21:36:45 PL-5 osafdtmd[393]: NO Established contact with 'SC-1'
2017-02-05 21:36:49 PL-5 osafimmnd[406]: WA MDS Send Failed
2017-02-05 21:36:49 PL-5 osafimmnd[406]: WA Error code 2 returned for message type 17 - ignoring
2017-02-05 21:36:49 PL-5 osafimmnd[406]: NO IMMD service is UP ... ScAbsenseAllowed?:900 introduced?:2
2017-02-05 21:36:49 PL-5 osafimmnd[406]: NO Re-introduce-me highestProcessed:13154 highestReceived:13154
2017-02-05 21:36:49 PL-5 osafimmnd[406]: NO Epoch set to 29 in ImmModel
2017-02-05 21:36:49 PL-5 osafimmnd[406]: NO Re-introduce-me highestProcessed:13154 highestReceived:13154
2017-02-05 21:36:49 PL-5 osafimmnd[406]: NO ERR_BAD_HANDLE: admin owner id 42 does not exist
2017-02-05 21:36:49 PL-5 osafimmnd[406]: NO Implementer connected: 149 (OpenSafImmPBE) <0, 2040f>
2017-02-05 21:36:49 PL-5 osafimmnd[406]: NO Re-introduce-me highestProcessed:13157 highestReceived:13158
2017-02-05 21:36:49 PL-5 osafimmnd[406]: ER Node is in a state that cannot accept start of sync, will terminate
~~~

IMMND failed to revert to IMM_SERVER_READY/IMM_NODE_FULLY_AVAILABLE and 
crashed:

~~~
#0  0x00007f23733bdc37 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
        resultvar = 0
        pid = 406
        selftid = 406
#1  0x00007f23733c1028 in __GI_abort () at abort.c:89
        save_stage = 2
        act = {__sigaction_handler = {sa_handler = 0x152d00000009, sa_sigaction = 0x152d00000009}, sa_mask = {__val = {93865551367896, 30, 54, 139790248362720, 139790245522487, 17179869186, 139790248362720, 140726076478512, 0, 139790250985925, 54, 30, 54, 140726076478560, 139790245475049, 140726076478560}}, sa_flags = 0, sa_restorer = 0x2c774d2a0}
        sigs = {__val = {32, 0 <repeats 15 times>}}
#2  0x0000555ec6cac677 in ImmModel::prepareForSync (this=0x555ec774db30, isJoining=false) at src/imm/immnd/ImmModel.cc:2637
        __FUNCTION__ = "prepareForSync"
#3  0x0000555ec6caa696 in immModel_prepareForSync (cb=0x555ec6ff8a60 <_immnd_cb>, isJoining=false) at src/imm/immnd/ImmModel.cc:2193
No locals.
#4  0x0000555ec6c8373e in immnd_evt_proc_start_sync (cb=0x555ec6ff8a60 <_immnd_cb>, evt=0x7f236c002990, sinfo=0x7f236c002ad0) at src/imm/immnd/immnd_evt.c:8739
        __FUNCTION__ = "immnd_evt_proc_start_sync"
#5  0x0000555ec6c61d01 in immnd_process_evt () at src/imm/immnd/immnd_evt.c:666
        cb = 0x555ec6ff8a60 <_immnd_cb>
        rc = 1
        evt = 0x7f236c002980
        __FUNCTION__ = "immnd_process_evt"
#6  0x0000555ec6c8cc1c in main (argc=1, argv=0x7ffd57cc9698) at src/imm/immnd/immnd_main.c:369
        wasCoord = 0 '\000'
        now = {tv_sec = 897603, tv_nsec = 56765584}
        passed_time = {tv_sec = 7, tv_nsec = 104432087}
        passed_time_ms = 7104
        ret = 1
        mbx_fd = {raise_obj = 12, rmv_obj = 13}
        error = SA_AIS_OK
        timeout = 100
        eventCount = 13
        maxEvt = 50
        start_time = {tv_sec = 897595, tv_nsec = 952333497}
        fds = {{fd = 17, events = 1, revents = 0}, {fd = 15, events = 1, revents = 0}, {fd = 13, events = 1, revents = 1}}
        term_fd = 17
        __FUNCTION__ = "main"
~~~
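
The race described above can be modelled with a small, self-contained C sketch. This is not the OpenSAF code: only the state names are taken from the log, while `struct node`, `announce_sync()`, `on_director_down()`, `periodic_tick()` (standing in for **immnd_proc_server()**), `can_accept_sync()` and `replay()` are hypothetical names made up for illustration. It also assumes that a non-joining node must be in IMM_NODE_FULLY_AVAILABLE to accept a start of sync, which is what the log and backtrace suggest.

```c
#include <assert.h>
#include <stdbool.h>

/* State names follow the log above; everything else is hypothetical. */
enum server_state { IMM_SERVER_READY, IMM_SERVER_SYNC_SERVER };
enum node_state { IMM_NODE_FULLY_AVAILABLE, IMM_NODE_R_AVAILABLE };

struct node {
    enum server_state server;
    enum node_state node;
    bool cleanup_pending; /* set when contact with the director is lost */
};

/* A sync is announced: the node leaves its idle states. */
static void announce_sync(struct node *n)
{
    n->server = IMM_SERVER_SYNC_SERVER;
    n->node = IMM_NODE_R_AVAILABLE;
}

/* Director lost: the cleanup is only marked here, not performed. */
static void on_director_down(struct node *n)
{
    n->cleanup_pending = true;
}

/* Stand-in for the periodic server routine that runs deferred cleanup. */
static void periodic_tick(struct node *n)
{
    if (n->cleanup_pending) {
        n->server = IMM_SERVER_READY;
        n->node = IMM_NODE_FULLY_AVAILABLE;
        n->cleanup_pending = false;
    }
}

/* Stand-in for the start-of-sync check; false models the fatal path. */
static bool can_accept_sync(const struct node *n)
{
    return n->node == IMM_NODE_FULLY_AVAILABLE;
}

/* Replays the sequence and reports whether the next sync is accepted. */
static bool replay(bool tick_before_reconnect)
{
    struct node n = {IMM_SERVER_READY, IMM_NODE_FULLY_AVAILABLE, false};

    announce_sync(&n);
    on_director_down(&n);
    if (tick_before_reconnect)
        periodic_tick(&n); /* slow recovery: cleanup gets a chance to run */
    return can_accept_sync(&n);
}
```

In this model `replay(true)` accepts the next sync, while `replay(false)` (the director returns before the periodic routine has run) leaves the node in IMM_NODE_R_AVAILABLE and hits the rejecting path, which corresponds to the abort in `ImmModel::prepareForSync` shown in the backtrace. How the referenced changesets actually close this window is not shown here.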


