[tickets] [opensaf:tickets] #2133 AMF: Rollback admin shutdown/lock SI operation if node failover

2017-02-06 Thread Nagendra Kumar
- **status**: accepted --> review
- **Comment**:

Sent patch for review with the above implementation.



---

** [tickets:#2133] AMF: Rollback admin shutdown/lock SI operation if node failover**

**Status:** review
**Milestone:** 5.2.FC
**Created:** Thu Oct 20, 2016 06:49 PM UTC by Minh Hon Chau
**Last Updated:** Thu Feb 02, 2017 09:56 AM UTC
**Owner:** Nagendra Kumar


Scenario: shut down an SI, delay the QUIESCING CSI callback, then reboot the node 
hosting the SU that has this CSI callback pending. The result of this operation 
differs between SG redundancy models:
- For 2N: the SI admin state is rolled back to UNLOCKED
- For Nway: the SI admin state moves to LOCKED
- For NpM: not tested; from browsing SG_NPM::node_fail_si_oper, it looks like the 
SI admin state is rolled back to UNLOCKED

My question is whether the results of these scenarios should be consistent, and 
what the expected outcome is.
Also, the handling in node_fail_si_oper for admin lock is not consistent: for 
2N the admin state remains LOCKED, while NpM rolls back to UNLOCKED.
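
For illustration only: if the rollback named in the ticket title were applied uniformly across SG models, the decision could be expressed as one shared helper. This is a minimal sketch under that assumption. The types `SiAdminState` and `PendingAdminOp` and the function `admin_state_after_node_failover` are hypothetical names, not OpenSAF code, and the expected outcome is still the open question of this ticket.

```cpp
#include <cassert>

// Hypothetical, simplified types; they are not the OpenSAF AMF definitions.
enum class SiAdminState { UNLOCKED, LOCKED, SHUTTING_DOWN };
enum class PendingAdminOp { NONE, LOCK, SHUTDOWN };

// Assumed policy, taken from the ticket title: if the node hosting the SU
// with the pending CSI callback fails while an admin lock/shutdown is still
// in progress, roll the SI back to UNLOCKED. With no pending operation the
// state is left untouched.
SiAdminState admin_state_after_node_failover(SiAdminState current,
                                             PendingAdminOp pending) {
  if (pending == PendingAdminOp::NONE) return current;
  // A pending lock/shutdown implies the SI already left the UNLOCKED state.
  assert(current != SiAdminState::UNLOCKED);
  return SiAdminState::UNLOCKED;
}
```

Calling the same helper from the 2N, Nway and NpM failover paths would give one outcome for all three models, whichever outcome is finally agreed.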


---

Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is 
subscribed to https://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change settings at 
https://sourceforge.net/p/opensaf/admin/tickets/options.  Or, if this is a 
mailing list, you can unsubscribe from the mailing list.--
Check out the vibrant tech community on one of the world's most
engaging tech sites, SlashDot.org! http://sdm.link/slashdot___
Opensaf-tickets mailing list
Opensaf-tickets@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets


[tickets] [opensaf:tickets] #2247 log: Log stream file is not reserved during si-swap

2017-02-06 Thread Vu Minh Nguyen
- **status**: review --> fixed
- **assigned_to**: Canh Truong -->  nobody 
- **Milestone**: 5.2.FC --> 5.0.2
- **Comment**:

changeset:   8559:cf55b10ad97d
tag:         tip
parent:      8554:282c0a6e6d8f
user:        Canh Van Truong
date:        Mon Jan 09 18:25:23 2017 +0700
summary:     log: fix log file is changed during si-swap [#2247]

changeset:   8558:3ed15e1b89f9
branch:      opensaf-5.1.x
parent:      8555:ab8689bfc0ba
user:        Canh Van Truong
date:        Fri Dec 30 05:11:55 2016 +0700
summary:     log: fix log file is changed during si-swap [#2247]

changeset:   8557:496e112e0515
branch:      opensaf-5.0.x
user:        Canh Van Truong
date:        Tue Feb 07 11:22:58 2017 +0700
summary:     log: fix log file is changed during si-swap [#2247]




---

** [tickets:#2247] log: Log stream file is not reserved during si-swap**

**Status:** fixed
**Milestone:** 5.0.2
**Created:** Wed Dec 28, 2016 10:50 AM UTC by Tai Dinh
**Last Updated:** Tue Jan 03, 2017 07:08 AM UTC
**Owner:** nobody


The fix for #2215 introduced a change in log_stream_open_fileinit(), i.e.:
-  if (stream->numOpeners == 0) {
+  if ((stream->numOpeners == 0) || (*stream->p_fd == -1)) {

I guess this will cause unexpected behaviour during switchover.
The reason is that numOpeners is checkpointed between the active and the 
standby, whereas p_fd is node-local.
At startup, both ACTIVE and STANDBY have numOpeners and p_fd set to 0 and -1 
respectively.
After that, ACTIVE opens the stream, increases numOpeners and syncs this value 
to the standby. p_fd on ACTIVE is also set to the opened file, but this value 
is not checkpointed to the standby.
During switchover, p_fd on the standby is always -1, which causes the stream 
filename to always be changed.
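
The effect of the changed condition on a standby can be sketched with a small model. This is only an illustration under stated assumptions: `StreamModel`, `needs_file_init_old`, `needs_file_init_new` and `standby_after_sync` are hypothetical names, and the two predicates mirror only the two `if` conditions quoted above, not the rest of log_stream_open_fileinit().

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for the log stream state discussed above.
struct StreamModel {
  unsigned numOpeners;  // checkpointed from ACTIVE to STANDBY
  int fd;               // node-local; -1 means no file is open on this node
};

// Condition before the #2215 change.
bool needs_file_init_old(const StreamModel& s) { return s.numOpeners == 0; }

// Condition after the #2215 change, as quoted in the ticket.
bool needs_file_init_new(const StreamModel& s) {
  return s.numOpeners == 0 || s.fd == -1;
}

// State seen on the standby after ACTIVE opened the stream `openers` times:
// numOpeners has been synced, the file descriptor has not.
StreamModel standby_after_sync(unsigned openers) {
  assert(openers > 0);
  return StreamModel{openers, -1};
}
```

For `standby_after_sync(1)` the old condition is false and the new one is true, so the node taking over would run the file initialisation again, which matches the filename change reported here.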

This can be easily reproduced as follows:
- Create an application stream
- Use saflogger to write some messages to it
- Trigger an si-swap on the OpenSAF SI
- Write another log record to the above application stream
- Check that the stream filename has been rotated

/Tai




[tickets] [opensaf:tickets] #2289 opensafd (nid): coredump while standby starting

2017-02-06 Thread A V Mahesh (AVM)



---

** [tickets:#2289] opensafd (nid): coredump while standby starting**

**Status:** unassigned
**Milestone:** 5.2.FC
**Created:** Tue Feb 07, 2017 06:31 AM UTC by A V Mahesh (AVM)
**Last Updated:** Tue Feb 07, 2017 06:31 AM UTC
**Owner:** nobody


Restarting the standby with TCP transport causes opensafd to dump core.


(gdb) bt
#0  0x7f2f05cb0b55 in raise () from /lib64/libc.so.6
#1  0x7f2f05cb2131 in abort () from /lib64/libc.so.6
#2  0x7f2f06704955 in __gnu_cxx::__verbose_terminate_handler() () at ../../../../gcc-4.8.3/libstdc++-v3/libsupc++/vterminate.cc:95
#3  0x7f2f06702af6 in __cxxabiv1::__terminate(void (*)()) () at ../../../../gcc-4.8.3/libstdc++-v3/libsupc++/eh_terminate.cc:38
#4  0x7f2f06702b23 in std::terminate() () at ../../../../gcc-4.8.3/libstdc++-v3/libsupc++/eh_terminate.cc:48
#5  0x7f2f06702d42 in __cxa_throw () at ../../../../gcc-4.8.3/libstdc++-v3/libsupc++/eh_throw.cc:87
#6  0x7f2f0670322d in operator new(unsigned long) () at ../../../../gcc-4.8.3/libstdc++-v3/libsupc++/new_op.cc:56
#7  0x7f2f06761979 in std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator const&) () at /home/build/x86_64-unknown-linux-gnu/libstdc++-v3/include/ext/new_allocator.h:104
#8  0x7f2f0676256b in std::string::_Rep::_M_clone(std::allocator const&, unsigned long) () at /home/build/x86_64-unknown-linux-gnu/libstdc++-v3/include/bits/basic_string.tcc:629
#9  0x7f2f06762bec in std::basic_string::basic_string(std::string const&) () at /home/build/x86_64-unknown-linux-gnu/libstdc++-v3/include/bits/basic_string.h:229
#10 0x7f2f07262c39 in handle_data_request(pollfd*, std::string const&) () at /usr/include/c++/4.8.3/bits/basic_string.h:2405
#11 0x7f2f0726320f in svc_monitor_thread(void*) () at src/nid/nodeinit.cc:1539
#12 0x7f2f05ff97b6 in start_thread () from /lib64/libpthread.so.0
#13 0x7f2f05d559cd in clone () from /lib64/libc.so.6
#14 0x in ?? ()
(gdb) q



Feb  7 11:41:13 SC-2 opensafd: OpenSAF services successfully stopped
Feb  7 11:41:21 SC-2 opensafd: Starting OpenSAF Services(5.1.M0 - ) (Using TCP)
Feb  7 11:41:21 SC-2 osafdtmd[5329]: mkfifo already exists: /var/lib/opensaf/osafdtmd.fifo File exists
Feb  7 11:41:21 SC-2 osafdtmd[5329]: Started
Feb  7 11:41:21 SC-2 osaftransportd[5336]: Started
Feb  7 11:41:21 SC-2 osafclmna[5343]: Started
Feb  7 11:41:21 SC-2 osafrded[5352]: Started
Feb  7 11:41:22 SC-2 osaffmd[5361]: Started
Feb  7 11:41:22 SC-2 osaffmd[5361]: NO Remote fencing is disabled
Feb  7 11:41:22 SC-2 osafimmd[5371]: Started
Feb  7 11:41:22 SC-2 osafimmd[5371]: NO *** SC_ABSENCE_ALLOWED (Headless Hydra) is configured: 900 ***
Feb  7 11:41:22 SC-2 osafimmnd[5382]: Started
Feb  7 11:41:22 SC-2 osafimmnd[5382]: NO Persistent Back-End capability configured, Pbe file:imm.db (suffix may get added)
Feb  7 11:41:22 SC-2 opensafd[5318]: NO Monitoring of TRANSPORT started
Feb  7 11:41:22 SC-2 osafclmna[5343]: NO Starting to promote this node to a system controller
Feb  7 11:41:22 SC-2 osafrded[5352]: NO Requesting ACTIVE role
Feb  7 11:41:22 SC-2 osafrded[5352]: NO RDE role set to Undefined
Feb  7 11:41:22 SC-2 osafdtmd[5329]: NO Established contact with 'PL-3'
Feb  7 11:41:22 SC-2 osafdtmd[5329]: NO Established contact with 'SC-1'
Feb  7 11:41:22 SC-2 osafdtmd[5329]: NO Established contact with 'PL-4'
Feb  7 11:41:22 SC-2 osafrded[5352]: NO Peer up on node 0x2010f
Feb  7 11:41:22 SC-2 osafimmnd[5382]: NO IMMD service is UP ... ScAbsenseAllowed?:0 introduced?:0
Feb  7 11:41:22 SC-2 osafrded[5352]: NO Got peer info request from node 0x2010f with role ACTIVE
Feb  7 11:41:22 SC-2 osafrded[5352]: NO Got peer info response from node 0x2010f with role ACTIVE
Feb  7 11:41:22 SC-2 osafrded[5352]: NO RDE role set to QUIESCED
Feb  7 11:41:22 SC-2 osafrded[5352]: NO Giving up election against 0x2010f with role ACTIVE. My role is now QUIESCED
Feb  7 11:41:22 SC-2 osafclmna[5343]: NO safNode=SC-2,safCluster=myClmCluster Joined cluster, nodeid=2020f
Feb  7 11:41:22 SC-2 osafimmnd[5382]: NO Fevs count adjusted to 2835 preLoadPid: 0
Feb  7 11:41:22 SC-2 osafimmnd[5382]: NO SERVER STATE: IMM_SERVER_ANONYMOUS --> IMM_SERVER_CLUSTER_WAITING
Feb  7 11:41:22 SC-2 osafimmnd[5382]: NO SERVER STATE: IMM_SERVER_CLUSTER_WAITING --> IMM_SERVER_LOADING_PENDING
Feb  7 11:41:22 SC-2 osafimmnd[5382]: NO SERVER STATE: IMM_SERVER_LOADING_PENDING --> IMM_SERVER_SYNC_PENDING
Feb  7 11:41:22 SC-2 osafimmnd[5382]: NO NODE STATE-> IMM_NODE_ISOLATED
Feb  7 11:41:23 SC-2 osafimmnd[5382]: NO NODE STATE-> IMM_NODE_W_AVAILABLE
Feb  7 11:41:23 SC-2 osafimmnd[5382]: NO SERVER STATE: IMM_SERVER_SYNC_PENDING --> IMM_SERVER_SYNC_CLIENT
Feb  7 11:41:23 SC-2 osafimmnd[5382]: NO NODE STATE-> IMM_NODE_FULLY_AVAILABLE 2926
Feb  7 11:41:23 SC-2 

[tickets] [opensaf:tickets] #2162 AMF: Headless recovery failed if SC failover during headless sync

2017-02-06 Thread Minh Hon Chau
Hi Nagu,

There were email responses to the provided log; you can also view them at the 
links below:

https://sourceforge.net/p/opensaf/mailman/message/35621982/
https://sourceforge.net/p/opensaf/mailman/message/35636549/

Thanks,
Minh


---

** [tickets:#2162] AMF: Headless recovery failed if SC failover during headless sync**

**Status:** review
**Milestone:** 5.2.FC
**Labels:** headless recovery 
**Created:** Thu Nov 03, 2016 11:01 AM UTC by Minh Hon Chau
**Last Updated:** Mon Jan 23, 2017 06:42 AM UTC
**Owner:** Minh Hon Chau
**Attachments:**

- [log.tgz](https://sourceforge.net/p/opensaf/tickets/2162/attachment/log.tgz) (1.4 MB; application/x-compressed)


Test steps:
- Set up a 2N assignment: PL4 hosts SU4 (active assignment), PL5 hosts SU5 
(standby assignment)
- Stop SCs
- Stop PL4
- Restart SC1
- Restart SC2
- Since PL4 is stopped, headless sync will time out in 10 seconds. During these 
10 seconds, reboot SC1 to trigger an SC failover

Observation: SC2 becomes the active controller and cold sync completes, but SU5 
still has the standby assignment.

When SC2 becomes the active controller, the code that performs headless 
recovery (function failover_absent_assignment()) is not executed. Therefore, 
the transient assignments remain after SC failover.
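
One possible shape of the missing step, as a rough sketch only: a controller that takes the active role while headless sync is still pending runs the recovery itself. `ControllerModel`, its members and `on_become_active` are hypothetical names, and the sketch assumes the standby knows that headless sync was in progress. It is not the fix under review.

```cpp
#include <cassert>
#include <functional>

// Hypothetical model; the names below are illustrative and not OpenSAF code.
struct ControllerModel {
  bool headless_sync_in_progress = false;  // assumed to be known on the standby
  bool recovery_done = false;
  // Stand-in for the recovery step (failover_absent_assignment() in the ticket).
  std::function<void()> run_absent_failover;

  // Called when this controller takes the active role.
  void on_become_active() {
    assert(run_absent_failover && "a recovery routine must be installed");
    if (headless_sync_in_progress && !recovery_done) {
      run_absent_failover();
      recovery_done = true;
      headless_sync_in_progress = false;
    }
  }
};
```

In the test steps above, SC2 would enter `on_become_active()` with `headless_sync_in_progress` still set, so the recovery would run once and a repeated role change would not trigger it again.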

Log/trace are attached.




[tickets] [opensaf:tickets] #2210 AMFD: Loss of RT attribute update before headless

2017-02-06 Thread Minh Hon Chau
- **status**: accepted --> review



---

** [tickets:#2210] AMFD: Loss of RT attribute update before headless**

**Status:** review
**Milestone:** 5.2.FC
**Created:** Mon Nov 28, 2016 10:18 PM UTC by Minh Hon Chau
**Last Updated:** Tue Feb 07, 2017 03:44 AM UTC
**Owner:** Minh Hon Chau


A loss of the IMM RT saAmfSIAdminState update in AMFD has been seen just before 
the cluster goes headless. It results in a coredump after headless.

One scenario is:
- Issue an amf-admin shutdown of the SI, delay the CSI quiescing callback
- Stop SCs, release the CSI quiescing callback
- Restart SCs

Observation: saAmfSIAdminState is read as UNLOCKED while the related SUSI was 
QUIESCED, and a coredump occurs as below

~~~
Thread 1 (Thread 0x7fec174a0780 (LWP 493)):
#0  0x004fbfd5 in SG_2N::node_fail_si_oper (this=0x24109d0, su=0x2413440) at sg_2n_fsm.cc:3102
        s_susi = 0x8f5000b
        susi_temp = 0x5fa169
        o_su = 0x2417f98
        __FUNCTION__ = "node_fail_si_oper"
        cb = 0x919240 <_control_block>
#1  0x004fe69c in SG_2N::node_fail (this=0x24109d0, cb=0x919240 <_control_block>, su=0x2413440) at sg_2n_fsm.cc:3469
        a_susi = 0x1
        s_susi = 0x7fffedecd2d0
        o_su = 0x5a50bd
        flag = 2
        __FUNCTION__ = "node_fail"
        su_ha_state = 0
#2  0x00513010 in AVD_SG::failover_absent_assignment (this=0x24109d0) at sg.cc:2273
        su = @0x2411330: 0x2413440
        __for_range = std::vector of length 2, capacity 2 = {0x2413440, 0x24111e0}
        __for_begin =
        __for_end =
        __FUNCTION__ = "failover_absent_assignment"
#3  0x0043be65 in avd_cluster_tmr_init_evh (cb=0x919240 <_control_block>, evt=0x7fec04000df0) at cluster.cc:103
        i_sg = 0x24109d0
        it = {first = "safSg=1,safApp=osaftest", second = }
        __FUNCTION__ = "avd_cluster_tmr_init_evh"
        su = 0x0
        node = 0x240f9b0
~~~
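
The loss described in this ticket can be modelled roughly as a state change that is applied in memory but whose IMM update never arrives before the controllers stop. The sketch below is a toy model under that assumption. `ImmModel`, `AmfdModel` and `restart_from` are hypothetical names, and the update queue is an assumption about how such updates are delivered, not a description of AMFD internals.

```cpp
#include <cassert>
#include <deque>
#include <string>

// Hypothetical model of the scenario; not the AMFD implementation.
struct ImmModel {
  std::string si_admin_state = "UNLOCKED";  // value persisted in IMM
};

struct AmfdModel {
  std::string si_admin_state = "UNLOCKED";  // AMFD's in-memory value
  std::deque<std::string> pending_updates;  // assumed queue of IMM updates

  // Change the state locally and queue the matching IMM update.
  void set_admin_state(const std::string& state) {
    assert(!state.empty());
    si_admin_state = state;
    pending_updates.push_back(state);
  }

  // Deliver queued updates to IMM in order.
  void flush(ImmModel& imm) {
    while (!pending_updates.empty()) {
      imm.si_admin_state = pending_updates.front();
      pending_updates.pop_front();
    }
  }
};

// A restarted AMFD starts from whatever IMM holds; queued updates are gone.
AmfdModel restart_from(const ImmModel& imm) {
  AmfdModel amfd;
  amfd.si_admin_state = imm.si_admin_state;
  return amfd;
}
```

If `set_admin_state("SHUTTING_DOWN")` is called and the controllers stop before `flush()`, `restart_from()` yields UNLOCKED, which is the stale value reported in the observation above.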





[tickets] [opensaf:tickets] #2210 AMFD: Loss of RT attribute update before headless

2017-02-06 Thread Minh Hon Chau
Hi Praveen,

I didn't mean that reading RTA in a normal cluster causes any problem. I meant 
that the loss of an RTA update in a headless cluster and the loss of an MBC 
update during failover in a normal cluster both leave incorrect states in AMFD. 
Thanks for your explanation anyway.

I think we can create another ticket to continue tracking this issue. If 
possible, I hope we can look at it with a view to finding a common solution for 
both the loss of RTA and the loss of MBC updates; that has more value than 
solely fixing the case of RTA loss in headless.

Thanks,
Minh


---

** [tickets:#2210] AMFD: Loss of RT attribute update before headless**

**Status:** accepted
**Milestone:** 5.2.FC
**Created:** Mon Nov 28, 2016 10:18 PM UTC by Minh Hon Chau
**Last Updated:** Mon Feb 06, 2017 09:46 AM UTC
**Owner:** Minh Hon Chau






[tickets] [opensaf:tickets] #2202 cpnd: osafckptnd core dump in high memory load

2017-02-06 Thread A V Mahesh (AVM)
- **status**: review --> fixed



---

** [tickets:#2202] cpnd: osafckptnd core dump in high memory load**

**Status:** fixed
**Milestone:** 5.2.FC
**Created:** Wed Nov 23, 2016 09:18 AM UTC by Vo Minh Hoang
**Last Updated:** Thu Dec 01, 2016 11:01 AM UTC
**Owner:** Vo Minh Hoang
**Attachments:**

- [cpsv_shm_2202.c](https://sourceforge.net/p/opensaf/tickets/2202/attachment/cpsv_shm_2202.c) (12.7 kB; application/octet-stream)


A coredump occurs while creating a checkpoint section under high memory load; 
the shared memory guarantee is not enabled.

~~~
Core was generated by `/usr/lib64/opensaf/osafckptnd'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x7f38f8513109 in __strtok_r_1c () from /lib64/libc.so.6
Missing separate debuginfos, use: zypper install opensaf-ckpt-nodedirector-debuginfo-5.1.0-690.0.d0f65c1.sle12.x86_64
(gdb) where
#0  0x7f38f8513109 in __strtok_r_1c () from /lib64/libc.so.6
#1  0x7f38f9fc074a in memcpy (__len=, __src=, __dest=) at /usr/include/bits/string3.h:51
#2  ncs_os_posix_shm (req=req@entry=0x7fff5de1f6b0) at ../../../../../../opensaf/osaf/libs/core/leap/os_defs.c:858
#3  0x00415f6f in cpnd_sec_hdr_update (sec_info=sec_info@entry=0x19dc880, cp_node=cp_node@entry=0x19dc3e0) at ../../../../../../../opensaf/osaf/services/saf/cpsv/cpnd/cpnd_proc.c:1875
#4  0x0040673a in cpnd_ckpt_sec_add (cp_node=0x19dc3e0, id=0x7f38f0008a00, exp_time=1478796221720867000, gen_flag=gen_flag@entry=0) at ../../../../../../../opensaf/osaf/services/saf/cpsv/cpnd/cpnd_db.c:456
#5  0x0040d718 in cpnd_evt_proc_ckpt_sect_create (cb=cb@entry=0x18337f0, evt=evt@entry=0x7f38f000ad80, sinfo=sinfo@entry=0x7f38f000b3d8) at ../../../../../../../opensaf/osaf/services/saf/cpsv/cpnd/cpnd_evt.c:2244
#6  0x0040eff4 in cpnd_process_evt (evt=0x7f38f000ad70) at ../../../../../../../opensaf/osaf/services/saf/cpsv/cpnd/cpnd_evt.c:227
#7  0x00410bcd in cpnd_main_process (cb=cb@entry=0x18337f0) at ../../../../../../../opensaf/osaf/services/saf/cpsv/cpnd/cpnd_init.c:579
#8  0x00405a83 in main (argc=, argv=) at ../../../../../../../opensaf/osaf/services/saf/cpsv/cpnd/cpnd_main.c:79
~~~




[tickets] [opensaf:tickets] #2210 AMFD: Loss of RT attribute update before headless

2017-02-06 Thread Praveen
Hi Minh,

AMFD is not required to read any RTA if the SC-Absent feature is disabled. RTA 
is for the user only. With the SC-Absent feature disabled, if AMFD is reading 
RTA then that is a bug and it needs to be fixed. AMFD should read only config 
data from IMM if the SC-Absent feature is disabled. For all runtime attributes 
and objects, AMFD should depend on its own database when the SC-Absent feature 
is disabled. Even when the SC-Absent feature is enabled, AMFD should read RTA 
only once, when the first active AMFD is coming up.

In ticket #2228, I think the issue is not MBCSV loss; it may be either that an 
async update arrives at the standby during the COLD sync phase and gets 
ignored, or that the active dies before updating SU states to IMM. If it is the 
first case, it can be fixed by not ignoring the update. In the second case, it 
affects only the user, and AMFD has correct states in its database.

Loss of a large number of RTAs can surely happen, and it will make recovery 
after headless state impossible in the current implementation. This is properly 
documented as of now. Please raise a separate enhancement ticket for the RTA 
loss problem for 5.3. All these approaches should be discussed under that 
ticket. This ticket fixes the case highlighted in the description by adding a 
small change and thus falls into the defect category.


Thanks,
Praveen


---

** [tickets:#2210] AMFD: Loss of RT attribute update before headless**

**Status:** accepted
**Milestone:** 5.2.FC
**Created:** Mon Nov 28, 2016 10:18 PM UTC by Minh Hon Chau
**Last Updated:** Thu Feb 02, 2017 04:58 AM UTC
**Owner:** Minh Hon Chau



