[tickets] [opensaf:tickets] #1847 log: lgs_imm_init_configStreams: log_stream_open_file_restore Fail is observed in syslog.

2016-05-31 Thread Madhurika Koppula
Attaching the osaflogd messages captured during bringup of the active 
controller with headless enabled. 


Attachments:

- 
[osaflogd](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/353fbcfa/6016/attachment/osaflogd)
 (120.0 kB; application/octet-stream)


---

** [tickets:#1847] log: lgs_imm_init_configStreams: 
log_stream_open_file_restore Fail is observed in syslog.**

**Status:** review
**Milestone:** 5.0.1
**Created:** Tue May 24, 2016 07:32 AM UTC by Madhurika Koppula
**Last Updated:** Mon May 30, 2016 02:26 PM UTC
**Owner:** Vu Minh Nguyen


Setup details: 5-node SUSE cluster (32-bit/64-bit combination) with a single PBE 

1) Brought up the cluster with Active SC, Standby SC, Spare SC and two payloads, 
PL-4 and PL-5, with the headless feature enabled.
2) During startup of the active controller, "lgs_imm_init_configStreams: 
log_stream_open_file_restore Fail" is observed in syslog.

Below is a snippet of the active controller syslog:

May 23 14:28:29 REG-S1 osaffmd[8814]: Started
May 23 14:28:29 REG-S1 osafimmd[8824]: Started
May 23 14:28:29 REG-S1 osafimmd[8824]: NO *** SC_ABSENCE_ALLOWED (Headless Hydra) is configured: 900 ***
May 23 14:28:29 REG-S1 osafimmnd[8835]: Started


May 23 14:28:34 REG-S1 osafimmnd[8835]: NO Implementer connected: 1 (safLogService) <2, 2010f>
May 23 14:28:34 REG-S1 osafimmnd[8835]: NO implementer for class 'OpenSafLogConfig' is safLogService => class extent is safe.
May 23 14:28:34 REG-S1 osafimmnd[8835]: NO implementer for class 'SaLogStreamConfig' is safLogService => class extent is safe.

May 23 14:28:34 REG-S1 osaflogd[8850]: NO lgs_imm_init_configStreams: log_stream_open_file_restore Fail
May 23 14:28:34 REG-S1 osaflogd[8850]: NO lgs_imm_init_configStreams: log_stream_open_file_restore Fail
May 23 14:28:34 REG-S1 osaflogd[8850]: NO lgs_imm_init_configStreams: log_stream_open_file_restore Fail


May 23 14:28:34 REG-S1 osafimmnd[8835]: NO Implementer (applier) connected: 2 (@safLogService_appl) <16, 2010f>
May 23 14:28:34 REG-S1 osafntfd[8862]: Started



---

Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is 
subscribed to https://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change settings at 
https://sourceforge.net/p/opensaf/admin/tickets/options.  Or, if this is a 
mailing list, you can unsubscribe from the mailing list.--
What NetFlow Analyzer can do for you? Monitors network bandwidth and traffic
patterns at an interface-level. Reveals which users, apps, and protocols are 
consuming the most bandwidth. Provides multi-vendor support for NetFlow, 
J-Flow, sFlow and other flows. Make informed decisions using capacity 
planning reports. https://ad.doubleclick.net/ddm/clk/305295220;132659582;e___
Opensaf-tickets mailing list
Opensaf-tickets@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets


[tickets] [opensaf:tickets] #1825 MDS: When saImmOmInitialize was invoked, MDS sent ‘pid = 0’ in MDS_CALLBACK_RECEIVE_INFO.

2016-05-31 Thread A V Mahesh (AVM)
- **status**: assigned --> review



---

** [tickets:#1825] MDS: When saImmOmInitialize was invoked, MDS sent ‘pid = 0’ 
in MDS_CALLBACK_RECEIVE_INFO.**

**Status:** review
**Milestone:** 4.7.2
**Created:** Fri May 13, 2016 12:34 PM UTC by Rafael
**Last Updated:** Thu May 26, 2016 04:46 AM UTC
**Owner:** A V Mahesh (AVM)
**Attachments:**

- 
[sc2_mds.log](https://sourceforge.net/p/opensaf/tickets/1825/attachment/sc2_mds.log)
 (2.6 MB; text/x-log)


This happens only some of the time. 

First this is called from an application:

saImmOmInitialize(immHandle, NULL, &version)

Then the system log shows this error:

May 13 02:31:53 sc2 osafimmnd[6042]: WA immnd_evt_proc_imm_init: PID 0 (7150) for 2020fb1d13cb8, MDS problem?
May 13 02:31:53 sc2 osafimmnd[6042]: WA immnd_evt_proc_imm_init: PID 0 (7150) for 2020fb1d13cb8, MDS problem?
May 13 02:31:55 sc2 osafimmnd[6042]: NO Ccb 176 COMMITTED (LDE)
May 13 02:31:55 sc2 osafimmnd[6042]: WA immnd_evt_proc_imm_init: PID 0 (7150) for 2020fb1d13cb8, MDS problem?
May 13 02:31:55 sc2 osafimmnd[6042]: WA immnd_evt_proc_imm_init: PID 0 (7150) for 2020fb1d13cb8, MDS problem?




[tickets] [opensaf:tickets] #1848 imm: Memory leak in ImmSearchOp::nextResult

2016-05-31 Thread Hung Nguyen
- **status**: review --> fixed
- **Comment**:

default (5.1) [staging:2791b4]
changeset:   7686:2791b4c274dc
user:        Hung Nguyen
date:        Thu May 26 13:54:43 2016 +0700
summary:     imm: Free mLastResult of SearchOp when discarding the client [#1848]

opensaf-5.0.x [staging:556a6d]
changeset:   7687:556a6d37cc04
user:        Hung Nguyen
date:        Thu May 26 13:54:43 2016 +0700
summary:     imm: Free mLastResult of SearchOp when discarding the client [#1848]

opensaf-4.7.x [staging:d07700]
changeset:   7688:d0770045d67d
user:        Hung Nguyen
date:        Thu May 26 13:54:43 2016 +0700
summary:     imm: Free mLastResult of SearchOp when discarding the client [#1848]




---

** [tickets:#1848] imm: Memory leak in ImmSearchOp::nextResult**

**Status:** fixed
**Milestone:** 4.7.2
**Created:** Tue May 24, 2016 10:12 AM UTC by Hung Nguyen
**Last Updated:** Thu May 26, 2016 06:59 AM UTC
**Owner:** Hung Nguyen


Reproduce:

Use immlist on an object with runtime attributes and OI attached.
Kill immlist before receiving the response from OI.


~~~
root@SC-1:~# immlist safSu=SC-1,safSg=NoRed,safApp=OpenSAF & kill -9 `pidof immlist`
~~~



~~~
==23745== 644 (24 direct, 620 indirect) bytes in 1 blocks are definitely lost in loss record 154 of 208
==23745==    at 0x4C2B200: calloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==23745==    by 0x42104E: ImmSearchOp::nextResult(ImmsvOmRspSearchNext**, void**, std::list >**) (ImmSearchOp.cc:124)
==23745==    by 0x42AD55: immModel_nextResult (ImmModel.cc:1514)
==23745==    by 0x40E216: immnd_evt_proc_search_next (immnd_evt.c:1526)
==23745==    by 0x40EA86: immnd_evt_proc_accessor_get (immnd_evt.c:1861)
==23745==    by 0x4187C2: immnd_process_evt (immnd_evt.c:609)
==23745==    by 0x40B585: main (immnd_main.c:348)
~~~

---

Analysis:
The problem is that 'mLastResult' is not freed when the client node is discarded.
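
The ownership issue in the analysis can be modeled roughly as follows. This is only a sketch in Python, not the C++ of ImmSearchOp.cc: the names `SearchOp`, `ClientNode`, `release` and `discard` are hypothetical, and `last_result` merely stands in for 'mLastResult'. It assumes the fix amounts to releasing any cached search result at the point where the client is discarded.

```python
class SearchOp:
    """Toy stand-in for a search operation that caches its last result."""

    def __init__(self):
        self.last_result = None  # plays the role of mLastResult

    def next_result(self, payload):
        # Cache the result while waiting for the OI to answer.
        self.last_result = payload
        return self.last_result

    def release(self):
        # Drop the cached result; in C++ the buffer would be freed here.
        had_result = self.last_result is not None
        self.last_result = None
        return had_result


class ClientNode:
    """Toy client that owns the search operations it started."""

    def __init__(self):
        self.search_ops = []

    def discard(self):
        # Release every pending search op before forgetting the client.
        released = sum(1 for op in self.search_ops if op.release())
        self.search_ops.clear()
        return released
```

If `discard` skipped the `release` call, the cached result would stay allocated after the client is gone, which is the kind of loss record valgrind reports above.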




[tickets] [opensaf:tickets] #439 Enhanced cluster management using RAFT consensus algorithm

2016-05-31 Thread Mathi Naickan
- Description has changed:

Diff:



--- old
+++ new
@@ -49,6 +49,8 @@
 (In the code structure, these shall be part of services/saf/clm/libcs and 
services/saf/clm/libts.
 The name of the library shall be libOsafClusterServices.so)
 
+* OpenSAF should work both when RAFT is enabled or disabled on that system and 
should be backward compatible to previous OpenSAF releases!
+
 The CS library shall provide a normalized set of APIs (and callback 
interfaces) such that OpenSAF can interact with different implementations of 
RAFT. 
 
 API and High level design details to follow:






---

** [tickets:#439] Enhanced cluster management using RAFT consensus algorithm**

**Status:** accepted
**Milestone:** 5.1.FC
**Labels:** #79 #1170 
**Created:** Fri May 31, 2013 11:15 AM UTC by Mathi Naickan
**Last Updated:** Tue May 31, 2016 11:25 AM UTC
**Owner:** Mathi Naickan


The goal of this ticket is to address the following requirements. This ticket 
should be read in conjunction with ticket #79 (spare SCs) and #1170 (multiple 
standbys):

Deployment of large OpenSAF clusters in the cloud presents the following 
challenges:
- Multiple nodes failing/faulting simultaneously (either in a cattle-class 
deployment OR when the host machine goes down, which in turn pulls down the 
guest VM nodes)
- Relying on third-party OR less reliable hardware/network/hosts
- Dynamically changing cluster membership due to scale-out and scale-in 
operations
- Multiple (or all) nodes can now become system controller nodes. This 
increases the probability of split brain and cluster partitioning.

These requirements are being addressed in a phased manner.
(1) As a first step, https://sourceforge.net/p/opensaf/tickets/79/ was 
implemented in 5.0 (along with the headless cluster feature).

(2) As a second step, implement (this ticket, in 5.1) 
enhanced OpenSAF cluster management such that there is always consensus (among 
the cluster nodes) on 
- the current cluster members
- the current active SC (leader election)
- the order of member nodes joining/leaving the cluster


(3) As a last step, implement https://sourceforge.net/p/opensaf/tickets/1170/ 
(in 5.2?).


This ticket addresses bullet (2) above.

Requirements:

* As a part of this ticket RAFT (see https://raft.github.io/) shall be used as 
the mechanism for 
(a) achieving consensus among a set of the cluster nodes (and the membership 
changes)
(b) quorum based leader election
(c) split brain avoidance
The following deployment scenarios shall be supported when using RAFT:
-classic 2 SC OpenSAF cluster (or)
-when all nodes are SCs (2N + the rest are all spares) (or)
-2N + spare SCs (2N + a smaller subset are spares) (or)
-N-WAY (one active, the rest are all hot standbys) - 5.2
Note: A mix of hot standbys and spares should also be possible.


* RAFT shall be added as a new OpenSAF service. 

* OpenSAF shall either implement RAFT or re-use existing RAFT implementations 
like logcabin or etcd, etc.

* A new topology service (TS) *may* be added which shall use the topology 
information (from TIPC) and MDS (in case of TCP) to determine cluster membership.

* CLM is the single layer that interfaces with the underlying RAFT and TS

* All interactions with RAFT and TS shall be via the normalised cluster services 
adaptation interface, called the OpenSAF cluster services library (CS).  The CS 
library shall thereby enable OpenSAF to work with different implementations of 
RAFT.

* CS and TS shall be added as libraries of OpenSAF CLM service. 
(In the code structure, these shall be part of services/saf/clm/libcs and 
services/saf/clm/libts.
The name of the library shall be libOsafClusterServices.so)

* OpenSAF should work whether RAFT is enabled or disabled on the system and 
should be backward compatible with previous OpenSAF releases.
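
For the quorum-based leader election and split-brain avoidance named in requirements (b) and (c), the majority rule that RAFT relies on can be illustrated as below. This is a sketch only: the function names are hypothetical, and it says nothing about how OpenSAF would count spares or hot standbys as voting members.

```python
def quorum_size(voting_members):
    """Smallest number of voters that forms a strict majority."""
    if voting_members < 1:
        raise ValueError("a cluster needs at least one voting member")
    return voting_members // 2 + 1


def has_quorum(voting_members, reachable):
    """True if a partition that reaches `reachable` voters may elect a leader."""
    return reachable >= quorum_size(voting_members)
```

With five voters split 3/2 by a network partition, only the side with three voters has quorum, so at most one leader can be elected. With two voters the quorum is two, so a plain majority rule cannot tolerate the loss of either one.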

The CS library shall provide a normalized set of APIs (and callback interfaces) 
such that OpenSAF can interact with different implementations of RAFT. 
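
One possible shape for such a normalized interface is an abstract backend plus a thin facade that delivers callbacks. The sketch below is in Python purely for illustration; `ConsensusBackend`, `StaticBackend`, `ClusterServices` and all method names are hypothetical and are not taken from any OpenSAF design document.

```python
from abc import ABC, abstractmethod


class ConsensusBackend(ABC):
    """What the CS layer would need from any RAFT implementation (assumed)."""

    @abstractmethod
    def members(self):
        """Return the list of current cluster members."""

    @abstractmethod
    def leader(self):
        """Return the current leader, or None if there is none."""


class StaticBackend(ConsensusBackend):
    """Hypothetical in-memory backend, used only to exercise the facade."""

    def __init__(self, members, leader):
        self._members = list(members)
        self._leader = leader

    def members(self):
        return list(self._members)

    def leader(self):
        return self._leader


class ClusterServices:
    """Facade that hides which backend is plugged in."""

    def __init__(self, backend):
        self._backend = backend
        self._callbacks = []

    def register_callback(self, callback):
        self._callbacks.append(callback)

    def notify_membership(self):
        # Push the current view (members, leader) to every registered callback.
        view = (self._backend.members(), self._backend.leader())
        for callback in self._callbacks:
            callback(*view)
        return view
```

Swapping `StaticBackend` for a wrapper around a real RAFT implementation would leave callers of `ClusterServices` unchanged, which is the property the requirement asks for.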

API and High level design details to follow:



[tickets] [opensaf:tickets] #1759 amf: Command based HealthCheck invoked during termination phase (NPI comp).

2016-05-31 Thread Minh Hon Chau
Hi Praveen,

This problem has actually been seen in a scale-in scenario, in which an NPI SU 
is being terminated. I have tried many times to reproduce it but was not able 
to.
Most of the time, what I see is that the HC command response comes after the 
timer expiry (which is already fixed by the patch for #1759), or the timer is 
stopped when amfnd runs component termination.
The only case I can think of for now is the one described above. It happens in 
a very small timing window: the timer expires in the timer thread just after 
amfnd starts component termination, provided that the HC command response also 
finishes just in time for (or before) the timer expiry.
I have not been lucky enough to see it, but it can happen and it is a bug, 
because component termination happens before the timer expiry anyway. To fix 
it, I think we can do a check similar to the patch for #1759 before running the 
CLC HC command in avnd_comp_hc_rec_tmr_exp(). 
I will ask for an amfnd trace if this problem happens again in a scale-in 
scenario. But this case should be fixed anyway?
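
A rough model of the check proposed for the timer-expiry path, written in Python rather than amfnd's C++: `Presence`, `Component` and `on_hc_timer_expiry` are hypothetical names, and the real handler avnd_comp_hc_rec_tmr_exp() is not reproduced here. The assumption is that an expiry queued before the timer was stopped can still be delivered after termination has begun.

```python
from enum import Enum, auto


class Presence(Enum):
    INSTANTIATED = auto()
    TERMINATING = auto()


class Component:
    def __init__(self):
        self.presence = Presence.INSTANTIATED
        self.hc_runs = 0  # how many times the HC command was launched


def on_hc_timer_expiry(comp):
    """Launch the health check command unless termination has begun."""
    if comp.presence is Presence.TERMINATING:
        # Stale expiry delivered after termination started: ignore it.
        return False
    comp.hc_runs += 1
    return True
```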

Thanks,
Minh


---

** [tickets:#1759] amf: Command based HealthCheck invoked during termination 
phase (NPI comp).**

**Status:** fixed
**Milestone:** 4.6.2
**Labels:** Command Based Health Check NPI comp 
**Created:** Thu Apr 14, 2016 06:24 AM UTC by Praveen
**Last Updated:** Tue May 31, 2016 07:35 AM UTC
**Owner:** Praveen
**Attachments:**

- 
[osafamfnd](https://sourceforge.net/p/opensaf/tickets/1759/attachment/osafamfnd)
 (2.1 MB; application/octet-stream)


Attached are amfnd traces and a configuration to reproduce.

Steps to reproduce:
1) Bring the configuration up on a single controller.
2) AMFND periodically invokes the command-based health check for the component.
3) Lock the SU.
4) As part of the quiesced assignment, AMFND stops all PM monitoring (if 
configured) or the command-based health check, and starts terminating the 
component by invoking the terminate script.
5) During this termination phase, the health check command completes and AMFND 
restarts the timer, which should not be started (AMFND had already stopped it 
before starting termination of the comp).
6) Before termination of the comp completes (the script has a 2-second sleep), 
the health check timer expires and amfnd invokes the health check command.
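
Steps 4 and 5 can be modeled as a small state object. This is an illustrative Python sketch with hypothetical names (`HealthCheckRecord`, `start_termination`, `on_hc_command_done`), not amfnd code; it assumes the fix is to skip re-arming the timer once termination has started.

```python
class HealthCheckRecord:
    """Toy model of one command-based health check and its timer."""

    def __init__(self):
        self.terminating = False
        self.timer_running = False

    def start_termination(self):
        # Step 4: stop the health check timer, then begin termination.
        self.terminating = True
        self.timer_running = False

    def on_hc_command_done(self):
        # Step 5: re-arm the timer only while the component is not terminating.
        if not self.terminating:
            self.timer_running = True
        return self.timer_running
```

Without the `terminating` check, a health check command that finishes during termination would restart the timer, leading to the expiry in step 6.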


From traces:
1) AMFND invokes health check command:
Apr 13 14:25:06.698503 osafamfnd [7387:chc.cc:0914] >> avnd_comp_hc_rec_tmr_exp: safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1 - osafHealthCheck, sts: 0
Apr 13 14:25:06.698516 osafamfnd [7387:clc.cc:2757] >> avnd_comp_clc_cmd_execute: 'safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1':CLC CLI command type:'AVND_COMP_CLC_CMD_TYPE_HC(6)'
Apr 13 14:25:06.698532 osafamfnd [7387:clc.cc:2868] T1 Component is NPI, 1
Apr 13 14:25:06.698547 osafamfnd [7387:clc.cc:2920] T1 CLC CLI script:'/opt/amf_demo/npi/pm/amf_demo_monitor.sh'
Apr 13 14:25:06.698557 osafamfnd [7387:clc.cc:2925] T1 CLC CLI command timeout: In nano secs:1800 In milli secs: 18
Apr 13 14:25:06.698568 osafamfnd [7387:clc.cc:2929] T1 CLC CLI command env variable name = 'SA_AMF_COMPONENT_NAME': value ='safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1'
Apr 13 14:25:06.698578 osafamfnd [7387:clc.cc:2929] T1 CLC CLI command env variable name = 'SA_AMF_COMPONENT_NAME': value ='safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1'
Apr 13 14:25:06.699016 osafamfnd [7387:clc.cc:2961] T2 The CLC CLI command execution success

2) AMFND gets quiesced assignments as part of the lock operation. It stops the 
health check timer and starts terminating the comp: 
Apr 13 14:25:07.397472 osafamfnd [7387:su.cc:0376] >> avnd_evt_avd_info_su_si_assign_evh: 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1'
Apr 13 14:25:07.397484 osafamfnd [7387:susm.cc:0189] >> avnd_su_siq_rec_buf: 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1'

Apr 13 14:25:07.397731 osafamfnd [7387:cpm.cc:0634] >> avnd_comp_pm_finalize: Comp 'safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1'
Apr 13 14:25:07.397739 osafamfnd [7387:cpm.cc:0650] << avnd_comp_pm_finalize
Apr 13 14:25:07.397748 osafamfnd [7387:chc.cc:0761] >> find_hc_rec: 'safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1'
Apr 13 14:25:07.397758 osafamfnd [7387:tmr.cc:0126] TR health check timer stopped
Apr 13 14:25:07.397767 osafamfnd [7387:clc.cc:0854] T1 'safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1':Entering CLC FSM: presence state:'SA_AMF_PRESENCE_INSTANTIATED(3)', Event:'AVND_COMP_CLC_PRES_FSM_EV_TERM'
Apr 13 14:25:07.397776 osafamfnd [7387:clc.cc:1852] >> avnd_comp_clc_inst_term_hdler: 'safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1': Terminate event in the Instantiated state
Apr 13 14:25:07.397797 osafamfnd [7387:clc.cc:1876] NO Terminate comann executing
Apr 13 14:25:07.397807 osafamfnd [7387:clc.cc:2757] >> avnd_comp_clc_cmd_execute: 'safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1':CLC CLI command type:'AVND_COMP_CLC_CMD_TYPE_TERMINATE(2)'
Apr 13 14:25:07.397824 osafamfnd [7387:clc.cc:2868] T1 Component is NPI, 1
Apr 1

[tickets] [opensaf:tickets] #1855 imm: Implementer is marked as dying forever when IMMND fails to send discard msg

2016-05-31 Thread Hung Nguyen
- **status**: accepted --> review



---

** [tickets:#1855] imm: Implementer is marked as dying forever when IMMND fails 
to send discard msg**

**Status:** review
**Milestone:** 4.7.2
**Created:** Mon May 30, 2016 07:06 AM UTC by Hung Nguyen
**Last Updated:** Mon May 30, 2016 09:07 AM UTC
**Owner:** Hung Nguyen


When discarding a client connection, if IMMND fails to send the implementer 
discard message, D2ND_DISCARD_IMPL will not be broadcast back to all IMMNDs.
The implementer will be marked as dying on the local node. The remote nodes 
are not aware of that, so they still see the implementer as connected.

~~~
13:21:54 SC-1 osafimmnd[433]: ER Discard implementer failed for implId:5 but 
IMMD is up !? - case not handled. Client will be orphanded
13:21:54 SC-1 osafimmnd[433]: NO Implementer locally disconnected. Marking it 
as doomed 5 <191, 2010f>
~~~

IMMND currently doesn't handle that case, so the implementer is stuck in the 
dying state.
Any attempt to set that implementer will be rejected with TRY_AGAIN (see 
immModel_implIsFree).
The OI will fail to set the implementer no matter how many times it retries.

A node reboot is needed to recover from this situation.

This can also happen if IMMD crashes while processing the implementer-discard 
message (crashing before immd_mbcsv_sync_update). This is very unlikely, 
though.
If IMMD crashes after immd_mbcsv_sync_update, it's still safe because the 
standby IMMD will re-broadcast the fevs messages after failing over.
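
The stuck state can be illustrated with a toy state machine. This Python sketch is not IMMND code: `ImplState`, `Node`, `retry_set` and the string return values are hypothetical, and `on_discard_broadcast` stands in for receiving D2ND_DISCARD_IMPL. It only shows why retrying cannot succeed while the broadcast never arrives.

```python
from enum import Enum


class ImplState(Enum):
    FREE = "free"
    CONNECTED = "connected"
    DYING = "dying"


TRY_AGAIN = "TRY_AGAIN"
OK = "OK"


class Node:
    """Toy view of one implementer as seen by the local node."""

    def __init__(self):
        self.state = ImplState.CONNECTED

    def local_disconnect(self):
        # The client connection is discarded locally; wait for the broadcast.
        self.state = ImplState.DYING

    def on_discard_broadcast(self):
        # Only the broadcast discard message frees the implementer.
        self.state = ImplState.FREE

    def implementer_set(self):
        if self.state is ImplState.DYING:
            return TRY_AGAIN
        self.state = ImplState.CONNECTED
        return OK


def retry_set(node, attempts):
    """Retry implementer_set up to `attempts` times, as an OI would."""
    for _ in range(attempts):
        if node.implementer_set() == OK:
            return OK
    return TRY_AGAIN
```

In this model, `retry_set` keeps returning TRY_AGAIN after `local_disconnect` regardless of the attempt count, and succeeds on the first try once `on_discard_broadcast` has run.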





[tickets] [opensaf:tickets] #439 Enhanced cluster management using RAFT consensus algorithm

2016-05-31 Thread Mathi Naickan
- Description has changed:

Diff:



--- old
+++ new
@@ -28,10 +28,10 @@
 (b) quorum based leader election
 (c) split brain avoidance
 The following deployment scenarios shall be supported when using RAFT:
-- classic 2 SC OpenSAF cluster (or)
-- when all nodes are SCs (2N + the rest are all spares) (or)
-- 2N + spare SCs (2N + a smaller subset are spares) (or)
-- N-WAY (a active, the rest are all hot standbys) - 5.2
+-classic 2 SC OpenSAF cluster (or)
+-when all nodes are SCs (2N + the rest are all spares) (or)
+-2N + spare SCs (2N + a smaller subset are spares) (or)
+-N-WAY (a active, the rest are all hot standbys) - 5.2
 Note: A mix of hot standbys and spares should also be possible.
 
 






---

** [tickets:#439] Enhanced cluster management using RAFT consensus algorithm**

**Status:** accepted
**Milestone:** 5.1.FC
**Labels:** #79 #1170 
**Created:** Fri May 31, 2013 11:15 AM UTC by Mathi Naickan
**Last Updated:** Tue May 31, 2016 11:03 AM UTC
**Owner:** Mathi Naickan






[tickets] [opensaf:tickets] #439 Enhanced cluster management using RAFT consensus algorithm

2016-05-31 Thread Mathi Naickan
- **summary**: Enhanced cluster management using RAFT --> Enhanced cluster 
management using RAFT consensus algorithm



---

** [tickets:#439] Enhanced cluster management using RAFT consensus algorithm**

**Status:** accepted
**Milestone:** 5.1.FC
**Labels:** #79 #1170 
**Created:** Fri May 31, 2013 11:15 AM UTC by Mathi Naickan
**Last Updated:** Tue May 31, 2016 11:01 AM UTC
**Owner:** Mathi Naickan






[tickets] [opensaf:tickets] #439 Enhanced cluster management using RAFT

2016-05-31 Thread Mathi Naickan
- **Comment**:

API and high level design details to follow soon.



---

** [tickets:#439] Enhanced cluster management using RAFT**

**Status:** accepted
**Milestone:** 5.1.FC
**Labels:** #79 #1170 
**Created:** Fri May 31, 2013 11:15 AM UTC by Mathi Naickan
**Last Updated:** Tue May 31, 2016 10:59 AM UTC
**Owner:** Mathi Naickan


The goal of this ticket is to address the following requirements.This ticket 
should be read in conjunction with ticket #79 (spare SCs) and #1170 (multiple 
standbys):

Deployment of large OpenSAF clusters in the cloud presents with the following 
challenges:
- Multiple nodes failing/faulting simultaneously (either in a cattle class 
deployment OR the host machine going down which inturn will pull down the guest 
VM nodes)
- Relying on 3rd party OR less reliable - hardware/network/hosts
- Dynamically changing cluster membership due to scale-out and scale-in 
operations
- Multiple (or all) nodes can now become system controller nodes. This 
increases the probability of split brain and cluster partitioning.

These requirements are being addressed in a phased manner.
(1) As a first step, https://sourceforge.net/p/opensaf/tickets/79/ was 
implemented in 5.0. (And the headless cluster feature)

(2) As a second step, implement (this ticket in 5.1)  - 
Enhanced OpenSAF cluster management such that there is always consensus (among 
the cluster nodes) on the 
- current cluster members
- the current active SC, leader election
- the order of member nodes joining/leaving the cluster


(3) As a last step implement https://sourceforge.net/p/opensaf/tickets/1170/ in 
5.2?)


This ticket addresses bullet (2) above.

Requirements:

* As a part of this ticket RAFT (see https://raft.github.io/) shall be used as 
the mechanism for 
(a) achieving consensus among a set of the cluster nodes (and the membership 
changes)
(b) quorum based leader election
(c) split brain avoidance
The following deployment scenarios shall be supported when using RAFT:
- classic 2 SC OpenSAF cluster (or)
- when all nodes are SCs (2N + the rest are all spares) (or)
- 2N + spare SCs (2N + a smaller subset are spares) (or)
- N-WAY (one active, the rest are all hot standbys) - 5.2
Note: A mix of hot standbys and spares should also be possible.


* RAFT shall be added as a new OpenSAF service. 

* OpenSAF shall either implement RAFT or re-use existing RAFT implementations 
like logcabin or etcd, etc.

* A new topology service (TS) *may* be added which shall use the topology 
information (from TIPC) and MDS (in case of TCP) to determine cluster membership.

* CLM is the single layer that interfaces with the underlying RAFT and TS

* All interactions with RAFT and TS shall be via the normalised cluster services 
adaptation interface, called the OpenSAF cluster services library (CS). The CS 
library shall thereby enable OpenSAF to work with different implementations of 
RAFT.

* CS and TS shall be added as libraries of OpenSAF CLM service. 
(In the code structure, these shall be part of services/saf/clm/libcs and 
services/saf/clm/libts.
The name of the library shall be libOsafClusterServices.so)

The CS library shall provide a normalized set of APIs (and callback interfaces) 
such that OpenSAF can interact with different implementations of RAFT. 
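
To make requirements (b) and (c) concrete: RAFT elects a leader only when a strict majority of the voting members agree, which is what keeps two partitions from both electing an active SC. A minimal Python sketch of that arithmetic follows. The function names are illustrative only and are not part of any OpenSAF or RAFT library API.

```python
def quorum_size(voting_members: int) -> int:
    """Smallest number of votes that forms a strict majority."""
    if voting_members < 1:
        raise ValueError("a cluster needs at least one voting member")
    return voting_members // 2 + 1


def tolerated_failures(voting_members: int) -> int:
    """How many voting members can fail while a majority remains."""
    return voting_members - quorum_size(voting_members)


def can_elect_leader(partition_size: int, voting_members: int) -> bool:
    """A partition may elect a leader only if it holds a majority."""
    return partition_size >= quorum_size(voting_members)


if __name__ == "__main__":
    # Print members, quorum size and tolerated failures for a few sizes.
    for n in (2, 3, 5):
        print(n, quorum_size(n), tolerated_failures(n))
```

If only the two SCs vote, the quorum is two, so a classic 2 SC cluster cannot lose either SC and still form a majority, while five voting members tolerate two failures. How spares, payloads, or an external arbiter would count as voters is not specified in this ticket.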

API and High level design details to follow:


---

Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is 
subscribed to https://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change settings at 
https://sourceforge.net/p/opensaf/admin/tickets/options.  Or, if this is a 
mailing list, you can unsubscribe from the mailing list.--
What NetFlow Analyzer can do for you? Monitors network bandwidth and traffic
patterns at an interface-level. Reveals which users, apps, and protocols are 
consuming the most bandwidth. Provides multi-vendor support for NetFlow, 
J-Flow, sFlow and other flows. Make informed decisions using capacity 
planning reports. https://ad.doubleclick.net/ddm/clk/305295220;132659582;e___
Opensaf-tickets mailing list
Opensaf-tickets@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets


[tickets] [opensaf:tickets] #439 Enhanced cluster management using RAFT

2016-05-31 Thread Mathi Naickan
This provides a gist of what the OpenSAF startup would look like after 
introducing RAFT and CS/TS.


Attachments:

- 
[osaf-raft-ts-startup-seq.png](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/3db919bc/270f/attachment/osaf-raft-ts-startup-seq.png)
 (66.1 kB; image/png)


---

** [tickets:#439] Enhanced cluster management using RAFT**

**Status:** accepted
**Milestone:** 5.1.FC
**Labels:** #79 #1170 
**Created:** Fri May 31, 2013 11:15 AM UTC by Mathi Naickan
**Last Updated:** Tue May 31, 2016 10:52 AM UTC
**Owner:** Mathi Naickan




---



[tickets] [opensaf:tickets] #439 Enhanced cluster management using RAFT

2016-05-31 Thread Mathi Naickan
- **summary**: Enhanced cluster management --> Enhanced cluster management 
using RAFT
- Description has changed:

Diff:



--- old
+++ new
@@ -1,19 +1,54 @@
-The purpose of this ticket is to achieve the following enhancements to OpenSAF 
cluster management/membership:
+The goal of this ticket is to address the following requirements.This ticket 
should be read in conjunction with ticket #79 (spare SCs) and #1170 (multiple 
standbys):
 
-- Perform high level node monitoring(heartbeating)
-- Enhanced split-brain avoidance techniques.
-RAFT is being considered for implementing the above cluster management 
enhancements.
-. 
-The scope of this ticket includes the following:
+Deployment of large OpenSAF clusters in the cloud presents with the following 
challenges:
+- Multiple nodes failing/faulting simultaneously (either in a cattle class 
deployment OR the host machine going down which inturn will pull down the guest 
VM nodes)
+- Relying on 3rd party OR less reliable - hardware/network/hosts
+- Dynamically changing cluster membership due to scale-out and scale-in 
operations
+- Multiple (or all) nodes can now become system controller nodes. This 
increases the probability of split brain and cluster partitioning.
 
-(a) Implement RAFT and/or RAFT adaptation layer that provides interfaces for
-- adding/removing nodes to the cluster membership
-- querying leader
-- callbacks notifying about new leader
-- read/write interface
-- notification of nodes joining/leaving the cluster membership
-Note: Yet to be seen if a leader yield interface is necessary
+These requirements are being addressed in a phased manner.
+(1) As a first step, https://sourceforge.net/p/opensaf/tickets/79/ was 
implemented in 5.0. (And the headless cluster feature)
 
-(b) an interface that alows invoking a fencing mechanism
+(2) As a second step, implement (this ticket in 5.1)  - 
+Enhanced OpenSAF cluster management such that there is always consensus (among 
the cluster nodes) on the 
+- current cluster members
+- the current active SC, leader election
+- the order of member nodes joining/leaving the cluster
 
-(c) an interface that allows invoking an arbitration mechanism
+
+(3) As a last step implement https://sourceforge.net/p/opensaf/tickets/1170/ 
in 5.2?)
+
+
+This ticket addresses bullet (2) above.
+
+Requirements:
+
+* As a part of this ticket RAFT (see https://raft.github.io/) shall be used as 
the mechanism for 
+(a) achieving consensus among a set of the cluster nodes (and the membership 
changes)
+(b) quorum based leader election
+(c) split brain avoidance
+The following deployment scenarios shall be supported when using RAFT:
+- classic 2 SC OpenSAF cluster (or)
+- when all nodes are SCs (2N + the rest are all spares) (or)
+- 2N + spare SCs (2N + a smaller subset are spares) (or)
+- N-WAY (a active, the rest are all hot standbys) - 5.2
+Note: A mix of hot standbys and spares should also be possible.
+
+
+* RAFT shall be a added as a new OpenSAF service. 
+
+* OpenSAF shall either implement RAFT or re-use existing RAFT implementations 
like logcabin or etcd, etc.
+
+* A new topology service(TS) *may* be added which shall use the topology 
information (from TIPC) and MDS (in case of TCP) to determine cluster membership
+
+* CLM is the single layer that interfaces with the underlying RAFT and TS
+
+* All interactions to RAFT and TS shall be via the normalised cluster services 
adaptation interface called as OpenSAF cluster services library (CS).  The CS 
library thereby shall enable OpenSAF to work with different implementations of 
RAFT.
+
+* CS and TS shall be added as libraries of OpenSAF CLM service. 
+(In the code structure, these shall be part of services/saf/clm/libcs and 
services/saf/clm/libts.
+The name of the library shall be libOsafClusterServices.so)
+
+The CS library shall provide a normalized set of APIs (and callback 
interfaces) such that OpenSAF can interact with different implementations of 
RAFT. 
+
+API and High level design details to follow:






---

** [tickets:#439] Enhanced cluster management using RAFT**

**Status:** accepted
**Milestone:** 5.1.FC
**Labels:** #79 #1170 
**Created:** Fri May 31, 2013 11:15 AM UTC by Mathi Naickan
**Last Updated:** Mon Apr 11, 2016 10:14 PM UTC
**Owner:** Mathi Naickan



[tickets] [opensaf:tickets] #1856 Add IMMSv Readme for Headless feature

2016-05-31 Thread Chani Srivastava



---

** [tickets:#1856] Add IMMSv Readme for Headless feature**

**Status:** unassigned
**Milestone:** 5.0.1
**Created:** Tue May 31, 2016 09:21 AM UTC by Chani Srivastava
**Last Updated:** Tue May 31, 2016 09:21 AM UTC
**Owner:** nobody


Unlike other services (amf, log), IMMSv has no Readme for the Headless feature.

Currently, the readme only describes the SC_ABSENCE feature, but how the immsv 
service is expected to behave in the absence of controllers is not documented.



---



[tickets] [opensaf:tickets] #1546 AMF : Lock of node should be allowed similar to ng, if more than one SU is hosted

2016-05-31 Thread Praveen
- **Milestone**: future --> 4.7.2



---

** [tickets:#1546] AMF : Lock of node should be allowed similar to ng, if more 
than one SU is hosted**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Thu Oct 15, 2015 03:38 AM UTC by Srikanth R
**Last Updated:** Fri Apr 29, 2016 09:16 AM UTC
**Owner:** nobody


Changeset : 6901


Currently, for 2N, lock of a node group is allowed if more than one SU is 
hosted on a member node of the node group, but lock of a node is not allowed 
if more than one SU is hosted on it.
  
  
  # amf-adm lock safAmfNode=SC-2,safAmfCluster=myAmfCluster
error - saImmOmAdminOperationInvoke_2 admin-op RETURNED: 
SA_AIS_ERR_NOT_SUPPORTED (19)
error-string: Node lock/shutdown not allowed with two SUs on same node


# amf-adm lock safAmfNodeGroup=SCs,safAmfCluster=myAmfCluster
   safAmfNodeGroup=SCs,safAmfCluster=myAmfCluster
  UNLOCKED --> LOCKED
   safAmfNode=SC-1,safAmfCluster=myAmfCluster
  UNLOCKED --> LOCKED
   safAmfNode=SC-2,safAmfCluster=myAmfCluster
  UNLOCKED --> LOCKED
   safSi=TWONSI5,safApp=TWONAPP
  FULLYASSIGNED --> PARTIALLYASSIGNED
   safSi=TWONSI5,safApp=TWONAPP
  Alarm MAJOR



---



[tickets] [opensaf:tickets] #1759 amf: Command based HealthCheck invoked during termination phase (NPI comp).

2016-05-31 Thread Praveen
Hi Minh,

Yes, that can happen, but it is a very remote possibility timing-wise. It is 
remote because AMFND stops any running HC timer when it launches the terminate 
command, and the HC command is not always invoked after each timer expiry. In 
this sense it differs from AMF spec based health checks. If the command is 
still executing when the timer expires, the health check is not invoked and 
AMFND waits for the script to complete. When the HC command's status reaches 
AMFND and it sees that the timer has already expired, it restarts the timer 
instead of invoking the HC command.
So, given the fix provided in this ticket and the way HC works, this becomes a 
very remote case. 
Are you seeing invocation of the HC command in the context of this case? If 
possible, please share amfnd traces.

Thanks,
Praveen





---

** [tickets:#1759] amf: Command based HealthCheck invoked during termination 
phase (NPI comp).**

**Status:** fixed
**Milestone:** 4.6.2
**Labels:** Command Based Health Check NPI comp 
**Created:** Thu Apr 14, 2016 06:24 AM UTC by Praveen
**Last Updated:** Mon May 30, 2016 02:12 PM UTC
**Owner:** Praveen
**Attachments:**

- 
[osafamfnd](https://sourceforge.net/p/opensaf/tickets/1759/attachment/osafamfnd)
 (2.1 MB; application/octet-stream)


Attached are amfnd traces and a configuration to reproduce.

Steps to reproduce:
1) Bring the configuration up on a single controller.
2) The command based health check for the component is invoked by AMFND 
periodically.
3) Lock the SU.
4) As a part of the quiesced assignment, AMFND stops all PM monitoring (if 
configured) or the command based health check and starts terminating the 
component by invoking the terminate script.
5) During this termination phase, the health check command completes and AMFND 
restarts the timer, which should not be started (AMFND had already stopped it 
before starting termination of the comp).
6) Before termination of the comp completes (the script has a 2 second sleep), 
the health check timer expires and amfnd invokes the health check command.
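
The sequence in steps 2) to 6) can be replayed with a toy model. The Python sketch below is not amfnd code: the class, its method names, and the `guard` flag are hypothetical, and the guard only stands in for the idea of not restarting the timer once termination has begun. The actual patch may work differently. The model also assumes the timer is restarted when the health check command completes.

```python
class HealthCheckModel:
    """Toy model of a command based health check (HC) for one NPI comp."""

    def __init__(self, guard: bool):
        self.guard = guard            # hypothetical: skip restart while terminating
        self.timer_running = False
        self.command_running = False
        self.terminating = False
        self.log = []

    def start(self):
        self.timer_running = True
        self.log.append("timer started")

    def on_timer_expiry(self):
        if not self.timer_running:
            return                    # a stopped timer never fires
        self.timer_running = False
        if self.command_running:
            # The previous HC command is still executing: do not invoke again.
            self.log.append("expiry ignored: command still running")
            return
        self.command_running = True
        self.log.append("HC command invoked")

    def on_command_done(self):
        self.command_running = False
        if self.terminating and self.guard:
            self.log.append("timer not restarted: component terminating")
            return
        self.timer_running = True
        self.log.append("timer restarted")

    def begin_termination(self):
        self.terminating = True
        self.timer_running = False
        self.log.append("timer stopped, terminate command started")


def replay(guard: bool) -> list:
    """Replay the reproduction steps and return the event log."""
    hc = HealthCheckModel(guard)
    hc.start()               # step 2: periodic health check is active
    hc.on_timer_expiry()     # the HC command is invoked
    hc.begin_termination()   # steps 3-4: lock SU, timer stopped, terminate starts
    hc.on_command_done()     # step 5: HC command completes during termination
    hc.on_timer_expiry()     # step 6: fires only if the timer was restarted
    return hc.log


if __name__ == "__main__":
    for guard in (False, True):
        print(guard, replay(guard))
```

Without the guard the model invokes the health check a second time during termination, which matches the behaviour reported here. With the guard the stray restart is suppressed and no second invocation occurs.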


From traces:
1) AMFND invokes health check command:
Apr 13 14:25:06.698503 osafamfnd [7387:chc.cc:0914] >> 
avnd_comp_hc_rec_tmr_exp: 
safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1 - osafHealthCheck, sts: 0
Apr 13 14:25:06.698516 osafamfnd [7387:clc.cc:2757] >> 
avnd_comp_clc_cmd_execute: 
'safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1':CLC CLI command 
type:'AVND_COMP_CLC_CMD_TYPE_HC(6)'
Apr 13 14:25:06.698532 osafamfnd [7387:clc.cc:2868] T1 Component is NPI, 1
Apr 13 14:25:06.698547 osafamfnd [7387:clc.cc:2920] T1 CLC CLI 
script:'/opt/amf_demo/npi/pm/amf_demo_monitor.sh'
Apr 13 14:25:06.698557 osafamfnd [7387:clc.cc:2925] T1 CLC CLI command timeout: 
In nano secs:1800 In milli secs: 18
Apr 13 14:25:06.698568 osafamfnd [7387:clc.cc:2929] T1 CLC CLI command env 
variable name = 'SA_AMF_COMPONENT_NAME': value 
='safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1'
Apr 13 14:25:06.698578 osafamfnd [7387:clc.cc:2929] T1 CLC CLI command env 
variable name = 'SA_AMF_COMPONENT_NAME': value 
='safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1'
Apr 13 14:25:06.699016 osafamfnd [7387:clc.cc:2961] T2 The CLC CLI command 
execution success

2) AMFND gets quiesced assignments as a part of the lock operation. It stops the 
health check timer and starts terminating the comp: 
Apr 13 14:25:07.397472 osafamfnd [7387:su.cc:0376] >> 
avnd_evt_avd_info_su_si_assign_evh: 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1'
Apr 13 14:25:07.397484 osafamfnd [7387:susm.cc:0189] >> avnd_su_siq_rec_buf: 
'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1'

Apr 13 14:25:07.397731 osafamfnd [7387:cpm.cc:0634] >> avnd_comp_pm_finalize: 
Comp 'safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1'
Apr 13 14:25:07.397739 osafamfnd [7387:cpm.cc:0650] << avnd_comp_pm_finalize
Apr 13 14:25:07.397748 osafamfnd [7387:chc.cc:0761] >> find_hc_rec: 
'safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1'
Apr 13 14:25:07.397758 osafamfnd [7387:tmr.cc:0126] TR health check timer 
stopped
Apr 13 14:25:07.397767 osafamfnd [7387:clc.cc:0854] T1 
'safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1':Entering CLC FSM: 
presence state:'SA_AMF_PRESENCE_INSTANTIATED(3)', 
Event:'AVND_COMP_CLC_PRES_FSM_EV_TERM'
Apr 13 14:25:07.397776 osafamfnd [7387:clc.cc:1852] >> 
avnd_comp_clc_inst_term_hdler: 
'safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1': Terminate event in 
the Instantiated state
Apr 13 14:25:07.397797 osafamfnd [7387:clc.cc:1876] NO Terminate comann 
executing
Apr 13 14:25:07.397807 osafamfnd [7387:clc.cc:2757] >> 
avnd_comp_clc_cmd_execute: 
'safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1':CLC CLI command 
type:'AVND_COMP_CLC_CMD_TYPE_TERMINATE(2)'
Apr 13 14:25:07.397824 osafamfnd [7387:clc.cc:2868] T1 Component is NPI, 1
Apr 13 14:25:07.397836 osafamfnd [7387:clc.cc:2920] T1 CLC CLI 
script:'/opt/amf_demo/npi/pm/amf_comp_npi_term'

Apr 13 14:25:07.398409 osafamfnd [7387:comp.cc:2774] IN 
'safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=Am