from:"Praveen"

[tickets] [opensaf:tickets] #2493 amf: amfnd asserts while shutting down when active monitoring fails for NPI comp.

2017-09-12 Thread Praveen via Opensaf-tickets

- **status**: review --> fixed
- **assigned_to**: Praveen -->  nobody 
- **Comment**:

develop:
commit 126c7d9c59a41205ce16c2c9e8a7cae7457a0c2c
Author: Praveen <praveen.malv...@oracle.com>
Date:   Tue Sep 12 17:08:11 2017 +0530

amfnd: fix opensaf shutdown and active monitoring failure [#2493]

commit 74476b88a30c80c788e56b6ede2baea040e22c18
Author: Praveen <praveen.malv...@oracle.com>
Date:   Tue Sep 12 17:08:11 2017 +0530

amfnd: fix opensaf shutdown and active monitoring failure [#2493]




---

** [tickets:#2493] amf: amfnd asserts while shutting down when active 
monitoring fails for NPI comp.**

**Status:** fixed
**Milestone:** 5.17.10
**Created:** Tue Jun 13, 2017 07:11 AM UTC by Praveen
**Last Updated:** Wed Aug 30, 2017 10:39 AM UTC
**Owner:** nobody
**Attachments:**

- 
[1945_npi.xml](https://sourceforge.net/p/opensaf/tickets/2493/attachment/1945_npi.xml)
 (12.0 kB; text/xml)
- 
[osafamfnd](https://sourceforge.net/p/opensaf/tickets/2493/attachment/osafamfnd)
 (6.1 MB; application/octet-stream)
- [syslog](https://sourceforge.net/p/opensaf/tickets/2493/attachment/syslog) 
(275.6 kB; application/octet-stream)


Steps to reproduce:
1) Bring one controller up.
2) Add the attached configuration to the system.
3) Unlock-in and unlock su1.

The attached configuration uses the amfpm command to start active monitoring. If the 
user configures this command incorrectly, AMF reports a fault on the component 
and AMFND restarts it. Because the active monitoring command fails every time, 
the component is faulted continuously. Finally, when OpenSAF is stopped on the 
node, AMFND asserts:

syslog:
Jun 13 12:27:03 SC-1 osafamfnd[30287]: NO Removed 'safSi=AmfDemo,safApp=AmfDemo1' from 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1'
Jun 13 12:27:03 SC-1 osafamfnd[30287]: NO Removed assignments from AMF components
Jun 13 12:27:03 SC-1 osafamfnd[30287]: NO 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' Component or SU restart probation timer expired
Jun 13 12:27:03 SC-1 osafamfnd[30287]: NO Terminating all AMF components
Jun 13 12:27:03 SC-1 osafamfnd[30287]: NO 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' Presence State RESTARTING => TERMINATING
Jun 13 12:27:03 SC-1 osafamfnd[30287]: src/amf/amfnd/susm.cc:1886: avnd_su_pres_st_chng_prc: Assertion 'si' failed.
Jun 13 12:27:03 SC-1 osafclmd[30264]: AL AMF Node Director is down, terminate this process


bt:
\#0  0x7f662fbe8cc9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
\#1  0x7f662fbec0d8 in __GI_abort () at abort.c:89
\#2  0x7f66306dedbe in __osafassert_fail (__file=, __line=, __func=, __assertion=) at src/base/sysf_def.c:286
\#3  0x7f66313fff3f in avnd_su_pres_st_chng_prc (final_st=SA_AMF_PRESENCE_TERMINATING, prv_st=SA_AMF_PRESENCE_RESTARTING, su=0x7f66324d33c0, cb=0x7f663161f240 <_avnd_cb>) at src/amf/amfnd/susm.cc:1886
\#4  avnd_su_pres_fsm_run (cb=cb@entry=0x7f663161f240 <_avnd_cb>, su=0x7f66324d33c0, comp=comp@entry=0x7f66324d46b0, ev=) at src/amf/amfnd/susm.cc:1610
\#5  0x7f66313caf58 in avnd_comp_clc_st_chng_prc (cb=cb@entry=0x7f663161f240 <_avnd_cb>, comp=comp@entry=0x7f66324d46b0, prv_st=prv_st@entry=SA_AMF_PRESENCE_RESTARTING, final_st=final_st@entry=SA_AMF_PRESENCE_TERMINATING) at src/amf/amfnd/clc.cc:1501
\#6  0x7f66313cf127 in avnd_comp_clc_fsm_run (cb=0x7f663161f240 <_avnd_cb>, comp=comp@entry=0x7f66324d46b0, ev=ev@entry=AVND_COMP_CLC_PRES_FSM_EV_CLEANUP) at src/amf/amfnd/clc.cc:892
\#7  0x7f66314067e8 in avnd_comp_cleanup_launch (comp=comp@entry=0x7f66324d46b0) at src/amf/amfnd/util.cc:178
\#8  0x7f6631405beb in avnd_last_step_clean (cb=cb@entry=0x7f663161f240 <_avnd_cb>) at src/amf/amfnd/term.cc:76
\#9  0x7f66313e13b9 in avnd_di_msg_ack_process (cb=cb@entry=0x7f663161f240 <_avnd_cb>, mid=) at src/amf/amfnd/di.cc:1264
\#10 0x7f66313e1484 in avnd_evt_avd_ack_evh (cb=0x7f663161f240 <_avnd_cb>, evt=0x7f6628001010) at src/amf/amfnd/di.cc:411
\#11 0x7f66313ec9df in avnd_evt_process (evt=0x7f6628001010) at src/amf/amfnd/main.cc:658
\#12 avnd_main_process () at src/amf/amfnd/main.cc:610
\#13 0x7f66313c261f in main (argc=2, argv=0x7ffc47fa34f8) at src/amf/amfnd/main.cc:203




---

Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is 
subscribed to https://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change settings at 
https://sourceforge.net/p/opensaf/admin/tickets/options.  Or, if this is a 
mailing list, you can unsubscribe from the mailing list.--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot___
Opensaf-tickets mailing list
Opensaf-tickets@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets

[tickets] [opensaf:tickets] #2269 amf: saAmfSGNumPrefAssignedSUs is not honored in N-Way Active model.

2017-08-30 Thread Praveen via Opensaf-tickets

- **status**: review --> fixed
- **Comment**:

commit c76c419a0250ac61e0d48180950aaafb639f32bf
Author: Praveen <praveen.malv...@oracle.com>
Date:   Thu Aug 31 10:56:36 2017 +0530

amfd: honor PrefAssignedSU in nway and nway active model during assignments 
[#2269]

The SG attribute saAmfSGNumPrefAssignedSUs applies to the N-Way and N-Way 
Active models.
AMF assigns more SUs than saAmfSGNumPrefAssignedSUs in both the N-Way and 
N-Way Active models.

This patch fixes the problem.




---

** [tickets:#2269] amf: saAmfSGNumPrefAssignedSUs is not honored in N-Way 
Active model.**

**Status:** fixed
**Milestone:** 5.17.10
**Created:** Wed Jan 18, 2017 06:08 AM UTC by Praveen
**Last Updated:** Fri Jul 28, 2017 08:25 AM UTC
**Owner:** Praveen
**Attachments:**

- 
[AppConfig-nwayactive_3SUs_1SIs.xml](https://sourceforge.net/p/opensaf/tickets/2269/attachment/AppConfig-nwayactive_3SUs_1SIs.xml)
 (13.7 kB; text/xml)


AMF assigns more SUs than the configured value of saAmfSGNumPrefAssignedSUs in 
the N-Way Active model.
The issue can be reproduced by bringing up the attached configuration.
In the application, saAmfSGNumPrefAssignedSUs is set to 2:
 immlist safSg=NWay_Active\,safApp=NWay_Active | grep -i prefass
saAmfSGNumPrefAssignedSUs  SA_UINT32_T  2 (0x2)

But AMF gives assignments to all three SUs:
safSISU=safSu=SU2\,safSg=NWay_Active\,safApp=NWay_Active,safSi=NWay_Active,safApp=NWay_Active
saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=SU1\,safSg=NWay_Active\,safApp=NWay_Active,safSi=NWay_Active,safApp=NWay_Active
saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=SU3\,safSg=NWay_Active\,safApp=NWay_Active,safSi=NWay_Active,safApp=NWay_Active
saAmfSISUHAState=ACTIVE(1)

Since this attribute is also valid for the N-Way model, the issue applies to 
the N-Way model as well.
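
The expected behaviour can be sketched in a few lines of C. This is only an illustration of the cap described above, not the AMFD implementation or the committed patch: the function names and the idea of counting in-service SUs are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not the AMFD code: how many SUs may hold
 * assignments, given the number of in-service SUs and the configured
 * saAmfSGNumPrefAssignedSUs value.
 */
static uint32_t max_assignable_sus(uint32_t in_service_sus,
                                   uint32_t pref_assigned_sus) {
  /* Assumption for this sketch: the configured value is at least 1. */
  assert(pref_assigned_sus > 0);
  return in_service_sus < pref_assigned_sus ? in_service_sus
                                            : pref_assigned_sus;
}

/* True if one more SU may be given an assignment without exceeding the cap. */
static int may_assign_another_su(uint32_t currently_assigned,
                                 uint32_t in_service_sus,
                                 uint32_t pref_assigned_sus) {
  return currently_assigned <
         max_assignable_sus(in_service_sus, pref_assigned_sus);
}
```

With the reported configuration (three SUs, saAmfSGNumPrefAssignedSUs set to 2), max_assignable_sus(3, 2) yields 2, so the third SU would be left unassigned.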




[tickets] [opensaf:tickets] #2493 amf: amfnd asserts while shutting down when active monitoring fails for NPI comp.

2017-08-29 Thread Praveen via Opensaf-tickets

- **status**: assigned --> accepted
- **Blocker**: False --> True



---

** [tickets:#2493] amf: amfnd asserts while shutting down when active 
monitoring fails for NPI comp.**

**Status:** accepted
**Milestone:** 5.17.10
**Created:** Tue Jun 13, 2017 07:11 AM UTC by Praveen
**Last Updated:** Fri Jul 28, 2017 08:23 AM UTC
**Owner:** Praveen
**Attachments:**

- 
[1945_npi.xml](https://sourceforge.net/p/opensaf/tickets/2493/attachment/1945_npi.xml)
 (12.0 kB; text/xml)
- 
[osafamfnd](https://sourceforge.net/p/opensaf/tickets/2493/attachment/osafamfnd)
 (6.1 MB; application/octet-stream)
- [syslog](https://sourceforge.net/p/opensaf/tickets/2493/attachment/syslog) 
(275.6 kB; application/octet-stream)



[tickets] [opensaf:tickets] #2475 amf: support for SC status change Callback, non SAF.

2017-08-28 Thread Praveen via Opensaf-tickets

- **status**: review --> fixed
- **Comment**:

commit 00c185144de728f7938f775fd3ce65ee95b01032
Author: Praveen <praveen.malv...@oracle.com>
Date:   Mon Aug 28 14:32:32 2017 +0530

amf: update readme for SC status change callback [#2475]

commit b93cf244b3fb64bc213d82125e1665b50b80f2c6
Author: Praveen <praveen.malv...@oracle.com>
Date:   Mon Aug 28 14:32:33 2017 +0530

amf: support SC status change callback, non SAF [#2475]

commit 81e2878c1fa3287e37238a38a1bb054951489e86
Author: Praveen <praveen.malv...@oracle.com>
Date:   Mon Aug 28 14:32:33 2017 +0530

amf: add sample apps for SC status change callback [#2475]

commit a79bb4c527ec3c59a61ce6552184c18213fe4acd
Author: Praveen <praveen.malv...@oracle.com>
Date:   Mon Aug 28 14:32:33 2017 +0530

amf: add api test cases for sc status change callback [#2475]




---

** [tickets:#2475] amf: support for SC status change Callback, non SAF.**

**Status:** fixed
**Milestone:** 5.17.10
**Created:** Thu Jun 01, 2017 10:19 AM UTC by Praveen
**Last Updated:** Mon Aug 14, 2017 08:27 AM UTC
**Owner:** Praveen


This enhancement adds two resources to AMFA that let an 
application learn about the
absence and presence of the SCs when they go down and come up.

Information about the resources:
* A callback that AMFA invokes whenever an SC joins the cluster, and
  whenever both SCs leave the cluster if the SC Absence feature is enabled.

  -Callback and its argument:

  void (*OsafAmfSCStatusChangeCallbackT)(OsafAmfSCStatusT state)
  where OsafAmfSCStatusT is defined as:
  typedef enum {
    OSAF_AMF_SC_PRESENT = 1,
    OSAF_AMF_SC_ABSENT = 2,
  } OsafAmfSCStatusT;

  This callback can be integrated
  with a standard AMF component (and even with a legacy one).

  -Return codes:
   SA_AIS_OK - The function returned successfully.
   SA_AIS_ERR_LIBRARY - An unexpected problem occurred in the library (such as
   corruption). The library cannot be used anymore.
   SA_AIS_ERR_BAD_HANDLE - The handle amfHandle is invalid, since it is 
   corrupted, uninitialized, or has already been finalized.
   SA_AIS_ERR_INVALID_PARAM - A parameter is not set correctly (callback).

* An API to register/install the above callback function:
   void osafAmfInstallSCStatusChangeCallback(SaAmfHandleT amfHandle,
       void (*OsafAmfSCStatusChangeCallbackT)(OsafAmfSCStatusT status));
   If 0 is passed as amfHandle, the callback is invoked in the
   context of the MDS thread. If a valid amfHandle is passed, the callback
   is invoked in the context of the thread that calls saAmfDispatch()
   with this handle.
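
As a rough illustration of the callback side, the sketch below shows what an application handler could look like. It is standalone: the enum and the callback type are copied from the description above so that it compiles without the OpenSAF headers, and sc_status_name(), my_sc_status_cbk() and last_sc_status are hypothetical names invented for the example. It deliberately does not call the install API.

```c
#include <assert.h>
#include <stdio.h>

/* Status values as listed in the ticket description. */
typedef enum {
  OSAF_AMF_SC_PRESENT = 1,
  OSAF_AMF_SC_ABSENT = 2,
} OsafAmfSCStatusT;

/* Callback type matching the prototype shown in the ticket. */
typedef void (*OsafAmfSCStatusChangeCallbackT)(OsafAmfSCStatusT state);

/* Hypothetical application state: last status reported to the callback. */
static OsafAmfSCStatusT last_sc_status = OSAF_AMF_SC_PRESENT;

/* Hypothetical helper: human-readable name for a status value. */
static const char *sc_status_name(OsafAmfSCStatusT state) {
  switch (state) {
    case OSAF_AMF_SC_PRESENT:
      return "PRESENT";
    case OSAF_AMF_SC_ABSENT:
      return "ABSENT";
  }
  return "UNKNOWN";
}

/* Hypothetical application callback: record the new status and log it. */
static void my_sc_status_cbk(OsafAmfSCStatusT state) {
  assert(state == OSAF_AMF_SC_PRESENT || state == OSAF_AMF_SC_ABSENT);
  last_sc_status = state;
  printf("SC status changed: %s\n", sc_status_name(state));
}
```

In a real component the types would come from the OpenSAF AMF headers, and my_sc_status_cbk would be handed to osafAmfInstallSCStatusChangeCallback() together with either 0 or a valid amfHandle, as described above.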





[tickets] [opensaf:tickets] #466 Length of the objectnames is more by one for configuration object notifications

2017-08-23 Thread Praveen via Opensaf-tickets

- **status**: assigned --> unassigned
- **assigned_to**: Praveen -->  nobody 
- **Blocker**:  --> False



---

** [tickets:#466] Length of the objectnames is more by one for configuration 
object notifications**

**Status:** unassigned
**Milestone:** future
**Created:** Thu Jun 20, 2013 09:08 AM UTC by Sirisha Alla
**Last Updated:** Wed Jan 04, 2017 06:41 AM UTC
**Owner:** nobody


When ntfimcnd sends notifications for configuration object 
creation/modification/deletion, the lengths of the notifying object and the 
notification object are reported incorrectly. The IMM callback gives the length of the 
notification object correctly.

Notification object length in the IMM callback:
objectName->length: 37
objectName->value: 'attrName_testSA_registerSA_Node_37_69'

Object create/modify/delete notifications report the length of the notification 
object as 38 and the length of the notifying object as 15 for "safApp=OpenSaf".

This issue is reproducible.
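
Both reported lengths are exactly one more than the string length of the name. One plausible explanation, which is an assumption and is not confirmed by the ticket, is that the terminating NUL byte is counted. The sketch below only demonstrates that arithmetic; the helper names are hypothetical and are not taken from ntfimcnd.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length of a name as strlen() reports it (terminating NUL excluded). */
static size_t name_length(const char *name) {
  assert(name != NULL);
  return strlen(name);
}

/* Hypothetical off-by-one variant: counts the terminating NUL as well. */
static size_t name_length_with_nul(const char *name) {
  assert(name != NULL);
  return strlen(name) + 1;
}
```

For "safApp=OpenSaf" the first helper gives 14 and the second 15; for 'attrName_testSA_registerSA_Node_37_69' they give 37 and 38, which matches the numbers in the report.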



[tickets] [opensaf:tickets] #2536 amf: Add amfclusterstatus option to check if SCs are up

2017-08-08 Thread Praveen via Opensaf-tickets

- **status**: review --> fixed
- **Comment**:

develop:
commit c523f2a1b2b887ac2c6a91e5ee12c028b243a729
Author: Praveen <praveen.malv...@oracle.com>
Date:   Tue Aug 8 15:12:19 2017 +0530

 amf: add option for controller status in amfclusterstatus [#2536]

release:
commit 8d93a58adfdf96f2420e89686e0026375211799f
Author: Praveen <praveen.malv...@oracle.com>
Date:   Tue Aug 8 15:12:19 2017 +0530

 amf: add option for controller status in amfclusterstatus [#2536]




---

** [tickets:#2536] amf: Add amfclusterstatus option to check if SCs are up**

**Status:** fixed
**Milestone:** 5.17.10
**Created:** Fri Jul 28, 2017 06:54 AM UTC by Anders Widell
**Last Updated:** Fri Aug 04, 2017 09:19 AM UTC
**Owner:** Praveen


The current amfclusterstatus command can be used to check if the SCs are up, 
but in order to interpret the result you must know the names of the SC node(s), 
and you must parse the console output from the command.

In order to make this command easier to use for this purpose, we could add an 
option, e.g. -s or --controller-status, which simply answers the question 
whether any SC is currently up. It could exit with exit code 0 if any SC is up, 
and with exit code 1 if no SC is up. We could also add a -q or --quiet option 
that suppresses all printouts from the command. This will make the command easy 
to use in a shell script.
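
The proposed exit-code convention is just as easy to consume from a program. The following C sketch assumes the options end up as suggested above (-s and -q, exit code 0 when any SC is up, 1 when none is up); the enum and function names are hypothetical, and any other exit code is treated as a failed check.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/wait.h>

/* Exit-code convention proposed in the ticket: 0 = an SC is up, 1 = none is. */
enum sc_check { SC_UP = 0, SC_DOWN = 1, SC_CHECK_FAILED = 2 };

/* Map a wait status, as returned by system(), to the convention above. */
static enum sc_check classify_wait_status(int wait_status) {
  if (wait_status == -1 || !WIFEXITED(wait_status)) return SC_CHECK_FAILED;
  switch (WEXITSTATUS(wait_status)) {
    case 0:
      return SC_UP;
    case 1:
      return SC_DOWN;
    default:
      return SC_CHECK_FAILED;
  }
}

/* Run a status command through the shell and classify its exit code. */
static enum sc_check check_controllers(const char *command) {
  assert(command != NULL);
  return classify_wait_status(system(command));
}
```

A caller would use something like check_controllers("amfclusterstatus -s -q"), assuming the options are implemented with those names.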



[tickets] [opensaf:tickets] #2536 amf: Add amfclusterstatus option to check if SCs are up

2017-08-04 Thread Praveen via Opensaf-tickets

- **status**: accepted --> review



---

** [tickets:#2536] amf: Add amfclusterstatus option to check if SCs are up**

**Status:** review
**Milestone:** 5.17.10
**Created:** Fri Jul 28, 2017 06:54 AM UTC by Anders Widell
**Last Updated:** Wed Aug 02, 2017 04:08 AM UTC
**Owner:** Praveen



[tickets] [opensaf:tickets] #2429 clm: support for a clm utility to perform tracking and for getting node info.

2017-08-02 Thread Praveen via Opensaf-tickets

- **status**: review --> fixed
- **Comment**:

commit 77346df31fa7061496b22f91611e120477e907b5
Author: Praveen <praveen.malv...@oracle.com>
Date:   Wed Aug 2 16:58:16 2017 +0530

clm: add clm tool for tracking and for getting node info [#2429]

Add a utility/application which enables user to:
-perform tracking using saClmClusterTrack_4().
-get node info by calling saClmClusterNodeGet_4().
-get node info asynchronously by calling saClmClusterNodeGetAsync().





---

** [tickets:#2429] clm: support for a clm utility to perform tracking and  for 
getting node info.**

**Status:** fixed
**Milestone:** 5.17.10
**Created:** Mon Apr 17, 2017 06:38 AM UTC by Praveen
**Last Updated:** Fri Jul 14, 2017 09:04 AM UTC
**Owner:** Praveen


Ticket #2394 implements tool commands for handling CLM objects and performing 
admin operations.

This ticket adds a utility or application that enables the user to:
\-perform tracking using saClmClusterTrack_4().
\-get node info by calling saClmClusterNodeGet_4().
\-get node info asynchronously by calling saClmClusterNodeGetAsync().





[tickets] [opensaf:tickets] #2536 amf: Add amfclusterstatus option to check if SCs are up

2017-08-01 Thread Praveen via Opensaf-tickets

I will be publishing this patch by the end of this week. Ticket #2475 
implements a callback in the same area and is in review state.


---

** [tickets:#2536] amf: Add amfclusterstatus option to check if SCs are up**

**Status:** accepted
**Milestone:** 5.17.10
**Created:** Fri Jul 28, 2017 06:54 AM UTC by Anders Widell
**Last Updated:** Tue Aug 01, 2017 01:49 PM UTC
**Owner:** Praveen



[tickets] [opensaf:tickets] #2536 amf: Add amfclusterstatus option to check if SCs are up

2017-07-30 Thread Praveen via Opensaf-tickets

- **status**: unassigned --> accepted
- **assigned_to**: Praveen
- **Part**: - --> tools



---

** [tickets:#2536] amf: Add amfclusterstatus option to check if SCs are up**

**Status:** accepted
**Milestone:** 5.17.10
**Created:** Fri Jul 28, 2017 06:54 AM UTC by Anders Widell
**Last Updated:** Fri Jul 28, 2017 06:54 AM UTC
**Owner:** Praveen



[tickets] [opensaf:tickets] #2269 amf: saAmfSGNumPrefAssignedSUs is not honored in N-Way Active model.

2017-07-26 Thread Praveen via Opensaf-tickets

- **status**: assigned --> review
- **Blocker**:  --> True
- **Milestone**: future --> 5.17.08



---

** [tickets:#2269] amf: saAmfSGNumPrefAssignedSUs is not honored in N-Way 
Active model.**

**Status:** review
**Milestone:** 5.17.08
**Created:** Wed Jan 18, 2017 06:08 AM UTC by Praveen
**Last Updated:** Tue Mar 28, 2017 07:04 AM UTC
**Owner:** Praveen
**Attachments:**

- 
[AppConfig-nwayactive_3SUs_1SIs.xml](https://sourceforge.net/p/opensaf/tickets/2269/attachment/AppConfig-nwayactive_3SUs_1SIs.xml)
 (13.7 kB; text/xml)



[tickets] [opensaf:tickets] #70 AMF support for Container and contained components

2017-07-21 Thread Praveen via Opensaf-tickets

Created branch ticket-70 and pushed some initial patches to the repo:
git://git.code.sf.net/u/praveenmalviya/review.


---

** [tickets:#70] AMF support for Container and contained components**

**Status:** assigned
**Milestone:** future
**Created:** Mon May 13, 2013 04:14 AM UTC by Nagendra Kumar
**Last Updated:** Wed Jul 19, 2017 08:31 AM UTC
**Owner:** Praveen


Migrated from http://devel.opensaf.org/ticket/1436:

The current implementation of AMF doesn't support container and contained 
components.

The concept of container and contained components was introduced in the B.03.01 spec. 
Supporting it required a series of changes and additions in different sections of the 
spec. In addition, a new Chapter 6 is fully dedicated to container and contained 
components. 

What follows is a summary of the container and contained component concept, collected 
from different sections of the B.04.01 spec, with references to the relevant 
sections and page numbers.

**A) Section 3.1.2.1.1 (page 45) describes the use case for the container and contained component concept:**
"
The concept of container and contained components allows the Availability 
Management
Framework to integrate components that are not executed directly by the
operating system, but rather in a controlled environment running on top of the 
operating
system. Widespread environments are runtime environments, virtual machines,
or component frameworks.
"
AMF directly manages the life cycle of a container component, but not that of a 
contained component.
A container component cooperates with AMF to manage the life cycle of its contained 
components.
If a container comp1 manages the life cycle of a contained comp2, then comp1 is 
termed the associated container component of comp2, and comp2 is termed an associated 
contained component of comp1. If there is another component, say comp3, whose 
associated container component is also comp1, then comp3 and comp2 are referred to as
collocated contained components. (3.1.2.1.1 Container and Contained Components, 
page 45)
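
The relationships defined in this paragraph can be modelled with a tiny data structure. The sketch below is purely illustrative: struct component and the two helper functions are hypothetical and do not correspond to any AMF data structure.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a component; not an AMF data structure. */
struct component {
  const char *name;
  const struct component *container; /* associated container, NULL if none */
};

/* True if `c` is a contained component, i.e. it has an associated container. */
static int is_contained(const struct component *c) {
  assert(c != NULL);
  return c->container != NULL;
}

/* True if two distinct contained components share the same associated container. */
static int are_collocated(const struct component *a,
                          const struct component *b) {
  assert(a != NULL && b != NULL);
  return a != b && is_contained(a) && is_contained(b) &&
         a->container == b->container;
}
```

With comp1 as a container and comp2 and comp3 both pointing to comp1, are_collocated(&comp2, &comp3) is true, while comp1 itself is not a contained component.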

**B)Configuration:**
-Container and contained components are local SA-aware components.
 (6.1.2 Component Category page 221)
-User can configure attribute "saAmfCtCompCategory" of class "SaAmfCompType" 
with
 following values to declare component of this CompType is a container or
 contained component (Section7.4.8 SaAmfCompCategoryT page 258):

 \ #define SA_AMF_COMP_CONTAINER 0x0010
  \#define SA_AMF_COMP_CONTAINED 0x0020

-A single container component acts as container component for many contained
 conponents.(6.2.2 Assignment of the Container CSI page 224).
-Container and its contained components must be hosted on same AMF Node.
-A containter component can be part of SG of only N-Way Acitve model.
 (6.1.6 Redundancy Models Page 222)
-A contained component can be part of SG of any redundancy model.
 (6.1.6 Redundancy Models Page 223)
-A SU cannot contain any other types/categories of components if a container 
component is
 present in it. ( 3.1.4 Service Unit page 52).
-A SU cannot contain both container components and contained components.
 (6.1.5 Container and Contained Components in Service Units and Service Groups 
page 221)
-A SU that contains a contained component can only contain collocated contained
components.(6.1.5 Container and Contained Components in Service Units and 
Service Groups page 221)
-SUs containing contained components and SUs containing container components
 must belong to different SGs.
(6.1.5 Container and Contained Components in Service Units and Service Groups 
page 222)
-Since a container component can be associated with many contained components,
 and there can be many container components in an SU, the user has to specify a
 CSI name in saAmfCompContainerCsi of the Comp class to declare the associated
 container component indirectly. The component that receives this container CSI
 acts as the container component for this contained component on the same node.
 (3.1.3 Component Service Instance page 51)
-An SI containing a container CSI cannot have any other CSI.
 (6.1.5 Container and Contained Components in Service Units and Service Groups
 page 222)
-A container component can receive multiple CSI assignments based on
 configuration. Among these CSIs, one or more can be for handling contained
 components and others can be for providing other services.
 (3.1.3 Component Service Instance page 51)
-If an SU contains contained components, they should have a common associated
 container component. This can be ensured by configuring the same container CSI.
 (3.1.4 Service Unit page 52)
-The rank of the container SI should be higher than the rank of the contained SI.
 (3.6.1.4 Considerations when Configuring Redundancy page 121) and
 (6.1.5 Container and Contained Components in Service Units and Service Groups
 page 222)
-There should not be any conflict while hosting container SUs and contained SUs
 on nodes and node groups, as both the container and its associated contained
 components

[tickets] [opensaf:tickets] #2429 clm: support for a clm utility to perform tracking and for getting node info.

2017-07-14 Thread Praveen via Opensaf-tickets

- **summary**: clm: support for a clm utility to perform tracking and cluster 
status. --> clm: support for a clm utility to perform tracking and  for getting 
node info.
- Description has changed:

Diff:



--- old
+++ new
@@ -4,4 +4,4 @@
 \-perform tracking using saClmClusterTrack_4().
 \-get node info by calling saClmClusterNodeGet_4().
 \-get node info asynchronously by calling saClmClusterNodeGetAsync().
-\-to list nodes status when SCs are present and absent.
+



- **status**: accepted --> review
- **Part**: - --> tools
- **Blocker**:  --> False



---

** [tickets:#2429] clm: support for a clm utility to perform tracking and  for 
getting node info.**

**Status:** review
**Milestone:** 5.17.10
**Created:** Mon Apr 17, 2017 06:38 AM UTC by Praveen
**Last Updated:** Sat Jul 01, 2017 04:15 PM UTC
**Owner:** Praveen


Ticket #2394 implements tool commands for handling CLM objects and performing 
admin operations.

This ticket adds a utility or application that enables the user to:
\-perform tracking using saClmClusterTrack_4().
\-get node info by calling saClmClusterNodeGet_4().
\-get node info asynchronously by calling saClmClusterNodeGetAsync().




---

Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is 
subscribed to https://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change settings at 
https://sourceforge.net/p/opensaf/admin/tickets/options.  Or, if this is a 
mailing list, you can unsubscribe from the mailing list.
--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Opensaf-tickets mailing list
Opensaf-tickets@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets

[tickets] [opensaf:tickets] #2331 CLM: : saClmNodeCurrAddress and saClmNodeCurrAddressFamily are not exposed to IMM even that TPC mode is using

2017-07-07 Thread Praveen via Opensaf-tickets

commit 8b79e5a7d45986f50195865f6ec276eede025ae4
Author: Praveen <praveen.malv...@oracle.com>
Date:   Thu May 18 17:19:17 2017 +0530

clmd: update saClmNodeCurrAddress and saClmNodeCurrAddressFamily in IMM V2 
[#2331]



---

** [tickets:#2331] CLM: : saClmNodeCurrAddress and saClmNodeCurrAddressFamily 
are not exposed to IMM even that TPC mode is using**

**Status:** fixed
**Milestone:** 5.17.10
**Created:** Thu Mar 02, 2017 10:10 AM UTC by Tai Dinh
**Last Updated:** Fri Jul 07, 2017 10:51 AM UTC
**Owner:** Praveen


saClmNodeCurrAddress and saClmNodeCurrAddressFamily of a cluster node are not 
exposed to IMM even when TCP mode is configured.
Applications sometimes need this information.



[tickets] [opensaf:tickets] #2331 CLM: : saClmNodeCurrAddress and saClmNodeCurrAddressFamily are not exposed to IMM even that TPC mode is using

2017-07-07 Thread Praveen via Opensaf-tickets

- **status**: review --> fixed



---

** [tickets:#2331] CLM: : saClmNodeCurrAddress and saClmNodeCurrAddressFamily 
are not exposed to IMM even that TPC mode is using**

**Status:** fixed
**Milestone:** 5.17.10
**Created:** Thu Mar 02, 2017 10:10 AM UTC by Tai Dinh
**Last Updated:** Sat Jul 01, 2017 04:15 PM UTC
**Owner:** Praveen


saClmNodeCurrAddress and saClmNodeCurrAddressFamily of a cluster node are not 
exposed to IMM even when TCP mode is configured.
Applications sometimes need this information.



[tickets] [opensaf:tickets] #70 AMF support for Container and contained components

2017-07-06 Thread Praveen via Opensaf-tickets

ns regular SA-AWARE component or
+  a contained component. It is different when a SU/Node/SG contains an "active"
+  container component and its associated contained components are up.
+  Basic flows of some admin operations in such a case:
+  -Lock of container SU:
+    -Since a contained SU may belong to any redundancy model, assignments are
+     first removed from the contained SU as if the lock operation had also
+     been issued on the contained SU.
+    -After removal of assignments from the contained SU, all comps in the
+     contained SU are terminated.
+    -Now assignments are removed from the container SU gracefully via the
+     quiesced HA state.
+  -Lock of container SI:
+    -First remove assignments from all those contained SUs where this
+     container SI is active.
+    -After removal of assignments from the contained SUs, all comps in these
+     contained SUs are terminated.
+    -Now assignments are removed from the container SU.
+  -Shutdown of container SI:
+    -First remove assignments, via the quiescing state, from all those
+     contained SUs where this container SI is active.
+    -After removal of assignments from the contained SUs, all comps in these
+     contained SUs are terminated.
+    -Now assignments are removed from the container SU gracefully via the
+     quiescing HA state.
+  -Restart of container component:
+    Here it is assumed that saAmfCompDisableRestart is false.
+    -First terminate the associated contained component using the terminate
+     callback.
+    -Terminate the container component using the terminate callback.
+    -Instantiate the container component with the INSTANTIATE CLC-CLI script.
+    -Reassign the container CSI active to the container component.
+    -Now instantiate the contained component by sending
+     saAmfContainedComponentInstantiateCallback to the container component.
+    -After successful instantiation of the contained component, it is
+     reassigned.
+
+**H)Notifications (11 Alarms and Notifications Page 417):**
+  AMF generates notifications for container and contained
+  components as it currently generates for any other component.
+  
+**I)Some important facts from different sections:**
+  -A process belonging to a container component can also belong to its
+   associated contained components.
+   (3.1.2.1.1 Container and Contained Components page 45)
+  -A process belonging to a contained component belongs also to its associated
+   container component and may also belong to some of its collocated contained
+   components. (3.1.2.1.1 Container and Contained Components page 45)
+  -The container CSI can contain information to be passed by the associated
+   container component to the corresponding contained component. How this
+   information is passed is a private interface between container and
+   contained components. (3.1.3 Component Service Instance page 51)
+  -A container component can be configured to have multiple CSI assignments,
+   one or more for handling contained components, and others for providing
+   other services. In terms of functionality and syntax, there is no
+   difference between a container CSI used to determine the associated
+   container component and other CSIs corresponding to the workload of other
+   services. (3.1.3 Component Service Instance page 51)
+
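
The "Restart of container component" flow in the diff above is an ordering constraint, and it can be written out as a list of steps. The sketch below is illustrative only and is not taken from AMF: the function name and the step labels are made up, and it assumes saAmfCompDisableRestart is false, as the text does.

```python
def container_restart_steps(container, contained):
    """Return the ordered restart actions as (action, component) pairs.

    container is the container component name; contained lists its
    associated contained components. All labels are hypothetical.
    """
    # 1. Contained components are terminated first, via terminate callback.
    steps = [("terminate_callback", comp) for comp in contained]
    # 2. Then the container itself is terminated via terminate callback.
    steps.append(("terminate_callback", container))
    # 3. The container is instantiated with the INSTANTIATE CLC-CLI script.
    steps.append(("instantiate_clc_cli", container))
    # 4. The container CSI is reassigned active to the container.
    steps.append(("assign_container_csi_active", container))
    # 5. Each contained component is instantiated through the container
    #    (saAmfContainedComponentInstantiateCallback) and then reassigned.
    for comp in contained:
        steps.append(("contained_instantiate_callback", comp))
        steps.append(("reassign", comp))
    return steps
```

Calling `container_restart_steps("comp1", ["comp2"])` yields the six actions in the order given in the flow: contained component down, container down, container up, container CSI active, contained component up, contained component reassigned.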



- **Comment**:

I will attach the same as a document.



---

** [tickets:#70] AMF support for Container and contained components**

**Status:** assigned
**Milestone:** future
**Created:** Mon May 13, 2013 04:14 AM UTC by Nagendra Kumar
**Last Updated:** Wed Jun 28, 2017 05:28 AM UTC
**Owner:** Praveen


Migrated from http://devel.opensaf.org/ticket/1436:

Current implementation of AMF doesn't support Container and Contained 
components.

The concept of container and contained components was introduced in the B.03.01 
spec. To support it, a series of changes and additions were made in different 
sections of the spec. In addition, a new chapter 6 is fully dedicated to 
container and contained components.

What follows is a summary of the container and contained components concept, 
collected from different sections of the B.04.01 spec, with references to the 
particular sections and page numbers.

**A)Section 3.1.2.1.1 page 45 describes the use case for the container and contained component concept:**
"
The concept of container and contained components allows the Availability 
Management Framework to integrate components that are not executed directly by 
the operating system, but rather in a controlled environment running on top of 
the operating system. Widespread environments are runtime environments, virtual 
machines, or component frameworks.
"
AMF directly manages the life cycle of a container component but not that of a 
contained component.
A container component cooperates with AMF in managing the life cycle of a 
contained component.
If a container comp1 manages the life cycle of a contained comp2, then comp1 is 
termed the associated container component of comp2 and comp2 is termed the 
associated contained
comp

[tickets] [opensaf:tickets] #2468 amf: amfd asserts while decrementing opensaf NoRed SI assignment counter during fail-over.

2017-07-02 Thread Praveen via Opensaf-tickets

Hi,

Those changes were pushed in ticket #2416, after 5.2 GA.
If there are reproducible steps, please update this ticket.

Yes, uncommenting that line will enable AMFD traces.

Thanks
Praveen



---

** [tickets:#2468] amf: amfd asserts while decrementing opensaf NoRed SI 
assignment counter during fail-over.**

**Status:** unassigned
**Milestone:** 5.17.08
**Created:** Thu May 25, 2017 08:46 AM UTC by Praveen
**Last Updated:** Sat Jul 01, 2017 04:17 PM UTC
**Owner:** nobody


This ticket is based on an issue reported on the users mailing list on 
22-May-17, subject "[users] osafamfd coredump issue".


Here is syslog when the issue occurred:

2017-05-01T07:52:57.714906-04:00 scm2 kernel: tipc: Resetting link 
<1.1.16:eth2-1.1.5:bond0>, peer not responding

2017-05-01T07:52:57.714935-04:00 scm2 kernel: tipc: Lost link 
<1.1.16:eth2-1.1.5:bond0> on network plane A

2017-05-01T07:52:57.714939-04:00 scm2 kernel: tipc: Lost contact with <1.1.5>

2017-05-01T07:52:57.716788-04:00 scm2 osafimmd[3009]: NO MDS event from svc_id 
25 (change:4, dest:287038266327043)

2017-05-01T07:52:57.717304-04:00 scm2 osafclmd[4259]: NO Node 66831 went down. 
Not sending track callback for agents on that node

2017-05-01T07:52:57.719178-04:00 scm2 osafimmnd[3020]: NO Global discard node 
received for nodeId:1050f pid:15395

2017-05-01T07:52:57.719233-04:00 scm2 osafimmnd[3020]: NO Implementer 
disconnected 104 <0, 1050f(down)> (MsgQueueService66831)

2017-05-01T07:52:57.721345-04:00 scm2 osafamfd[4277]: NO Node 'PLD0105' left 
the cluster

2017-05-01T07:52:57.722778-04:00 scm2 log_demo[6160]: [0.I.Proc]: FYI state 
change notification from NTF, entity PLD0105 now has new state DISABLED (Oper 
state safAmfNode=PLD0105,safAmfCluster=myAmfCluster changed)

2017-05-01T07:52:57.732796-04:00 scm2 osafamfd[4277]: su.cc:2006: 
dec_curr_act_si: Assertion 'saAmfSUNumCurrActiveSIs > 0' failed.

2017-05-01T07:52:57.778777-04:00 scm2 kernel: tipc: Resetting link 
<1.1.16:eth2-1.1.6:bond0>, peer not responding

2017-05-01T07:52:57.778827-04:00 scm2 kernel: tipc: Lost link 
<1.1.16:eth2-1.1.6:bond0> on network plane A

2017-05-01T07:52:57.778833-04:00 scm2 kernel: tipc: Lost contact with <1.1.6>

2017-05-01T07:52:57.777979-04:00 scm2 osafimmd[3009]: NO MDS event from svc_id 
25 (change:4, dest:288139774320643)

2017-05-01T07:52:57.717343-04:00 scm2 osafclmd[4259]: NO Node 66831 went down. 
Not sending track callback for agents on that node

2017-05-01T07:52:57.779373-04:00 scm2 osafclmd[4259]: NO Node 67087 went down. 
Not sending track callback for agents on that node

2017-05-01T07:52:57.780552-04:00 scm2 osafimmnd[3020]: NO Global discard node 
received for nodeId:1060f pid:17439

2017-05-01T07:52:57.780607-04:00 scm2 osafimmnd[3020]: NO Implementer 
disconnected 106 <0, 1060f(down)> (MsgQueueService67087)

2017-05-01T07:52:57.810785-04:00 scm2 osafamfnd[5281]: WA AMF director 
unexpectedly crashed

2017-05-01T07:52:57.810839-04:00 scm2 osafamfnd[5281]: Rebooting OpenSAF NodeId 
= 69647 EE Name = , Reason: local AVD down(Adest) or both AVD down(Vdest) 
received, OwnNodeId = 69647, SupervisionTime = 0

2017-05-01T07:52:57.810978-04:00 scm2 osafimmnd[3020]: NO Implementer locally 
disconnected. Marking it as doomed 105 <29, 1100f> (safAmfService)

2017-05-01T07:52:57.812582-04:00 scm2 osafimmnd[3020]: NO Implementer 
disconnected 105 <29, 1100f> (safAmfService)

2017-05-01T07:52:57.950567-04:00 scm2 opensaf_reboot: Rebooting local node; 
timeout=0

2017-05-01T07:52:58.084968-04:00 scm2 atwdog[28335]: rebooting (-f) local node




[tickets] [opensaf:tickets] #70 AMF support for Container and contained components

2017-06-27 Thread Praveen via Opensaf-tickets

- **status**: unassigned --> assigned
- **assigned_to**: Praveen
- **Part**: - --> d
- **Blocker**:  --> False



---

** [tickets:#70] AMF support for Container and contained components**

**Status:** assigned
**Milestone:** future
**Created:** Mon May 13, 2013 04:14 AM UTC by Nagendra Kumar
**Last Updated:** Mon Apr 03, 2017 06:47 PM UTC
**Owner:** Praveen


Migrated from http://devel.opensaf.org/ticket/1436:

Current implementation of AMF doesn't support Container and Contained 
components.






[tickets] [opensaf:tickets] #2506 ntf: ntfimcn does not handle SA_ERR_UNAVAILABLE

2017-06-23 Thread Praveen via Opensaf-tickets

IMM was integrated with CLM in the 5.2 release. Since IMCN is a pre-5.2 
component, the IMM API should not return ERR_UNAVAILABLE to IMCN.


---

** [tickets:#2506] ntf: ntfimcn does not handle SA_ERR_UNAVAILABLE**

**Status:** accepted
**Milestone:** 5.17.08
**Created:** Tue Jun 20, 2017 11:15 AM UTC by elunlen
**Last Updated:** Tue Jun 20, 2017 11:15 AM UTC
**Owner:** elunlen


The ntfimcn part of ntf will create an ER log and abort if 
saImmOmClassDescriptionGet_2 fails with SA_AIS_ERR_UNAVAILABLE (actually any OM 
operation that uses the OM handle). This happens if the node where ntfimcn 
is running leaves the CLM cluster and joins again: the OM handle is then 
invalid for the new cluster configuration (see the AIS description of 
saImmOmClassDescriptionGet_2 return codes).

Note1: This is not a big problem since imcn will recover without any need for a 
node restart. No other services, including ntf, are affected. Also, it can 
only happen on the standby node (the active node cannot leave the cluster; if 
that happens, a failover takes place).

Two fixes should be done:
1. The OM handle should not have a long life-span; instead it should be 
initialized every time it is needed. This will significantly reduce the risk of 
the handle becoming invalid.
2. The ntfimcn process should not be started on the standby node since it is 
not needed there (HA handling was never implemented/needed).

Note2: The problem was detected because CLM tests (see apitests) are not 
supposed to be executed on SC nodes. The detection of this is not working for 
this test. See [#2505]
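
Fix 1 above (initialize the OM handle each time it is needed instead of keeping it alive) can be pictured with a short-lived handle pattern. The following is a sketch under loud assumptions: `FakeImmOm` is a made-up stand-in and not the IMM OM API, and `om_handle` and `describe_class` are hypothetical helpers. In ntfimcn the corresponding steps would be an OM initialize, the operation itself, and an OM finalize.

```python
from contextlib import contextmanager


class FakeImmOm:
    """Made-up stand-in for an IMM OM library (illustration only)."""

    def __init__(self):
        self.open_handles = set()
        self.initialize_calls = 0

    def initialize(self):
        self.initialize_calls += 1
        handle = self.initialize_calls
        self.open_handles.add(handle)
        return handle

    def finalize(self, handle):
        self.open_handles.discard(handle)

    def class_description_get(self, handle, class_name):
        if handle not in self.open_handles:
            raise RuntimeError("stale or invalid handle")
        return {"class": class_name}


@contextmanager
def om_handle(imm):
    """Open a handle for one operation and always release it afterwards."""
    handle = imm.initialize()
    try:
        yield handle
    finally:
        imm.finalize(handle)


def describe_class(imm, class_name):
    # A fresh handle per call: nothing cached survives a cluster rejoin.
    with om_handle(imm) as handle:
        return imm.class_description_get(handle, class_name)
```

The trade-off is one extra initialize/finalize pair per operation in exchange for never holding a handle across a period in which the node might leave and rejoin the cluster.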




[tickets] [opensaf:tickets] #2494 amfd: AmfNodeAdminState is not updated to IMM while SC failover

2017-06-21 Thread Praveen via Opensaf-tickets

- **status**: review --> fixed
- **Comment**:

5.17.08:
commit 829519a4f3a86eb836a55be8301fd5d2befeeec3
Author: Praveen <praveen.malv...@oracle.com>
Date:   Mon Jun 19 12:39:31 2017 +0530

amfd: maintain node attributes in imm job queue at standby [#2494]

5.17.06:
commit abaef2fda56bc1fe689d0ea1d0142568e25a2830
Author: Praveen <praveen.malv...@oracle.com>
Date:   Mon Jun 19 12:39:31 2017 +0530

amfd: maintain node attributes in imm job queue at standby [#2494]




---

** [tickets:#2494] amfd: AmfNodeAdminState is not updated to IMM while SC 
failover**

**Status:** fixed
**Milestone:** 5.17.06
**Created:** Tue Jun 13, 2017 08:33 AM UTC by Minh Hon Chau
**Last Updated:** Mon Jun 19, 2017 09:28 AM UTC
**Owner:** Praveen


The problem appears when an application performs a node admin operation (for 
instance lock-in of a node) and an SC failover is triggered at the same time. 
The persistent RTA saAmfNodeAdminState is not updated to IMM on the active SC 
since the active node is going down. On the standby side, the node admin state 
is checkpointed, but it is not updated to IMM either.

outlined trace:
in SC-1:
~~~
Jun 12 20:50:50.499054 osafamfd [268:268:src/amf/amfd/node.cc:0942] >> 
node_admin_state_set: safAmfNode=PL-5,safAmfCluster=myAmfCluster AdmState 
LOCKED => LOCKED_INSTANTIATION
Jun 12 20:50:50.499058 osafamfd [268:268:src/log/agent/lga_api.c:1225] >> 
saLogWriteLogAsync 
Jun 12 20:50:50.499061 osafamfd [268:268:src/log/agent/lga_api.c:1087] >> 
handle_log_record 
Jun 12 20:50:50.499064 osafamfd [268:268:src/log/agent/lga_api.c:1181] << 
handle_log_record 
Jun 12 20:50:50.499068 osafamfd [268:268:src/log/agent/lga_mds.c:1469] >> 
lga_mds_msg_async_send 
Jun 12 20:50:50.499075 osafamfd [268:268:src/log/agent/lga_mds.c:0792] >> 
lga_mds_enc 
Jun 12 20:50:50.499079 osafamfd [268:268:src/log/agent/lga_mds.c:0824] T2 
msgtype: 0
Jun 12 20:50:50.499082 osafamfd [268:268:src/log/agent/lga_mds.c:0837] T2 
api_info.type: 4
Jun 12 20:50:50.499085 osafamfd [268:268:src/log/agent/lga_mds.c:0865] << 
lga_mds_enc 
Jun 12 20:50:50.499173 osafamfd [268:268:src/log/agent/lga_mds.c:1492] << 
lga_mds_msg_async_send 
Jun 12 20:50:50.499181 osafamfd [268:268:src/log/agent/lga_api.c:1404] << 
saLogWriteLogAsync 
Jun 12 20:50:50.499185 osafamfd [268:268:src/amf/amfd/imm.cc:1843] >> 
avd_saImmOiRtObjectUpdate: 'safAmfNode=PL-5,safAmfCluster=myAmfCluster' 
saAmfNodeAdminState
Jun 12 20:50:50.499191 osafamfd [268:268:src/amf/amfd/imm.cc:1873] << 
avd_saImmOiRtObjectUpdate 
~~~
...
~~~
Jun 12 20:50:50.500294 osafamfd [268:268:src/amf/amfd/imm.cc:0240] >> exec: 
Update 'safAmfNode=PL-5,safAmfCluster=myAmfCluster' saAmfNodeAdminState
Jun 12 20:50:50.500298 osafamfd [268:268:src/amf/amfd/imm.cc:0722] >> 
object_name_to_class_type: safAmfNode=PL-5,safAmfCluster=myAmfCluster
Jun 12 20:50:50.500302 osafamfd [268:268:src/amf/amfd/imm.cc:0770] << 
object_name_to_class_type: 19
Jun 12 20:50:50.500306 osafamfd [268:268:src/imm/agent/imma_oi_api.cc:2546] >> 
rt_object_update_common 
Jun 12 20:50:50.635362 osafamfd [268:271:src/mds/mds_dt_trans.c:0755] >> 
mdtm_process_poll_recv_data_tcp 
Jun 12 20:50:50.635402 osafamfd [268:271:src/mbc/mbcsv_mds.c:0439] << 
mbcsv_mds_evt: Msg is not from same vdest, discarding
Jun 12 20:50:50.635409 osafamfd [268:271:src/mds/mds_dt_trans.c:0755] >> 
mdtm_process_poll_recv_data_tcp 
Jun 12 20:50:50.635414 osafamfd [268:271:src/mds/mds_dt_trans.c:0755] >> 
mdtm_process_poll_recv_data_tcp 
Jun 12 20:50:50.635420 osafamfd [268:271:src/clm/agent/clma_mds.c:0968] T2 CLMA 
Rcvd MDS subscribe evt from svc 34 
Jun 12 20:50:50.635423 osafamfd [268:271:src/clm/agent/clma_mds.c:0989] TR CLMS 
no active
Jun 12 20:50:50.635439 osafamfd [268:271:src/mds/mds_dt_trans.c:0755] >> 
mdtm_process_poll_recv_data_tcp 
Jun 12 20:50:50.635444 osafamfd [268:271:src/mds/mds_dt_trans.c:0755] >> 
mdtm_process_poll_recv_data_tcp 
Jun 12 20:50:50.648993 osafamfd [268:271:src/mds/mds_dt_trans.c:0755] >> 
mdtm_process_poll_recv_data_tcp 
Jun 12 20:50:50.690140 osafamfd [268:271:src/mds/mds_dt_trans.c:0755] >> 
mdtm_process_poll_recv_data_tcp 
Jun 12 20:50:50.690168 osafamfd [268:271:src/mds/mds_dt_trans.c:0755] >> 
mdtm_process_poll_recv_data_tcp 
Jun 12 20:50:50.690195 osafamfd [268:271:src/mbc/mbcsv_mds.c:0439] << 
mbcsv_mds_evt: Msg is not from same vdest, discarding
Jun 12 20:50:50.690201 osafamfd [268:271:src/mds/mds_dt_trans.c:0755] >> 
mdtm_process_poll_recv_data_tcp 
Jun 12 20:50:50.716805 osafamfd [268:271:src/mds/mds_dt_trans.c:0755] >> 
mdtm_process_poll_recv_data_tcp 
Jun 12 20:50:50.716849 osafamfd [268:271:src/mbc/mbcsv_mds.c:0439] << 
mbcsv_mds_evt: Msg is not from same vdest, discarding
Jun 12 20:50:50.716857 osafamfd [268:271:src/mds/mds_dt_trans.c:0755] >> 
mdtm_process_poll_recv_data_tcp 
Jun 12 20:50:50.716864 osafam

[tickets] [opensaf:tickets] #2494 amfd: AmfNodeAdminState is not updated to IMM while SC failover

2017-06-19 Thread Praveen via Opensaf-tickets

- **status**: accepted --> review
- **Blocker**: True --> False
- **Comment**:

In this issue, only the update to IMM was missed; checkpointing to the standby 
AMFD was successful. There is a separate enhancement for the case when both 
checkpointing and the IMM update are missed.



---

** [tickets:#2494] amfd: AmfNodeAdminState is not updated to IMM while SC 
failover**

**Status:** review
**Milestone:** 5.17.06
**Created:** Tue Jun 13, 2017 08:33 AM UTC by Minh Hon Chau
**Last Updated:** Mon Jun 19, 2017 06:12 AM UTC
**Owner:** Praveen


The problem appears when an application performs a node admin operation (for 
instance lock-in of a node) and an SC failover is triggered at the same time. 
The persistent RTA saAmfNodeAdminState is not updated to IMM on the active SC 
since the active node is going down. On the standby side, the node admin state 
is checkpointed, but it is not updated to IMM either.

outlined trace:
in SC-1:
~~~
Jun 12 20:50:50.499054 osafamfd [268:268:src/amf/amfd/node.cc:0942] >> 
node_admin_state_set: safAmfNode=PL-5,safAmfCluster=myAmfCluster AdmState 
LOCKED => LOCKED_INSTANTIATION
Jun 12 20:50:50.499058 osafamfd [268:268:src/log/agent/lga_api.c:1225] >> 
saLogWriteLogAsync 
Jun 12 20:50:50.499061 osafamfd [268:268:src/log/agent/lga_api.c:1087] >> 
handle_log_record 
Jun 12 20:50:50.499064 osafamfd [268:268:src/log/agent/lga_api.c:1181] << 
handle_log_record 
Jun 12 20:50:50.499068 osafamfd [268:268:src/log/agent/lga_mds.c:1469] >> 
lga_mds_msg_async_send 
Jun 12 20:50:50.499075 osafamfd [268:268:src/log/agent/lga_mds.c:0792] >> 
lga_mds_enc 
Jun 12 20:50:50.499079 osafamfd [268:268:src/log/agent/lga_mds.c:0824] T2 
msgtype: 0
Jun 12 20:50:50.499082 osafamfd [268:268:src/log/agent/lga_mds.c:0837] T2 
api_info.type: 4
Jun 12 20:50:50.499085 osafamfd [268:268:src/log/agent/lga_mds.c:0865] << 
lga_mds_enc 
Jun 12 20:50:50.499173 osafamfd [268:268:src/log/agent/lga_mds.c:1492] << 
lga_mds_msg_async_send 
Jun 12 20:50:50.499181 osafamfd [268:268:src/log/agent/lga_api.c:1404] << 
saLogWriteLogAsync 
Jun 12 20:50:50.499185 osafamfd [268:268:src/amf/amfd/imm.cc:1843] >> 
avd_saImmOiRtObjectUpdate: 'safAmfNode=PL-5,safAmfCluster=myAmfCluster' 
saAmfNodeAdminState
Jun 12 20:50:50.499191 osafamfd [268:268:src/amf/amfd/imm.cc:1873] << 
avd_saImmOiRtObjectUpdate 
~~~
...
~~~
Jun 12 20:50:50.500294 osafamfd [268:268:src/amf/amfd/imm.cc:0240] >> exec: 
Update 'safAmfNode=PL-5,safAmfCluster=myAmfCluster' saAmfNodeAdminState
Jun 12 20:50:50.500298 osafamfd [268:268:src/amf/amfd/imm.cc:0722] >> 
object_name_to_class_type: safAmfNode=PL-5,safAmfCluster=myAmfCluster
Jun 12 20:50:50.500302 osafamfd [268:268:src/amf/amfd/imm.cc:0770] << 
object_name_to_class_type: 19
Jun 12 20:50:50.500306 osafamfd [268:268:src/imm/agent/imma_oi_api.cc:2546] >> 
rt_object_update_common 
Jun 12 20:50:50.635362 osafamfd [268:271:src/mds/mds_dt_trans.c:0755] >> 
mdtm_process_poll_recv_data_tcp 
Jun 12 20:50:50.635402 osafamfd [268:271:src/mbc/mbcsv_mds.c:0439] << 
mbcsv_mds_evt: Msg is not from same vdest, discarding
Jun 12 20:50:50.635409 osafamfd [268:271:src/mds/mds_dt_trans.c:0755] >> 
mdtm_process_poll_recv_data_tcp 
Jun 12 20:50:50.635414 osafamfd [268:271:src/mds/mds_dt_trans.c:0755] >> 
mdtm_process_poll_recv_data_tcp 
Jun 12 20:50:50.635420 osafamfd [268:271:src/clm/agent/clma_mds.c:0968] T2 CLMA 
Rcvd MDS subscribe evt from svc 34 
Jun 12 20:50:50.635423 osafamfd [268:271:src/clm/agent/clma_mds.c:0989] TR CLMS 
no active
Jun 12 20:50:50.635439 osafamfd [268:271:src/mds/mds_dt_trans.c:0755] >> 
mdtm_process_poll_recv_data_tcp 
Jun 12 20:50:50.635444 osafamfd [268:271:src/mds/mds_dt_trans.c:0755] >> 
mdtm_process_poll_recv_data_tcp 
Jun 12 20:50:50.648993 osafamfd [268:271:src/mds/mds_dt_trans.c:0755] >> 
mdtm_process_poll_recv_data_tcp 
Jun 12 20:50:50.690140 osafamfd [268:271:src/mds/mds_dt_trans.c:0755] >> 
mdtm_process_poll_recv_data_tcp 
Jun 12 20:50:50.690168 osafamfd [268:271:src/mds/mds_dt_trans.c:0755] >> 
mdtm_process_poll_recv_data_tcp 
Jun 12 20:50:50.690195 osafamfd [268:271:src/mbc/mbcsv_mds.c:0439] << 
mbcsv_mds_evt: Msg is not from same vdest, discarding
Jun 12 20:50:50.690201 osafamfd [268:271:src/mds/mds_dt_trans.c:0755] >> 
mdtm_process_poll_recv_data_tcp 
Jun 12 20:50:50.716805 osafamfd [268:271:src/mds/mds_dt_trans.c:0755] >> 
mdtm_process_poll_recv_data_tcp 
Jun 12 20:50:50.716849 osafamfd [268:271:src/mbc/mbcsv_mds.c:0439] << 
mbcsv_mds_evt: Msg is not from same vdest, discarding
Jun 12 20:50:50.716857 osafamfd [268:271:src/mds/mds_dt_trans.c:0755] >> 
mdtm_process_poll_recv_data_tcp 
Jun 12 20:50:50.716864 osafamfd [268:271:src/mds/mds_dt_trans.c:0755] >> 
mdtm_process_poll_recv_data_tcp 
Jun 12 20:50:50.716871 osafamfd [268:271:src/log/agent/lga_mds.c:0674] >> 
lga_mds_svc_evt 
Jun 12 20:50:50.716875 osafamfd [2

[tickets] [opensaf:tickets] #2469 amf: Stop tracking api returns NOT_EXIST

2017-06-19 Thread Praveen via Opensaf-tickets

- **summary**: clm: Stop tracking api returns NOT_EXIST --> amf: Stop tracking 
api returns NOT_EXIST
- **status**: assigned --> unassigned
- **Component**: clm --> amf
- **Blocker**: True --> False



---

** [tickets:#2469] amf: Stop tracking api returns NOT_EXIST**

**Status:** unassigned
**Milestone:** 5.17.06
**Created:** Mon May 29, 2017 12:19 AM UTC by Minh Hon Chau
**Last Updated:** Fri Jun 16, 2017 08:39 AM UTC
**Owner:** Praveen


When performing switchover, AMFD fails to stop CLM track callback with error 
code 12 (NOT_EXIST)

**syslog:**
2017-05-26 10:19:02 SC-1 osafamfd[268]: NO Controller switch over initiated
2017-05-26 10:19:02 SC-1 osafamfd[268]: NO ROLE SWITCH Active --> Quiesced
2017-05-26 10:19:02 SC-1 osafimmnd[205]: NO Implementer (applier) connected: 40 
(@OpenSafImmReplicatorB) <343, 2010f>
2017-05-26 10:19:02 SC-1 osafntfimcnd[626]: NO Started
2017-05-26 10:19:12 SC-1 osafamfd[268]: WA Failed to stop cluster tracking 5
2017-05-26 10:19:12 SC-1 osafimmnd[205]: NO Implementer disconnected 32 <27, 
2010f> (safAmfService)
2017-05-26 10:19:12 SC-1 osafimmnd[205]: NO Implementer (applier) connected: 41 
(@safAmfService2010f) <27, 2010f>
2017-05-26 10:19:12 SC-1 osafamfnd[283]: NO AVD NEW_ACTIVE, adest:1
2017-05-26 10:19:12 SC-1 osafimmnd[205]: NO Implementer disconnected 31 <0, 
2020f> (@safAmfService2020f)
2017-05-26 10:19:12 SC-1 osafimmnd[205]: NO Implementer connected: 42 
(safAmfService) <0, 2020f>
2017-05-26 10:19:12 SC-1 osafamfd[268]: NO Switching Quiesced --> StandBy
2017-05-26 10:19:13 SC-1 osafamfd[268]: ER Failed to stop cluster tracking 12
2017-05-26 10:19:13 SC-1 osafamfd[268]: ER Failed to stop cluster tracking 
after switch over
2017-05-26 10:19:13 SC-1 osafamfd[268]: NO Controller switch over done
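
The two "Failed to stop cluster tracking" lines in the syslog above end in a numeric SaAisErrorT code. A small helper can make such lines easier to read. This is a sketch only: the function name and the regular expression are made up for this log excerpt, and the table lists just a few SaAisErrorT values (12 is NOT_EXIST as stated in the ticket; the others are taken from the SA Forum AIS definitions and should be checked against saAis.h).

```python
import re

# Partial SaAisErrorT table; verify against saAis.h before relying on it.
SA_AIS_ERRORS = {
    1: "SA_AIS_OK",
    5: "SA_AIS_ERR_TIMEOUT",
    6: "SA_AIS_ERR_TRY_AGAIN",
    9: "SA_AIS_ERR_BAD_HANDLE",
    12: "SA_AIS_ERR_NOT_EXIST",
}

# Matches only lines that end in the numeric code, as in the excerpt above.
TRACK_STOP_RE = re.compile(r"Failed to stop cluster tracking (\d+)$")


def decode_track_stop(line):
    """Return the symbolic error name for a syslog line, or None."""
    match = TRACK_STOP_RE.search(line.strip())
    if match is None:
        return None
    code = int(match.group(1))
    return SA_AIS_ERRORS.get(code, "unknown (%d)" % code)
```

Applied to the excerpt, the WA line at 10:19:12 decodes to SA_AIS_ERR_TIMEOUT and the ER line at 10:19:13 to SA_AIS_ERR_NOT_EXIST.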

**CLM trace:**
May 26 10:19:13.173369 osafclmd [240:240:src/clm/clmd/clms_evt.c:1347] >> 
proc_track_stop_msg 
May 26 10:19:13.173374 osafclmd [240:240:src/clm/clmd/clms_util.c:0126] >> 
clms_node_get_by_id 
May 26 10:19:13.173379 osafclmd [240:240:src/clm/clmd/clms_util.c:0137] TR Node 
found 131343
May 26 10:19:13.173383 osafclmd [240:240:src/clm/clmd/clms_util.c:0140] << 
clms_node_get_by_id 
May 26 10:19:13.173388 osafclmd [240:240:src/clm/clmd/clms_evt.c:1350] TR Node 
id = 131343
May 26 10:19:13.173393 osafclmd [240:240:src/clm/clmd/clms_mds.c:1553] >> 
clms_mds_msg_send 
May 26 10:19:13.173448 osafclmd [240:240:src/clm/clmd/clms_mds.c:1587] << 
clms_mds_msg_send 
May 26 10:19:13.173457 osafclmd [240:240:src/clm/clmd/clms_mbcsv.c:0810] >> 
clms_send_async_update 
May 26 10:19:13.173462 osafclmd [240:240:src/mbc/mbcsv_api.c:0798] >> 
mbcsv_process_snd_ckpt_request: Sending checkpoint data to all STANDBY peers, 
as per the send-type specified
May 26 10:19:13.173504 osafclmd [240:240:src/mbc/mbcsv_api.c:0830] TR 
svc_id:48, pwe_hdl:65552
May 26 10:19:13.173509 osafclmd [240:240:src/mbc/mbcsv_util.c:0363] >> 
mbcsv_send_ckpt_data_to_all_peers 
May 26 10:19:13.173593 osafclmd [240:240:src/mbc/mbcsv_util.c:0411] TR 
dispatching FSM for NCSMBCSV_SEND_ASYNC_UPDATE
May 26 10:19:13.173599 osafclmd [240:240:src/mbc/mbcsv_act.c:0103] TR ASYNC 
update to be sent. role: 1, svc_id: 48, pwe_hdl: 65552
May 26 10:19:13.173604 osafclmd [240:240:src/mbc/mbcsv_util.c:0424] TR calling 
encode callback
May 26 10:19:13.173610 osafclmd [240:240:src/clm/clmd/clms_mbcsv.c:0740] >> 
mbcsv_callback 
May 26 10:19:13.173615 osafclmd [240:240:src/clm/clmd/clms_mbcsv.c:0856] >> 
ckpt_encode_cbk_handler 
May 26 10:19:13.173626 osafclmd [240:240:src/clm/clmd/clms_mbcsv.c:0867] TR 
cbk_arg->info.encode.io_msg_type type 1
May 26 10:19:13.173632 osafclmd [240:240:src/clm/clmd/clms_mbcsv.c:1307] >> 
ckpt_encode_async_update 
May 26 10:19:13.173637 osafclmd [240:240:src/clm/clmd/clms_mbcsv.c:1324] TR 
data->header.type 3
May 26 10:19:13.173641 osafclmd [240:240:src/clm/clmd/clms_mbcsv.c:1362] TR 
Async update CLMS_CKPT_TRACK_START
May 26 10:19:13.173646 osafclmd [240:240:src/clm/clmd/clms_mbcsv.c:1701] >> 
enc_mbcsv_track_changes_msg 
May 26 10:19:13.173650 osafclmd [240:240:src/clm/clmd/clms_mbcsv.c:1714] << 
enc_mbcsv_track_changes_msg 
May 26 10:19:13.173654 osafclmd [240:240:src/clm/clmd/clms_mbcsv.c:1515] << 
ckpt_encode_async_update 
May 26 10:19:13.173658 osafclmd [240:240:src/clm/clmd/clms_mbcsv.c:0910] << 
ckpt_encode_cbk_handler 
May 26 10:19:13.173663 osafclmd [240:240:src/clm/clmd/clms_mbcsv.c:0780] << 
mbcsv_callback 
May 26 10:19:13.173667 osafclmd [240:240:src/mbc/mbcsv_util.c:0469] TR send the 
encoded message to any other peer with same s/w version
May 26 10:19:13.173671 osafclmd [240:240:src/mbc/mbcsv_util.c:0472] TR 
dispatching FSM for NCSMBCSV_SEND_ASYNC_UPDATE
May 26 10:19:13.173675 osafclmd [240:240:src/mbc/mbcsv_act.c:0103] TR ASYNC 
update to be sent. role: 1, svc_id: 48, pwe_hdl: 65552
May 26 10:19:13.173680 osafclmd [240:240

[tickets] [opensaf:tickets] #2494 amfd: AmfNodeAdminState is not updated to IMM while SC failover

2017-06-16 Thread Praveen via Opensaf-tickets

- **status**: assigned --> accepted



---

** [tickets:#2494] amfd: AmfNodeAdminState is not updated to IMM while SC 
failover**

**Status:** accepted
**Milestone:** 5.17.06
**Created:** Tue Jun 13, 2017 08:33 AM UTC by Minh Hon Chau
**Last Updated:** Thu Jun 15, 2017 10:06 AM UTC
**Owner:** Praveen


The problem appears when an application performs a node admin operation (for 
instance, lock-in on a node) and an SC failover is triggered at the same time. 
The persistent RTA saAmfNodeAdminState is not updated in IMM on the active SC 
because the active node is going down. On the standby side, the node admin 
state is checkpointed, but it is not updated in IMM either.

outlined trace:
in SC-1:
~~~
Jun 12 20:50:50.499054 osafamfd [268:268:src/amf/amfd/node.cc:0942] >> 
node_admin_state_set: safAmfNode=PL-5,safAmfCluster=myAmfCluster AdmState 
LOCKED => LOCKED_INSTANTIATION
Jun 12 20:50:50.499058 osafamfd [268:268:src/log/agent/lga_api.c:1225] >> 
saLogWriteLogAsync 
Jun 12 20:50:50.499061 osafamfd [268:268:src/log/agent/lga_api.c:1087] >> 
handle_log_record 
Jun 12 20:50:50.499064 osafamfd [268:268:src/log/agent/lga_api.c:1181] << 
handle_log_record 
Jun 12 20:50:50.499068 osafamfd [268:268:src/log/agent/lga_mds.c:1469] >> 
lga_mds_msg_async_send 
Jun 12 20:50:50.499075 osafamfd [268:268:src/log/agent/lga_mds.c:0792] >> 
lga_mds_enc 
Jun 12 20:50:50.499079 osafamfd [268:268:src/log/agent/lga_mds.c:0824] T2 
msgtype: 0
Jun 12 20:50:50.499082 osafamfd [268:268:src/log/agent/lga_mds.c:0837] T2 
api_info.type: 4
Jun 12 20:50:50.499085 osafamfd [268:268:src/log/agent/lga_mds.c:0865] << 
lga_mds_enc 
Jun 12 20:50:50.499173 osafamfd [268:268:src/log/agent/lga_mds.c:1492] << 
lga_mds_msg_async_send 
Jun 12 20:50:50.499181 osafamfd [268:268:src/log/agent/lga_api.c:1404] << 
saLogWriteLogAsync 
Jun 12 20:50:50.499185 osafamfd [268:268:src/amf/amfd/imm.cc:1843] >> 
avd_saImmOiRtObjectUpdate: 'safAmfNode=PL-5,safAmfCluster=myAmfCluster' 
saAmfNodeAdminState
Jun 12 20:50:50.499191 osafamfd [268:268:src/amf/amfd/imm.cc:1873] << 
avd_saImmOiRtObjectUpdate 
~~~
...
~~~
Jun 12 20:50:50.500294 osafamfd [268:268:src/amf/amfd/imm.cc:0240] >> exec: 
Update 'safAmfNode=PL-5,safAmfCluster=myAmfCluster' saAmfNodeAdminState
Jun 12 20:50:50.500298 osafamfd [268:268:src/amf/amfd/imm.cc:0722] >> 
object_name_to_class_type: safAmfNode=PL-5,safAmfCluster=myAmfCluster
Jun 12 20:50:50.500302 osafamfd [268:268:src/amf/amfd/imm.cc:0770] << 
object_name_to_class_type: 19
Jun 12 20:50:50.500306 osafamfd [268:268:src/imm/agent/imma_oi_api.cc:2546] >> 
rt_object_update_common 
Jun 12 20:50:50.635362 osafamfd [268:271:src/mds/mds_dt_trans.c:0755] >> 
mdtm_process_poll_recv_data_tcp 
Jun 12 20:50:50.635402 osafamfd [268:271:src/mbc/mbcsv_mds.c:0439] << 
mbcsv_mds_evt: Msg is not from same vdest, discarding
Jun 12 20:50:50.635409 osafamfd [268:271:src/mds/mds_dt_trans.c:0755] >> 
mdtm_process_poll_recv_data_tcp 
Jun 12 20:50:50.635414 osafamfd [268:271:src/mds/mds_dt_trans.c:0755] >> 
mdtm_process_poll_recv_data_tcp 
Jun 12 20:50:50.635420 osafamfd [268:271:src/clm/agent/clma_mds.c:0968] T2 CLMA 
Rcvd MDS subscribe evt from svc 34 
Jun 12 20:50:50.635423 osafamfd [268:271:src/clm/agent/clma_mds.c:0989] TR CLMS 
no active
Jun 12 20:50:50.635439 osafamfd [268:271:src/mds/mds_dt_trans.c:0755] >> 
mdtm_process_poll_recv_data_tcp 
Jun 12 20:50:50.635444 osafamfd [268:271:src/mds/mds_dt_trans.c:0755] >> 
mdtm_process_poll_recv_data_tcp 
Jun 12 20:50:50.648993 osafamfd [268:271:src/mds/mds_dt_trans.c:0755] >> 
mdtm_process_poll_recv_data_tcp 
Jun 12 20:50:50.690140 osafamfd [268:271:src/mds/mds_dt_trans.c:0755] >> 
mdtm_process_poll_recv_data_tcp 
Jun 12 20:50:50.690168 osafamfd [268:271:src/mds/mds_dt_trans.c:0755] >> 
mdtm_process_poll_recv_data_tcp 
Jun 12 20:50:50.690195 osafamfd [268:271:src/mbc/mbcsv_mds.c:0439] << 
mbcsv_mds_evt: Msg is not from same vdest, discarding
Jun 12 20:50:50.690201 osafamfd [268:271:src/mds/mds_dt_trans.c:0755] >> 
mdtm_process_poll_recv_data_tcp 
Jun 12 20:50:50.716805 osafamfd [268:271:src/mds/mds_dt_trans.c:0755] >> 
mdtm_process_poll_recv_data_tcp 
Jun 12 20:50:50.716849 osafamfd [268:271:src/mbc/mbcsv_mds.c:0439] << 
mbcsv_mds_evt: Msg is not from same vdest, discarding
Jun 12 20:50:50.716857 osafamfd [268:271:src/mds/mds_dt_trans.c:0755] >> 
mdtm_process_poll_recv_data_tcp 
Jun 12 20:50:50.716864 osafamfd [268:271:src/mds/mds_dt_trans.c:0755] >> 
mdtm_process_poll_recv_data_tcp 
Jun 12 20:50:50.716871 osafamfd [268:271:src/log/agent/lga_mds.c:0674] >> 
lga_mds_svc_evt 
Jun 12 20:50:50.716875 osafamfd [268:271:src/log/agent/lga_mds.c:0678] TR 
lga_mds_svc_evtNCSMDS_NO_ACTIVE
Jun 12 20:50:50.716879 osafamfd [268:271:src/log/agent/lga_mds.c:0683] TR 
NCSMDS_NO_ACTIVE
Jun 12 20:50:50.716881 osafamfd [268:271:src/log/agent/

[tickets] [opensaf:tickets] #2469 clm: Stop tracking api returns NOT_EXIST

2017-06-16 Thread Praveen via Opensaf-tickets

Hi Minh,

Generally, in the ERR_TIMEOUT case it is recommended to finalize the handle, 
because if the API was called to create some resource, the user does not 
know whether the resource was created or not.
Since the present case is about stopping the tracking, it may work.
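
The distinction above can be put into a small decision function. This is only a sketch under assumptions: the enum and function names are hypothetical and not OpenSAF code, the numeric values 5 and 12 are the codes seen in the syslog of this ticket, and treating NOT_EXIST as success after a timed-out stop is one possible reading of "it may work", not a confirmed fix.

```cpp
#include <cassert>

// Local stand-ins for the SaAisErrorT codes that appear in the syslog for
// this ticket (5 = TIMEOUT, 12 = NOT_EXIST). A real build would use saAis.h.
enum class AisRc { kOk = 1, kTimeout = 5, kNotExist = 12 };

// Hypothetical classification of the call being made; not an OpenSAF type.
enum class CallKind { kCreatesResource, kStopsTracking };

// Hypothetical follow-up actions for the caller.
enum class NextStep { kDone, kRetry, kFinalizeHandle, kFail };

// Decide what to do after a CLM call returns `rc`.
// `timed_out_before` is true when an earlier attempt of the same call
// returned TIMEOUT, so the server may already have processed it.
NextStep AfterClmCall(CallKind kind, AisRc rc, bool timed_out_before) {
  switch (rc) {
    case AisRc::kOk:
      return NextStep::kDone;
    case AisRc::kTimeout:
      // Unknown outcome: unsafe for create-style calls, tolerable for stop.
      return kind == CallKind::kCreatesResource ? NextStep::kFinalizeHandle
                                                : NextStep::kRetry;
    case AisRc::kNotExist:
      // A retried stop that finds nothing to stop has reached its goal.
      if (kind == CallKind::kStopsTracking && timed_out_before)
        return NextStep::kDone;
      return NextStep::kFail;
  }
  return NextStep::kFail;
}
```

With this policy, the sequence in the syslog (5 followed by 12 on stop tracking) would end in kDone instead of an error.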

Thanks
Praveen


---

** [tickets:#2469] clm: Stop tracking api returns NOT_EXIST**

**Status:** assigned
**Milestone:** 5.17.06
**Created:** Mon May 29, 2017 12:19 AM UTC by Minh Hon Chau
**Last Updated:** Wed Jun 14, 2017 02:05 AM UTC
**Owner:** Praveen


When performing switchover, AMFD fails to stop CLM track callback with error 
code 12 (NOT_EXIST)

**syslog:**
2017-05-26 10:19:02 SC-1 osafamfd[268]: NO Controller switch over initiated
2017-05-26 10:19:02 SC-1 osafamfd[268]: NO ROLE SWITCH Active --> Quiesced
2017-05-26 10:19:02 SC-1 osafimmnd[205]: NO Implementer (applier) connected: 40 
(@OpenSafImmReplicatorB) <343, 2010f>
2017-05-26 10:19:02 SC-1 osafntfimcnd[626]: NO Started
2017-05-26 10:19:12 SC-1 osafamfd[268]: WA Failed to stop cluster tracking 5
2017-05-26 10:19:12 SC-1 osafimmnd[205]: NO Implementer disconnected 32 <27, 
2010f> (safAmfService)
2017-05-26 10:19:12 SC-1 osafimmnd[205]: NO Implementer (applier) connected: 41 
(@safAmfService2010f) <27, 2010f>
2017-05-26 10:19:12 SC-1 osafamfnd[283]: NO AVD NEW_ACTIVE, adest:1
2017-05-26 10:19:12 SC-1 osafimmnd[205]: NO Implementer disconnected 31 <0, 
2020f> (@safAmfService2020f)
2017-05-26 10:19:12 SC-1 osafimmnd[205]: NO Implementer connected: 42 
(safAmfService) <0, 2020f>
2017-05-26 10:19:12 SC-1 osafamfd[268]: NO Switching Quiesced --> StandBy
2017-05-26 10:19:13 SC-1 osafamfd[268]: ER Failed to stop cluster tracking 12
2017-05-26 10:19:13 SC-1 osafamfd[268]: ER Failed to stop cluster tracking 
after switch over
2017-05-26 10:19:13 SC-1 osafamfd[268]: NO Controller switch over done

**CLM trace:**
May 26 10:19:13.173369 osafclmd [240:240:src/clm/clmd/clms_evt.c:1347] >> 
proc_track_stop_msg 
May 26 10:19:13.173374 osafclmd [240:240:src/clm/clmd/clms_util.c:0126] >> 
clms_node_get_by_id 
May 26 10:19:13.173379 osafclmd [240:240:src/clm/clmd/clms_util.c:0137] TR Node 
found 131343
May 26 10:19:13.173383 osafclmd [240:240:src/clm/clmd/clms_util.c:0140] << 
clms_node_get_by_id 
May 26 10:19:13.173388 osafclmd [240:240:src/clm/clmd/clms_evt.c:1350] TR Node 
id = 131343
May 26 10:19:13.173393 osafclmd [240:240:src/clm/clmd/clms_mds.c:1553] >> 
clms_mds_msg_send 
May 26 10:19:13.173448 osafclmd [240:240:src/clm/clmd/clms_mds.c:1587] << 
clms_mds_msg_send 
May 26 10:19:13.173457 osafclmd [240:240:src/clm/clmd/clms_mbcsv.c:0810] >> 
clms_send_async_update 
May 26 10:19:13.173462 osafclmd [240:240:src/mbc/mbcsv_api.c:0798] >> 
mbcsv_process_snd_ckpt_request: Sending checkpoint data to all STANDBY peers, 
as per the send-type specified
May 26 10:19:13.173504 osafclmd [240:240:src/mbc/mbcsv_api.c:0830] TR 
svc_id:48, pwe_hdl:65552
May 26 10:19:13.173509 osafclmd [240:240:src/mbc/mbcsv_util.c:0363] >> 
mbcsv_send_ckpt_data_to_all_peers 
May 26 10:19:13.173593 osafclmd [240:240:src/mbc/mbcsv_util.c:0411] TR 
dispatching FSM for NCSMBCSV_SEND_ASYNC_UPDATE
May 26 10:19:13.173599 osafclmd [240:240:src/mbc/mbcsv_act.c:0103] TR ASYNC 
update to be sent. role: 1, svc_id: 48, pwe_hdl: 65552
May 26 10:19:13.173604 osafclmd [240:240:src/mbc/mbcsv_util.c:0424] TR calling 
encode callback
May 26 10:19:13.173610 osafclmd [240:240:src/clm/clmd/clms_mbcsv.c:0740] >> 
mbcsv_callback 
May 26 10:19:13.173615 osafclmd [240:240:src/clm/clmd/clms_mbcsv.c:0856] >> 
ckpt_encode_cbk_handler 
May 26 10:19:13.173626 osafclmd [240:240:src/clm/clmd/clms_mbcsv.c:0867] TR 
cbk_arg->info.encode.io_msg_type type 1
May 26 10:19:13.173632 osafclmd [240:240:src/clm/clmd/clms_mbcsv.c:1307] >> 
ckpt_encode_async_update 
May 26 10:19:13.173637 osafclmd [240:240:src/clm/clmd/clms_mbcsv.c:1324] TR 
data->header.type 3
May 26 10:19:13.173641 osafclmd [240:240:src/clm/clmd/clms_mbcsv.c:1362] TR 
Async update CLMS_CKPT_TRACK_START
May 26 10:19:13.173646 osafclmd [240:240:src/clm/clmd/clms_mbcsv.c:1701] >> 
enc_mbcsv_track_changes_msg 
May 26 10:19:13.173650 osafclmd [240:240:src/clm/clmd/clms_mbcsv.c:1714] << 
enc_mbcsv_track_changes_msg 
May 26 10:19:13.173654 osafclmd [240:240:src/clm/clmd/clms_mbcsv.c:1515] << 
ckpt_encode_async_update 
May 26 10:19:13.173658 osafclmd [240:240:src/clm/clmd/clms_mbcsv.c:0910] << 
ckpt_encode_cbk_handler 
May 26 10:19:13.173663 osafclmd [240:240:src/clm/clmd/clms_mbcsv.c:0780] << 
mbcsv_callback 
May 26 10:19:13.173667 osafclmd [240:240:src/mbc/mbcsv_util.c:0469] TR send the 
encoded message to any other peer with same s/w version
May 26 10:19:13.173671 osafclmd [240:240:src/mbc/mbcsv_util.c:0472] TR 
dispatching FSM for NCSMBCSV_SEND_ASYNC_UPDATE
May 26 10:19:13.173675 osafclmd [240:240:src/mbc/mbcsv_act.c:0103] TR ASYNC 
update to be sent. role: 1, svc_id

[tickets] [opensaf:tickets] #2498 amfd: incorrect saAmfSGNumPrefAssignedSUs

2017-06-16 Thread Praveen via Opensaf-tickets

I had published a patch for this in ticket \#2269 (amf: 
saAmfSGNumPrefAssignedSUs is not honored in N-Way Active model).
I will republish it after rebasing.


---

** [tickets:#2498] amfd: incorrect saAmfSGNumPrefAssignedSUs**

**Status:** accepted
**Milestone:** 5.17.06
**Created:** Fri Jun 16, 2017 04:42 AM UTC by Gary Lee
**Last Updated:** Fri Jun 16, 2017 04:44 AM UTC
**Owner:** Gary Lee


If saAmfSGNumPrefAssignedSUs is not set, AMFD should refer to 
saAmfSGNumPrefInserviceSUs. This currently works, except for the case where 
saAmfSGNumPrefInserviceSUs is changed after startup: in that case the value 
used for saAmfSGNumPrefAssignedSUs does not get updated.
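
One way to avoid a stale value is to resolve the fallback every time the value is read instead of copying it once at startup. The sketch below illustrates that idea only; the struct and function names are hypothetical and do not come from the AMFD sources, and the actual patch may take a different approach.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified view of an SG's configuration; the comments name
// the IMM attributes, but this is not the AMFD data structure.
struct SgConfig {
  std::uint32_t pref_inservice_sus = 0;  // saAmfSGNumPrefInserviceSUs
  bool pref_assigned_is_set = false;     // saAmfSGNumPrefAssignedSUs configured?
  std::uint32_t pref_assigned_sus = 0;   // only meaningful when the flag is true
};

// Resolve the fallback on every read, so a later change to
// pref_inservice_sus is picked up without a separate update step.
std::uint32_t EffectivePrefAssignedSus(const SgConfig& sg) {
  return sg.pref_assigned_is_set ? sg.pref_assigned_sus : sg.pref_inservice_sus;
}
```

Because nothing is cached, changing pref_inservice_sus at runtime changes the effective value immediately, while an explicitly configured pref_assigned_sus still takes precedence.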


---

Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is 
subscribed to https://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change settings at 
https://sourceforge.net/p/opensaf/admin/tickets/options.  Or, if this is a 
mailing list, you can unsubscribe from the mailing list.

--
Opensaf-tickets mailing list
Opensaf-tickets@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets

[tickets] [opensaf:tickets] #2494 amfd: AmfNodeAdminState is not updated to IMM while SC failover

2017-06-15 Thread Praveen via Opensaf-tickets

- **status**: unassigned --> assigned
- **assigned_to**: Praveen
- **Blocker**: False --> True



---

** [tickets:#2494] amfd: AmfNodeAdminState is not updated to IMM while SC 
failover**

**Status:** assigned
**Milestone:** 5.17.06
**Created:** Tue Jun 13, 2017 08:33 AM UTC by Minh Hon Chau
**Last Updated:** Tue Jun 13, 2017 08:33 AM UTC
**Owner:** Praveen



[tickets] [opensaf:tickets] #2496 amf: amfd crashes while trying to free invalid memory.

2017-06-14 Thread Praveen via Opensaf-tickets

- **status**: accepted --> review



---

** [tickets:#2496] amf: amfd crashes while trying to free invalid memory.**

**Status:** review
**Milestone:** 5.17.06
**Created:** Wed Jun 14, 2017 08:49 AM UTC by Praveen
**Last Updated:** Wed Jun 14, 2017 08:49 AM UTC
**Owner:** Praveen


Steps to reproduce:
1) Bring the AMF demo up on one controller.
2) Issue a lock operation on the active SU.
3) While the component is still processing the quiesced assignment, run the 
command below:
immlist safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1
4) amfd will crash while updating the runtime attributes of the SU in 
su_rt_attr_cb().

bt:
\#0  0x7fac6971fcc9 in __GI_raise (sig=sig@entry=6) at 
../nptl/sysdeps/unix/sysv/linux/raise.c:56
\#1  0x7fac697230d8 in __GI_abort () at abort.c:89
\#2  0x7fac6975c394 in __libc_message (do_abort=do_abort@entry=1,
fmt=fmt@entry=0x7fac6986ab28 "*** Error in `%s': %s: 0x%s ***\n") at 
../sysdeps/posix/libc_fatal.c:175
\#3  0x7fac6976866e in malloc_printerr (ptr=,
str=0x7fac6986acc8 "free(): invalid next size (fast)", action=1) at 
malloc.c:4996
\#4  _int_free (av=, p=, have_lock=0) at 
malloc.c:3840
\#5  0x7fac6b4c471a in su_rt_attr_cb (immOiHandle=, 
objectName=,
attributeNames=) at src/amf/amfd/su.cc:1501
\#6  0x7fac6b4531f1 in rt_attr_update_cb (immoi_handle=94489411855, 
object_name=0x7fac640041b8,
attribute_names=0x7fac6c104290) at src/amf/amfd/imm.cc:881
\#7  0x7fac6a99bc42 in imma_process_callback_info 
(cb=cb@entry=0x7fac6aba6320 , cl_node=0x7fac6c0cf250,
callback=callback@entry=0x7fac64004190, immHandle=94489411855) at 
src/imm/agent/imma_proc.cc:3266
\#8  0x7fac6a99bf79 in imma_hdl_callbk_dispatch_all (cb=0x7fac6aba6320 
, immHandle=94489411855)
at src/imm/agent/imma_proc.cc:1812
\#9  0x7fac6a99301d in saImmOiDispatch (immOiHandle=94489411855, 
dispatchFlags=SA_DISPATCH_ALL)
at src/imm/agent/imma_oi_api.cc:642
\#10 0x7fac6b412868 in main_loop () at src/amf/amfd/main.cc:717
\#11 main (argc=, argv=) at 
src/amf/amfd/main.cc:848




[tickets] [opensaf:tickets] #2496 amf: amfd crashes while trying to free invalid memory.

2017-06-14 Thread Praveen via Opensaf-tickets




---

** [tickets:#2496] amf: amfd crashes while trying to free invalid memory.**

**Status:** accepted
**Milestone:** 5.17.06
**Created:** Wed Jun 14, 2017 08:49 AM UTC by Praveen
**Last Updated:** Wed Jun 14, 2017 08:49 AM UTC
**Owner:** Praveen






[tickets] [opensaf:tickets] #2493 amf: amfnd asserts while shutting down when active monitoring fails for NPI comp.

2017-06-13 Thread Praveen via Opensaf-tickets

- Description has changed:

Diff:



--- old
+++ new
@@ -1,4 +1,3 @@
-
 steps to reproduce:
 1)Bring one controller up.
 2)Add attached configuration in the system.



- **status**: unassigned --> assigned
- **assigned_to**: Praveen



---

** [tickets:#2493] amf: amfnd asserts while shutting down when active 
monitoring fails for NPI comp.**

**Status:** assigned
**Milestone:** 5.17.06
**Created:** Tue Jun 13, 2017 07:11 AM UTC by Praveen
**Last Updated:** Tue Jun 13, 2017 07:11 AM UTC
**Owner:** Praveen
**Attachments:**

- 
[1945_npi.xml](https://sourceforge.net/p/opensaf/tickets/2493/attachment/1945_npi.xml)
 (12.0 kB; text/xml)
- 
[osafamfnd](https://sourceforge.net/p/opensaf/tickets/2493/attachment/osafamfnd)
 (6.1 MB; application/octet-stream)
- [syslog](https://sourceforge.net/p/opensaf/tickets/2493/attachment/syslog) 
(275.6 kB; application/octet-stream)


steps to reproduce:
1)Bring one controller up.
2)Add attached configuration in the system.
3)Unlock-in and unlock su1.

The attached configuration uses the amfpm command to start active monitoring. 
If this command is wrongly configured by the user, AMF reports a fault on the 
component and AMFND restarts it. Since the active monitoring command fails 
every time, the component is faulted continuously. Finally, when OpenSAF is 
stopped on the node, AMFND asserts:

syslog:
Jun 13 12:27:03 SC-1 osafamfnd[30287]: NO Removed 
'safSi=AmfDemo,safApp=AmfDemo1' from 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1'
Jun 13 12:27:03 SC-1 osafamfnd[30287]: NO Removed assignments from AMF 
components
Jun 13 12:27:03 SC-1 osafamfnd[30287]: NO 
'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' Component or SU restart probation 
timer expired
Jun 13 12:27:03 SC-1 osafamfnd[30287]: NO Terminating all AMF components
Jun 13 12:27:03 SC-1 osafamfnd[30287]: NO 
'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' Presence State RESTARTING => 
TERMINATING
Jun 13 12:27:03 SC-1 osafamfnd[30287]: src/amf/amfnd/susm.cc:1886: 
avnd_su_pres_st_chng_prc: Assertion 'si' failed.
Jun 13 12:27:03 SC-1 osafclmd[30264]: AL AMF Node Director is down, terminate 
this process


bt:
\#0  0x7f662fbe8cc9 in __GI_raise (sig=sig@entry=6) at 
../nptl/sysdeps/unix/sysv/linux/raise.c:56
\#1  0x7f662fbec0d8 in __GI_abort () at abort.c:89
\#2  0x7f66306dedbe in __osafassert_fail (__file=, 
__line=, __func=,
__assertion=) at src/base/sysf_def.c:286
\#3  0x7f66313fff3f in avnd_su_pres_st_chng_prc 
(final_st=SA_AMF_PRESENCE_TERMINATING,
prv_st=SA_AMF_PRESENCE_RESTARTING, su=0x7f66324d33c0, cb=0x7f663161f240 
<_avnd_cb>) at src/amf/amfnd/susm.cc:1886
\#4  avnd_su_pres_fsm_run (cb=cb@entry=0x7f663161f240 <_avnd_cb>, 
su=0x7f66324d33c0, comp=comp@entry=0x7f66324d46b0,
ev=) at src/amf/amfnd/susm.cc:1610
\#5  0x7f66313caf58 in avnd_comp_clc_st_chng_prc 
(cb=cb@entry=0x7f663161f240 <_avnd_cb>,
comp=comp@entry=0x7f66324d46b0, 
prv_st=prv_st@entry=SA_AMF_PRESENCE_RESTARTING,
final_st=final_st@entry=SA_AMF_PRESENCE_TERMINATING) at 
src/amf/amfnd/clc.cc:1501
\#6  0x7f66313cf127 in avnd_comp_clc_fsm_run (cb=0x7f663161f240 <_avnd_cb>, 
comp=comp@entry=0x7f66324d46b0,
ev=ev@entry=AVND_COMP_CLC_PRES_FSM_EV_CLEANUP) at src/amf/amfnd/clc.cc:892
\#7  0x7f66314067e8 in avnd_comp_cleanup_launch 
(comp=comp@entry=0x7f66324d46b0) at src/amf/amfnd/util.cc:178
\#8  0x7f6631405beb in avnd_last_step_clean (cb=cb@entry=0x7f663161f240 
<_avnd_cb>) at src/amf/amfnd/term.cc:76
\#9  0x7f66313e13b9 in avnd_di_msg_ack_process (cb=cb@entry=0x7f663161f240 
<_avnd_cb>, mid=)
at src/amf/amfnd/di.cc:1264
\#10 0x7f66313e1484 in avnd_evt_avd_ack_evh (cb=0x7f663161f240 <_avnd_cb>, 
evt=0x7f6628001010)
at src/amf/amfnd/di.cc:411
\#11 0x7f66313ec9df in avnd_evt_process (evt=0x7f6628001010) at 
src/amf/amfnd/main.cc:658
\#12 avnd_main_process () at src/amf/amfnd/main.cc:610
\#13 0x7f66313c261f in main (argc=2, argv=0x7ffc47fa34f8) at 
src/amf/amfnd/main.cc:203





[tickets] [opensaf:tickets] #2493 amf: amfnd asserts while shutting down when active monitoring fails for NPI comp.

2017-06-13 Thread Praveen via Opensaf-tickets




---

** [tickets:#2493] amf: amfnd asserts while shutting down when active 
monitoring fails for NPI comp.**

**Status:** unassigned
**Milestone:** 5.17.06
**Created:** Tue Jun 13, 2017 07:11 AM UTC by Praveen
**Last Updated:** Tue Jun 13, 2017 07:11 AM UTC
**Owner:** nobody
**Attachments:**

- 
[1945_npi.xml](https://sourceforge.net/p/opensaf/tickets/2493/attachment/1945_npi.xml)
 (12.0 kB; text/xml)
- 
[osafamfnd](https://sourceforge.net/p/opensaf/tickets/2493/attachment/osafamfnd)
 (6.1 MB; application/octet-stream)
- [syslog](https://sourceforge.net/p/opensaf/tickets/2493/attachment/syslog) 
(275.6 kB; application/octet-stream)







---

Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is 
subscribed to https://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change settings at 
https://sourceforge.net/p/opensaf/admin/tickets/options.  Or, if this is a 
mailing list, you can unsubscribe from the mailing list.

--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Opensaf-tickets mailing list
Opensaf-tickets@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets

[tickets] [opensaf:tickets] #2485 amfnd: missing susi response if component is restarted

2017-06-09 Thread Praveen via Opensaf-tickets

- **status**: accepted --> unassigned
- **assigned_to**: Praveen -->  nobody 



---

** [tickets:#2485] amfnd: missing susi response if component is restarted**

**Status:** unassigned
**Milestone:** 5.17.06
**Created:** Wed Jun 07, 2017 12:57 AM UTC by Gary Lee
**Last Updated:** Fri Jun 09, 2017 09:04 AM UTC
**Owner:** nobody


An SI contains multiple CSIs. If a restart component admin operation arrives at 
amfnd before all CSIs are assigned,
the SUSI response is not sent to AMFD.

This code in avnd_comp_csi_assign_done() appears to be the problem area.

  /* while restarting, we wont use assign all, so csi will not be null */
  if (csi && m_AVND_COMP_CSI_CURR_ASSIGN_STATE_IS_RESTARTING(csi)) {
    m_AVND_COMP_CSI_CURR_ASSIGN_STATE_SET(csi,
                                          AVND_COMP_CSI_ASSIGN_STATE_ASSIGNED);
    goto done;
  }

Perhaps we should not initiate a restart in avnd_evt_comp_admin_op_req(), if
a component is still in AVND_COMP_CSI_ASSIGN_STATE_ASSIGNING state.
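
The guard suggested above could look roughly like the sketch below. It is a toy model, not amfnd code: the `Csi` struct, the `CsiAssignState` enum and `restart_allowed()` are hypothetical names, and only the state names ASSIGNING, ASSIGNED and RESTARTING are taken from the ticket.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for amfnd's per-CSI assignment state.
enum class CsiAssignState { Unassigned, Assigning, Assigned, Restarting };

// Hypothetical, reduced CSI record: only the assignment state is modelled.
struct Csi {
  CsiAssignState state;
};

// Returns true when a component restart may start right away, i.e. none of
// the component's CSIs is still in the middle of an assignment.
bool restart_allowed(const std::vector<Csi>& csis) {
  for (const Csi& csi : csis) {
    if (csi.state == CsiAssignState::Assigning) return false;
  }
  return true;
}

// Usage example: one CSI still being assigned blocks the restart.
void example_usage() {
  std::vector<Csi> csis = {Csi{CsiAssignState::Assigned},
                           Csi{CsiAssignState::Assigning}};
  assert(!restart_allowed(csis));
}
```

With such a check the restart request would be deferred or rejected instead of moving a CSI to RESTARTING while the SUSI response is still pending.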



[tickets] [opensaf:tickets] #2485 amfnd: missing susi response if component is restarted

2017-06-09 Thread Praveen via Opensaf-tickets

Attached is the configuration to reproduce the issue.
Steps to reproduce:
1) Bring up SC-1 with the attached imm.xml.
2) After the cluster startup timer expires, AMF will start two NPI components 
as part of assignment.
3) Introduce a delay between the instantiation scripts of the two NPI 
components.
4) Restart the already instantiated NPI component such that its restart 
completes after the second NPI component is instantiated.


Attachments:

- 
[imm.xml](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/2f504cc3/62c0/attachment/imm.xml)
 (328.6 kB; text/xml)


---

** [tickets:#2485] amfnd: missing susi response if component is restarted**

**Status:** accepted
**Milestone:** 5.17.06
**Created:** Wed Jun 07, 2017 12:57 AM UTC by Gary Lee
**Last Updated:** Fri Jun 09, 2017 01:07 AM UTC
**Owner:** Praveen




[tickets] [opensaf:tickets] #2485 amfnd: missing susi response if component is restarted

2017-06-08 Thread Praveen via Opensaf-tickets

What is the configuration to reproduce this issue?
With 2 CSIs in an SI in the amf_demo app, I do not observe this issue.
Attached are amfd and amfnd traces after successful verification.


Attachments:

- 
[osafamfd](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/2f504cc3/aa56/attachment/osafamfd)
 (1.8 MB; application/octet-stream)
- 
[osafamfnd](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/2f504cc3/aa56/attachment/osafamfnd)
 (1.9 MB; application/octet-stream)


---

** [tickets:#2485] amfnd: missing susi response if component is restarted**

**Status:** accepted
**Milestone:** 5.17.06
**Created:** Wed Jun 07, 2017 12:57 AM UTC by Gary Lee
**Last Updated:** Thu Jun 08, 2017 12:19 AM UTC
**Owner:** Praveen




[tickets] [opensaf:tickets] #2485 amfnd: missing susi response if component is restarted

2017-06-07 Thread Praveen via Opensaf-tickets


Currently, for most entities, AMFD returns TRY_AGAIN when an admin operation 
request arrives while the SG is unstable. This includes admin restart of an SU. 
I proposed a solution along those lines. However, there is a general 
enhancement ticket to allow admin operations while the SG is unstable:
\#1873 amf: Avoid rejecting user requests due to internal "unstable" state.

The spec is not very clear on this in general. Only in one place (9.4.7 
SA_AMF_ADMIN_RESTART, page 384), to prevent an admin restart from running in 
parallel with another admin operation on the same entity, it states:
"The Availability Management Framework must not proceed with this operation if
another administrative operation or an error recovery initiated by the 
Availability Management Framework is already engaged on the logical entity. In 
such case, the SA_AIS_ERR_TRY_AGAIN error value shall be returned to indicate 
that the action is feasible but not at this instant."

Consider locking a standby SU while restarting a component in the active SU. 
For a restartable component this can be allowed, as the restart is local to 
AMFND. For a non-restartable component, however, the assignments need to be 
switched over and AMFD has to find standby SUs, which increases the complexity 
for redundancy models like Nway and NplusM.





---

** [tickets:#2485] amfnd: missing susi response if component is restarted**

**Status:** accepted
**Milestone:** 5.17.06
**Created:** Wed Jun 07, 2017 12:57 AM UTC by Gary Lee
**Last Updated:** Wed Jun 07, 2017 08:37 AM UTC
**Owner:** Praveen




[tickets] [opensaf:tickets] #2485 amfnd: missing susi response if component is restarted

2017-06-07 Thread Praveen via Opensaf-tickets

- **status**: assigned --> accepted
- **Comment**:

Since SG is not stable, AMFD should return TRY_AGAIN to IMM client. This check 
is missing in comp_admin_op_cb() in amfd/comp.cc. I guess assignment in 
component is happening because of cluster startup timer expiry (not due to any 
other admin operation).
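
A minimal sketch of the missing check, under stated assumptions: `ServiceGroup` and `check_comp_admin_restart()` are hypothetical names and do not reflect the real data structures in amfd/comp.cc. The numeric values mirror SA_AIS_OK and SA_AIS_ERR_TRY_AGAIN from SaAisErrorT.

```cpp
#include <cassert>

// Numeric values mirror SA_AIS_OK and SA_AIS_ERR_TRY_AGAIN in SaAisErrorT.
enum AisError { kAisOk = 1, kAisErrTryAgain = 6 };

// Hypothetical, reduced view of a service group: only its stability flag.
struct ServiceGroup {
  bool stable;
};

// Decides whether a component admin restart may proceed right now.
AisError check_comp_admin_restart(const ServiceGroup& sg) {
  if (!sg.stable) return kAisErrTryAgain;  // the IMM client should retry later
  return kAisOk;
}

// Usage example: an unstable SG makes the request fail with TRY_AGAIN.
void example_usage() {
  assert(check_comp_admin_restart(ServiceGroup{false}) == kAisErrTryAgain);
  assert(check_comp_admin_restart(ServiceGroup{true}) == kAisOk);
}
```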



---

** [tickets:#2485] amfnd: missing susi response if component is restarted**

**Status:** accepted
**Milestone:** 5.17.06
**Created:** Wed Jun 07, 2017 12:57 AM UTC by Gary Lee
**Last Updated:** Wed Jun 07, 2017 08:16 AM UTC
**Owner:** Praveen




[tickets] [opensaf:tickets] #2485 amfnd: missing susi response if component is restarted

2017-06-07 Thread Praveen via Opensaf-tickets

- **status**: unassigned --> assigned
- **assigned_to**: Praveen



---

** [tickets:#2485] amfnd: missing susi response if component is restarted**

**Status:** assigned
**Milestone:** 5.17.06
**Created:** Wed Jun 07, 2017 12:57 AM UTC by Gary Lee
**Last Updated:** Wed Jun 07, 2017 12:57 AM UTC
**Owner:** Praveen




[tickets] [opensaf:tickets] #2469 clm: Stop tracking api returns NOT_EXIST

2017-06-07 Thread Praveen via Opensaf-tickets

It seems the TrackStop() request reached CLMS, which executed it. But the 
client, AMFD, received ERR_TIMEOUT:
2017-05-26 10:19:12 SC-1 osafamfd[268]: WA Failed to stop cluster tracking 5

The same AMFD, after becoming standby, then tries TrackStop() again. It is 
bound to get ERR_NOT_EXIST, as the tracking was already stopped.
AMFD should finalize the handle when it gets ERR_TIMEOUT.
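
The proposed handling can be sketched as a small decision function. This is an illustration only: `NextStep` and `after_track_stop()` are hypothetical names, not AMFD code. The numeric values mirror SaAisErrorT, where 5 is ERR_TIMEOUT and 12 is ERR_NOT_EXIST, matching the codes in the syslog.

```cpp
#include <cassert>

// Numeric values mirror SaAisErrorT (5 = TIMEOUT, 12 = NOT_EXIST).
enum AisError { kAisOk = 1, kAisErrTimeout = 5, kAisErrNotExist = 12 };

// Hypothetical follow-up actions after a track-stop attempt.
enum class NextStep { Done, FinalizeHandle, ReportError };

// Maps the result of a track-stop call to the follow-up action.
NextStep after_track_stop(AisError rc) {
  switch (rc) {
    case kAisOk:
      return NextStep::Done;
    case kAisErrTimeout:
      // The server may have executed the request although the reply was
      // lost, so the tracking state is unknown: give up the handle.
      return NextStep::FinalizeHandle;
    default:
      return NextStep::ReportError;
  }
}

// Usage example: a timeout leads to finalizing the handle, not to a retry.
void example_usage() {
  assert(after_track_stop(kAisErrTimeout) == NextStep::FinalizeHandle);
}
```

Finalizing on timeout means the later TrackStop() on the standby is never issued on a handle whose tracking state is unknown.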



---

** [tickets:#2469] clm: Stop tracking api returns NOT_EXIST**

**Status:** assigned
**Milestone:** 5.17.06
**Created:** Mon May 29, 2017 12:19 AM UTC by Minh Hon Chau
**Last Updated:** Mon May 29, 2017 09:21 AM UTC
**Owner:** Praveen


When performing switchover, AMFD fails to stop CLM track callback with error 
code 12 (NOT_EXIST)

**syslog:**
2017-05-26 10:19:02 SC-1 osafamfd[268]: NO Controller switch over initiated
2017-05-26 10:19:02 SC-1 osafamfd[268]: NO ROLE SWITCH Active --> Quiesced
2017-05-26 10:19:02 SC-1 osafimmnd[205]: NO Implementer (applier) connected: 40 
(@OpenSafImmReplicatorB) <343, 2010f>
2017-05-26 10:19:02 SC-1 osafntfimcnd[626]: NO Started
2017-05-26 10:19:12 SC-1 osafamfd[268]: WA Failed to stop cluster tracking 5
2017-05-26 10:19:12 SC-1 osafimmnd[205]: NO Implementer disconnected 32 <27, 
2010f> (safAmfService)
2017-05-26 10:19:12 SC-1 osafimmnd[205]: NO Implementer (applier) connected: 41 
(@safAmfService2010f) <27, 2010f>
2017-05-26 10:19:12 SC-1 osafamfnd[283]: NO AVD NEW_ACTIVE, adest:1
2017-05-26 10:19:12 SC-1 osafimmnd[205]: NO Implementer disconnected 31 <0, 
2020f> (@safAmfService2020f)
2017-05-26 10:19:12 SC-1 osafimmnd[205]: NO Implementer connected: 42 
(safAmfService) <0, 2020f>
2017-05-26 10:19:12 SC-1 osafamfd[268]: NO Switching Quiesced --> StandBy
2017-05-26 10:19:13 SC-1 osafamfd[268]: ER Failed to stop cluster tracking 12
2017-05-26 10:19:13 SC-1 osafamfd[268]: ER Failed to stop cluster tracking 
after switch over
2017-05-26 10:19:13 SC-1 osafamfd[268]: NO Controller switch over done

**CLM trace:**
May 26 10:19:13.173369 osafclmd [240:240:src/clm/clmd/clms_evt.c:1347] >> 
proc_track_stop_msg 
May 26 10:19:13.173374 osafclmd [240:240:src/clm/clmd/clms_util.c:0126] >> 
clms_node_get_by_id 
May 26 10:19:13.173379 osafclmd [240:240:src/clm/clmd/clms_util.c:0137] TR Node 
found 131343
May 26 10:19:13.173383 osafclmd [240:240:src/clm/clmd/clms_util.c:0140] << 
clms_node_get_by_id 
May 26 10:19:13.173388 osafclmd [240:240:src/clm/clmd/clms_evt.c:1350] TR Node 
id = 131343
May 26 10:19:13.173393 osafclmd [240:240:src/clm/clmd/clms_mds.c:1553] >> 
clms_mds_msg_send 
May 26 10:19:13.173448 osafclmd [240:240:src/clm/clmd/clms_mds.c:1587] << 
clms_mds_msg_send 
May 26 10:19:13.173457 osafclmd [240:240:src/clm/clmd/clms_mbcsv.c:0810] >> 
clms_send_async_update 
May 26 10:19:13.173462 osafclmd [240:240:src/mbc/mbcsv_api.c:0798] >> 
mbcsv_process_snd_ckpt_request: Sending checkpoint data to all STANDBY peers, 
as per the send-type specified
May 26 10:19:13.173504 osafclmd [240:240:src/mbc/mbcsv_api.c:0830] TR 
svc_id:48, pwe_hdl:65552
May 26 10:19:13.173509 osafclmd [240:240:src/mbc/mbcsv_util.c:0363] >> 
mbcsv_send_ckpt_data_to_all_peers 
May 26 10:19:13.173593 osafclmd [240:240:src/mbc/mbcsv_util.c:0411] TR 
dispatching FSM for NCSMBCSV_SEND_ASYNC_UPDATE
May 26 10:19:13.173599 osafclmd [240:240:src/mbc/mbcsv_act.c:0103] TR ASYNC 
update to be sent. role: 1, svc_id: 48, pwe_hdl: 65552
May 26 10:19:13.173604 osafclmd [240:240:src/mbc/mbcsv_util.c:0424] TR calling 
encode callback
May 26 10:19:13.173610 osafclmd [240:240:src/clm/clmd/clms_mbcsv.c:0740] >> 
mbcsv_callback 
May 26 10:19:13.173615 osafclmd [240:240:src/clm/clmd/clms_mbcsv.c:0856] >> 
ckpt_encode_cbk_handler 
May 26 10:19:13.173626 osafclmd [240:240:src/clm/clmd/clms_mbcsv.c:0867] TR 
cbk_arg->info.encode.io_msg_type type 1
May 26 10:19:13.173632 osafclmd [240:240:src/clm/clmd/clms_mbcsv.c:1307] >> 
ckpt_encode_async_update 
May 26 10:19:13.173637 osafclmd [240:240:src/clm/clmd/clms_mbcsv.c:1324] TR 
data->header.type 3
May 26 10:19:13.173641 osafclmd [240:240:src/clm/clmd/clms_mbcsv.c:1362] TR 
Async update CLMS_CKPT_TRACK_START
May 26 10:19:13.173646 osafclmd [240:240:src/clm/clmd/clms_mbcsv.c:1701] >> 
enc_mbcsv_track_changes_msg 
May 26 10:19:13.173650 osafclmd [240:240:src/clm/clmd/clms_mbcsv.c:1714] << 
enc_mbcsv_track_changes_msg 
May 26 10:19:13.173654 osafclmd [240:240:src/clm/clmd/clms_mbcsv.c:1515] << 
ckpt_encode_async_update 
May 26 10:19:13.173658 osafclmd [240:240:src/clm/clmd/clms_mbcsv.c:0910] << 
ckpt_encode_cbk_handler 
May 26 10:19:13.173663 osafclmd [240:240:src/clm/clmd/clms_mbcsv.c:0780] << 
mbcsv_callback 
May 26 10:19:13.173667 osafclmd [240:240:src/mbc/mbcsv_util.c:0469] TR send the 
encoded message to any other peer with same s/w version
May 26 10:19:13.173671 osafclmd [240:240:src/mbc/mbcsv_util.c:0472] TR 
dispatching FSM for NCSMBCSV_SEND_ASYNC_UPDATE
May 26

[tickets] [opensaf:tickets] #2468 amf: amfd asserts while decrementing opensaf NoRed SI assignment counter during fail-over.

2017-06-06 Thread Praveen via Opensaf-tickets

Hi,
In later releases the assert has been replaced with a warning. 
Without amfd traces, it is not possible to know why the counter was decremented 
before the assert.
I will try to reproduce this based on code analysis and will update further.
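
For illustration, a warning-instead-of-assert decrement could look like the sketch below. It is not the code from the later releases: the `Su` struct is a hypothetical reduction, and only the counter name saAmfSUNumCurrActiveSIs and the function name dec_curr_act_si are taken from the assertion message in the syslog.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

// Hypothetical, reduced SU: only the active-SI counter is modelled.
struct Su {
  std::uint32_t saAmfSUNumCurrActiveSIs = 0;
};

// Decrements the active-SI counter. On underflow it logs a warning and
// leaves the counter at 0 instead of aborting. Returns false on underflow.
bool dec_curr_act_si(Su& su) {
  if (su.saAmfSUNumCurrActiveSIs == 0) {
    std::fprintf(stderr, "WA active SI counter is already 0\n");
    return false;
  }
  --su.saAmfSUNumCurrActiveSIs;
  return true;
}

// Usage example: a second decrement is reported, not fatal.
void example_usage() {
  Su su;
  su.saAmfSUNumCurrActiveSIs = 1;
  assert(dec_curr_act_si(su));
  assert(!dec_curr_act_si(su));
}
```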

Thanks
Praveen



---

** [tickets:#2468] amf: amfd asserts while decrementing opensaf NoRed SI 
assignment counter during fail-over.**

**Status:** unassigned
**Milestone:** 5.17.06
**Created:** Thu May 25, 2017 08:46 AM UTC by Praveen
**Last Updated:** Thu Jun 01, 2017 02:41 PM UTC
**Owner:** nobody


This ticket is based on an issue reported on the users mailing list on 
22-May-17, subject "[users] osafamfd coredump issue".


Here is syslog when the issue occurred:

2017-05-01T07:52:57.714906-04:00 scm2 kernel: tipc: Resetting link 
<1.1.16:eth2-1.1.5:bond0>, peer not responding

2017-05-01T07:52:57.714935-04:00 scm2 kernel: tipc: Lost link 
<1.1.16:eth2-1.1.5:bond0> on network plane A

2017-05-01T07:52:57.714939-04:00 scm2 kernel: tipc: Lost contact with <1.1.5>

2017-05-01T07:52:57.716788-04:00 scm2 osafimmd[3009]: NO MDS event from svc_id 
25 (change:4, dest:287038266327043)

2017-05-01T07:52:57.717304-04:00 scm2 osafclmd[4259]: NO Node 66831 went down. 
Not sending track callback for agents on that node

2017-05-01T07:52:57.719178-04:00 scm2 osafimmnd[3020]: NO Global discard node 
received for nodeId:1050f pid:15395

2017-05-01T07:52:57.719233-04:00 scm2 osafimmnd[3020]: NO Implementer 
disconnected 104 <0, 1050f(down)> (MsgQueueService66831)

2017-05-01T07:52:57.721345-04:00 scm2 osafamfd[4277]: NO Node 'PLD0105' left 
the cluster

2017-05-01T07:52:57.722778-04:00 scm2 log_demo[6160]: [0.I.Proc]: FYI state 
change notification from NTF, entity PLD0105 now has new state DISABLED (Oper 
state safAmfNode=PLD0105,safAmfCluster=myAmfCluster changed)

2017-05-01T07:52:57.732796-04:00 scm2 osafamfd[4277]: su.cc:2006: 
dec_curr_act_si: Assertion 'saAmfSUNumCurrActiveSIs > 0' failed.

2017-05-01T07:52:57.778777-04:00 scm2 kernel: tipc: Resetting link 
<1.1.16:eth2-1.1.6:bond0>, peer not responding

2017-05-01T07:52:57.778827-04:00 scm2 kernel: tipc: Lost link 
<1.1.16:eth2-1.1.6:bond0> on network plane A

2017-05-01T07:52:57.778833-04:00 scm2 kernel: tipc: Lost contact with <1.1.6>

2017-05-01T07:52:57.777979-04:00 scm2 osafimmd[3009]: NO MDS event from svc_id 
25 (change:4, dest:288139774320643)

2017-05-01T07:52:57.717343-04:00 scm2 osafclmd[4259]: NO Node 66831 went down. 
Not sending track callback for agents on that node

2017-05-01T07:52:57.779373-04:00 scm2 osafclmd[4259]: NO Node 67087 went down. 
Not sending track callback for agents on that node

2017-05-01T07:52:57.780552-04:00 scm2 osafimmnd[3020]: NO Global discard node 
received for nodeId:1060f pid:17439

2017-05-01T07:52:57.780607-04:00 scm2 osafimmnd[3020]: NO Implementer 
disconnected 106 <0, 1060f(down)> (MsgQueueService67087)

2017-05-01T07:52:57.810785-04:00 scm2 osafamfnd[5281]: WA AMF director 
unexpectedly crashed

2017-05-01T07:52:57.810839-04:00 scm2 osafamfnd[5281]: Rebooting OpenSAF NodeId 
= 69647 EE Name = , Reason: local AVD down(Adest) or both AVD down(Vdest) 
received, OwnNodeId = 69647, SupervisionTime = 0

2017-05-01T07:52:57.810978-04:00 scm2 osafimmnd[3020]: NO Implementer locally 
disconnected. Marking it as doomed 105 <29, 1100f> (safAmfService)

2017-05-01T07:52:57.812582-04:00 scm2 osafimmnd[3020]: NO Implementer 
disconnected 105 <29, 1100f> (safAmfService)

2017-05-01T07:52:57.950567-04:00 scm2 opensaf_reboot: Rebooting local node; 
timeout=0

2017-05-01T07:52:58.084968-04:00 scm2 atwdog[28335]: rebooting (-f) local node




[tickets] [opensaf:tickets] #2477 amfd: Cyclic reboot after SC absence period (in large cluster)

2017-06-05 Thread Praveen

Hi Minh,

What are the steps to reproduce after applying the patch 2477_rep.diff?


Thanks,
Praveen


---

** [tickets:#2477] amfd: Cyclic reboot after SC absence period (in large 
cluster)**

**Status:** review
**Milestone:** 5.17.06
**Labels:** assignment failover during stop of both SC 2416 
**Created:** Fri Jun 02, 2017 06:17 AM UTC by Minh Hon Chau
**Last Updated:** Fri Jun 02, 2017 09:25 AM UTC
**Owner:** Minh Hon Chau


The problem in this ticket occurs in the same scenario as reported in #2416.

After an SC absence period, amfd hits osafassert() and dumps core, and the 
problem happens repeatedly.

One of the patches for #2416 tried to call IMM sync as early as possible. That 
works fine in a small cluster (5 nodes), but in a large cluster of about 75 
nodes the changed IMM sync calls have almost no effect.

In #2416, a problem was seen, assumed to be caused by unreliable IMM sync 
calls, in which after the SC absence period amfd had 3 assignments for a 2N SG: 
2 STANDBY SUSIs and 1 ACTIVE SUSI. It was fixed by commit "amfd: Add iteration 
to failover all absent assignments [#2416]" (refer to: 
https://sourceforge.net/p/opensaf/tickets/2416/#f83b)

Another variant of the problem with unreliable IMM calls before both SCs go 
down is that amfd can end up with ACTIVE assignments on both SUs, which leads 
to the assert. So far this has only been seen in a large cluster.
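
The two inconsistent states described above (2 STANDBY plus 1 ACTIVE, and ACTIVE on both SUs) can be expressed as a simple consistency check. This is a sketch only: `HaState` and `is_consistent_2n()` are hypothetical names and not part of amfd, and the check assumes that the assignments of one SI in a 2N SG contain at most one ACTIVE and at most one STANDBY.

```cpp
#include <cassert>
#include <vector>

// Hypothetical HA state of one SUSI assignment.
enum class HaState { Active, Standby };

// Returns true when the assignments of one SI in a 2N SG contain at most
// one ACTIVE and at most one STANDBY assignment.
bool is_consistent_2n(const std::vector<HaState>& susis) {
  int active = 0;
  int standby = 0;
  for (HaState s : susis) {
    if (s == HaState::Active) {
      ++active;
    } else {
      ++standby;
    }
  }
  return active <= 1 && standby <= 1;
}

// Usage example: both variants from the ticket are flagged as inconsistent.
void example_usage() {
  std::vector<HaState> two_standby = {HaState::Standby, HaState::Standby,
                                      HaState::Active};
  std::vector<HaState> two_active = {HaState::Active, HaState::Active};
  assert(!is_consistent_2n(two_standby));
  assert(!is_consistent_2n(two_active));
}
```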


Details of coredump:
 
~~~
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/lib64/opensaf/osafamfd'.
Program terminated with signal SIGABRT, Aborted.
#0  0x7f784279b0c7 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: zypper install 
opensaf-amf-director-debuginfo-5.2.0-469.0.6128a2d.sle12.x86_64
(gdb) bt full
#0  0x7f784279b0c7 in raise () from /lib64/libc.so.6
No symbol table info available.
#1  0x7f784279c478 in abort () from /lib64/libc.so.6
No symbol table info available.
#2  0x7f78435fdf4e in __osafassert_fail (__file=, 
__line=, __func=, 
__assertion=) at ../../opensaf/src/base/sysf_def.c:286
No locals.
#3  0x7f78445671e8 in avd_sg_2n_act_susi (sg=, 
stby_susi=stby_susi@entry=0x7ffeef034998, cb=0x7f78447f2e80 <_control_block>)
at ../../opensaf/src/amf/amfd/sg_2n_fsm.cc:596
susi = 
a_susi_2 = 0x7f7845e0d0c0
s_susi_1 = 0x7f7845e0d0c0
su_2 = 
t_ = {trace_leave_called = false, file_ = 0x0, function_ = 0x0}
s_susi_2 = 0x7f7845e2a030
a_susi = 0x0
a_susi_1 = 0x7f7845e2a030
s_susi = 0x0
su_1 = 0x7f7845d69e60
#4  0x7f784456d5d6 in SG_2N::node_fail (this=0x7f7845d5f4f0, 
cb=0x7f78447f2e80 <_control_block>, su=0x7f7845d69e60)
at ../../opensaf/src/amf/amfd/sg_2n_fsm.cc:3402
a_susi = 
s_susi = 0x7f7845d69a68
o_su = 
flag = 
__FUNCTION__ = "node_fail"
su_ha_state = 
t_ = {trace_leave_called = false, file_ = 0x0, function_ = 0x0}
#5  0x7f784455de1a in AVD_SG::failover_absent_assignment 
(this=0x7f7845d5f4f0) at ../../opensaf/src/amf/amfd/sg.cc:2307
t_ = {trace_leave_called = false, file_ = 0x0, function_ = 0x0}
__FUNCTION__ = "failover_absent_assignment"
failed_su = 0x7f7845d69e60
#6  0x7f7844514125 in avd_cluster_tmr_init_evh (cb=0x7f78447f2e80 
<_control_block>, evt=)
at ../../opensaf/src/amf/amfd/cluster.cc:103
i_sg = 0x7f7845d5f4f0
__for_range = @0x7f7845ca2a90: {db = {_M_t = {
  _M_impl = 
{<std::allocator<std::_Rb_tree_node<std::pair<std::basic_string<char, 
std::char_traits, std::allocator > const, AVD_SG*> > >> = 
{<__gnu_cxx::new_allocator<std::_Rb_tree_node<std::pair<std::basic_string<char, 
std::char_traits, std::allocator > const, AVD_SG*> > >> = {}, }, 
_M_key_compare = {<std::binary_function<std::basic_string<char, 
std::char_traits, std::allocator >, std::basic_string<char, 
std::char_traits, std::allocator >, bool>> = {}, 
}, _M_header = {_M_color = std::_S_red, 
  _M_parent = 0x7f7845d515e0, _M_left = 0x7f7845d03ed0, 
_M_right = 0x7f7845d81580}, _M_node_count = 28
t_ = {trace_leave_called = false, file_ = 0x0, function_ = 0x0}
__FUNCTION__ = "avd_cluster_tmr_init_evh"
su = 0x0
node = 
#7  0x7f784453ca2c in process_event (cb_now=0x7f78447f2e80 
<_control_block>, evt=0x7f78340013d0) at ../../opensaf/src/amf/amfd/main.cc:775
t_ = {trace_leave_called = false, file_ = 0x0, function_ = 0x0}
__FUNCTION__ = "process_event"
#8  0x7f78444f6abe in main_loop () at ../../opensaf/src/amf/amfd/main.cc:691
pollretval = 
evt = 0x7f78340013d0
polltmo = 0
term_fd = 24
cb = 0x7f78447f2e80 <_control_block>
error = 
old
~~~

[tickets] [opensaf:tickets] #2475 amf: support for SC status change Callback, non SAF.

2017-06-01 Thread Praveen

- **status**: assigned --> review



---

** [tickets:#2475] amf: support for SC status change Callback, non SAF.**

**Status:** review
**Milestone:** 5.17.08
**Created:** Thu Jun 01, 2017 10:19 AM UTC by Praveen
**Last Updated:** Thu Jun 01, 2017 10:19 AM UTC
**Owner:** Praveen


This enhancement adds two resources to AMFA that let an application learn 
about the absence and presence of the SCs as they go down and come up.

Information about the resources:
* A callback that AMFA invokes whenever an SC joins the cluster and whenever
  both SCs leave the cluster, if the SC Absence feature is enabled.

  Callback and its argument:

  void (*OsafAmfSCStatusChangeCallbackT)(OsafAmfSCStatusT state)
  where OsafAmfSCStatusT is defined as:
  typedef enum {
    OSAF_AMF_SC_PRESENT = 1,
    OSAF_AMF_SC_ABSENT = 2,
  } OsafAmfSCStatusT;

  This callback can be integrated with a standard AMF application component.
* An API to register/install the above callback function:
  void osafAmfInstallSCStatusChangeCallback(
      void (*OsafAmfSCStatusChangeCallbackT)(OsafAmfSCStatusT status));
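
A sketch of how an application might use the two resources. The enum and the two names OsafAmfSCStatusChangeCallbackT and osafAmfInstallSCStatusChangeCallback come from the ticket. Everything else is an assumption: the install function here is a local stand-in that only stores the pointer, and `simulate_sc_status_change()`, `my_sc_status_cb()` and `g_sc_absent` are hypothetical helpers replacing the AMF agent and the application state.

```cpp
#include <cassert>

// Enum as given in the ticket.
typedef enum {
  OSAF_AMF_SC_PRESENT = 1,
  OSAF_AMF_SC_ABSENT = 2,
} OsafAmfSCStatusT;

// Callback type; the typedef form is an assumption based on the ticket text.
typedef void (*OsafAmfSCStatusChangeCallbackT)(OsafAmfSCStatusT state);

// Stand-in for the AMF agent: it only remembers the callback.
static OsafAmfSCStatusChangeCallbackT g_sc_status_cb = nullptr;
void osafAmfInstallSCStatusChangeCallback(OsafAmfSCStatusChangeCallbackT cb) {
  g_sc_status_cb = cb;
}

// Hypothetical helper that plays the role of the agent detecting a change.
void simulate_sc_status_change(OsafAmfSCStatusT state) {
  if (g_sc_status_cb != nullptr) g_sc_status_cb(state);
}

// Application side: remember whether the SCs are currently absent.
static bool g_sc_absent = false;
void my_sc_status_cb(OsafAmfSCStatusT state) {
  g_sc_absent = (state == OSAF_AMF_SC_ABSENT);
}

// Usage example: install the callback, then observe both transitions.
void example_usage() {
  osafAmfInstallSCStatusChangeCallback(my_sc_status_cb);
  simulate_sc_status_change(OSAF_AMF_SC_ABSENT);
  assert(g_sc_absent);
  simulate_sc_status_change(OSAF_AMF_SC_PRESENT);
  assert(!g_sc_absent);
}
```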
 



[tickets] [opensaf:tickets] #2475 amf: support for SC status change Callback, non SAF.

2017-06-01 Thread Praveen




---

** [tickets:#2475] amf: support for SC status change Callback, non SAF.**

**Status:** assigned
**Milestone:** 5.17.08
**Created:** Thu Jun 01, 2017 10:19 AM UTC by Praveen
**Last Updated:** Thu Jun 01, 2017 10:19 AM UTC
**Owner:** Praveen




[tickets] [opensaf:tickets] #2331 CLM: : saClmNodeCurrAddress and saClmNodeCurrAddressFamily are not exposed to IMM even that TPC mode is using

2017-05-26 Thread Praveen

- **status**: accepted --> review
- **Blocker**:  --> True



---

** [tickets:#2331] CLM: : saClmNodeCurrAddress and saClmNodeCurrAddressFamily 
are not exposed to IMM even that TPC mode is using**

**Status:** review
**Milestone:** 5.17.08
**Created:** Thu Mar 02, 2017 10:10 AM UTC by Tai Dinh
**Last Updated:** Thu Apr 20, 2017 04:08 AM UTC
**Owner:** Praveen


saClmNodeCurrAddress and saClmNodeCurrAddressFamily of a cluster node are not 
exposed to IMM even when TCP mode is configured.
Applications sometimes need this information.



[tickets] [opensaf:tickets] #2468 amf: amfd asserts while decrementing opensaf NoRed SI assignment counter during fail-over.

2017-05-25 Thread Praveen

- **Version**: 5.2 --> 5.1
- **Comment**:

Observed in 5.1.



---

** [tickets:#2468] amf: amfd asserts while decrementing opensaf NoRed SI 
assignment counter during fail-over.**

**Status:** unassigned
**Milestone:** 5.17.06
**Created:** Thu May 25, 2017 08:46 AM UTC by Praveen
**Last Updated:** Thu May 25, 2017 11:19 AM UTC
**Owner:** nobody


Ticket is based on an issue reported via user list mail dated 22-May-17, 
subject "[users] osafamfd coredump issue".


Here is the syslog from when the issue occurred:

2017-05-01T07:52:57.714906-04:00 scm2 kernel: tipc: Resetting link <1.1.16:eth2-1.1.5:bond0>, peer not responding

2017-05-01T07:52:57.714935-04:00 scm2 kernel: tipc: Lost link <1.1.16:eth2-1.1.5:bond0> on network plane A

2017-05-01T07:52:57.714939-04:00 scm2 kernel: tipc: Lost contact with <1.1.5>

2017-05-01T07:52:57.716788-04:00 scm2 osafimmd[3009]: NO MDS event from svc_id 25 (change:4, dest:287038266327043)

2017-05-01T07:52:57.717304-04:00 scm2 osafclmd[4259]: NO Node 66831 went down. Not sending track callback for agents on that node

2017-05-01T07:52:57.719178-04:00 scm2 osafimmnd[3020]: NO Global discard node received for nodeId:1050f pid:15395

2017-05-01T07:52:57.719233-04:00 scm2 osafimmnd[3020]: NO Implementer disconnected 104 <0, 1050f(down)> (MsgQueueService66831)

2017-05-01T07:52:57.721345-04:00 scm2 osafamfd[4277]: NO Node 'PLD0105' left the cluster

2017-05-01T07:52:57.722778-04:00 scm2 log_demo[6160]: [0.I.Proc]: FYI state change notification from NTF, entity PLD0105 now has new state DISABLED (Oper state safAmfNode=PLD0105,safAmfCluster=myAmfCluster changed)

2017-05-01T07:52:57.732796-04:00 scm2 osafamfd[4277]: su.cc:2006: dec_curr_act_si: Assertion 'saAmfSUNumCurrActiveSIs > 0' failed.

2017-05-01T07:52:57.778777-04:00 scm2 kernel: tipc: Resetting link <1.1.16:eth2-1.1.6:bond0>, peer not responding

2017-05-01T07:52:57.778827-04:00 scm2 kernel: tipc: Lost link <1.1.16:eth2-1.1.6:bond0> on network plane A

2017-05-01T07:52:57.778833-04:00 scm2 kernel: tipc: Lost contact with <1.1.6>

2017-05-01T07:52:57.777979-04:00 scm2 osafimmd[3009]: NO MDS event from svc_id 25 (change:4, dest:288139774320643)

2017-05-01T07:52:57.717343-04:00 scm2 osafclmd[4259]: NO Node 66831 went down. Not sending track callback for agents on that node

2017-05-01T07:52:57.779373-04:00 scm2 osafclmd[4259]: NO Node 67087 went down. Not sending track callback for agents on that node

2017-05-01T07:52:57.780552-04:00 scm2 osafimmnd[3020]: NO Global discard node received for nodeId:1060f pid:17439

2017-05-01T07:52:57.780607-04:00 scm2 osafimmnd[3020]: NO Implementer disconnected 106 <0, 1060f(down)> (MsgQueueService67087)

2017-05-01T07:52:57.810785-04:00 scm2 osafamfnd[5281]: WA AMF director unexpectedly crashed

2017-05-01T07:52:57.810839-04:00 scm2 osafamfnd[5281]: Rebooting OpenSAF NodeId = 69647 EE Name = , Reason: local AVD down(Adest) or both AVD down(Vdest) received, OwnNodeId = 69647, SupervisionTime = 0

2017-05-01T07:52:57.810978-04:00 scm2 osafimmnd[3020]: NO Implementer locally disconnected. Marking it as doomed 105 <29, 1100f> (safAmfService)

2017-05-01T07:52:57.812582-04:00 scm2 osafimmnd[3020]: NO Implementer disconnected 105 <29, 1100f> (safAmfService)

2017-05-01T07:52:57.950567-04:00 scm2 opensaf_reboot: Rebooting local node; timeout=0

2017-05-01T07:52:58.084968-04:00 scm2 atwdog[28335]: rebooting (-f) local node
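
The failed assertion in the log guards a counter decrement. The following is only a minimal sketch of that pattern, not the code from amfd's su.cc: `SuCounters`, `inc_curr_act_si` and `try_dec_curr_act_si` are hypothetical names introduced for illustration. It shows why a decrement without a matching earlier increment aborts the process, and what a non-aborting variant could look like.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the per-SU assignment counter kept by amfd.
struct SuCounters {
  std::uint32_t num_curr_active_sis = 0;

  void inc_curr_act_si() { ++num_curr_active_sis; }

  // Same style of check as in the log: decrementing a counter that is
  // already zero is treated as a programming error and aborts.
  void dec_curr_act_si() {
    assert(num_curr_active_sis > 0);
    --num_curr_active_sis;
  }

  // Illustrative alternative: report the imbalance instead of aborting.
  bool try_dec_curr_act_si() {
    if (num_curr_active_sis == 0) return false;
    --num_curr_active_sis;
    return true;
  }
};
```

Whether aborting or reporting is the right reaction in amfd is a design question for the fix; the sketch only makes the precondition visible.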



---


[tickets] [opensaf:tickets] #2468 amf: amfd asserts while decrementing opensaf NoRed SI assignment counter during fail-over.

2017-05-25 Thread Praveen




---

** [tickets:#2468] amf: amfd asserts while decrementing opensaf NoRed SI 
assignment counter during fail-over.**

**Status:** unassigned
**Milestone:** 5.17.06
**Created:** Thu May 25, 2017 08:46 AM UTC by Praveen
**Last Updated:** Thu May 25, 2017 08:46 AM UTC
**Owner:** nobody


Ticket is based on an issue reported via user list mail dated 22-May-17, 
subject "[users] osafamfd coredump issue".





---


[tickets] [opensaf:tickets] #2466 AMF: NodeGroup Admin UNLOCK timeout during cluster start up

2017-05-23 Thread Praveen

Hi Gary,
Patch looks good. 
I have checked: ng_unlock() does check cb->init_state and does not assign SUs 
if the state is init_done.

Thanks,
Praveen



---

** [tickets:#2466] AMF: NodeGroup Admin UNLOCK timeout during cluster start up**

**Status:** unassigned
**Milestone:** 5.17.06
**Created:** Tue May 23, 2017 01:19 AM UTC by Minh Hon Chau
**Last Updated:** Tue May 23, 2017 06:35 AM UTC
**Owner:** nobody


When the cluster is coming up, if a nodegroup admin UNLOCK operation is issued 
(by SMF in this case), the operation can time out because the 
su_cnt_admin_oper of one of the PLs remains 1 forever.

Sequence in detail:
- The cluster has 4 nodes; start the cluster.
- When 3 nodes (SC-1, SC-2, PL-4) have joined the cluster, the admin unlock of the nodegroup is issued.
~~~
May 22 14:33:46.665539 osafamfd [11068:11068:../../opensaf/src/amf/amfd/ndfsm.cc:0526] NO Node 'SC-1' joined the cluster
May 22 14:33:48.115919 osafamfd [11068:11068:../../opensaf/src/amf/amfd/ndfsm.cc:0526] NO Node 'SC-2' joined the cluster
May 22 14:34:00.442633 osafamfd [11068:11068:../../opensaf/src/amf/amfd/ndfsm.cc:0526] NO Node 'PL-4' joined the cluster
~~~

  The NoRed OpenSAF SU of PL-4 gets assigned:

~~~
May 22 14:34:00.637324 osafamfd [11068:11068:../../opensaf/src/amf/amfd/sgproc.cc:1171] >> avd_su_si_assign_evh: id:30, node:2040f, act:2, 'safSu=19781416d5,safSg=NoRed,safApp=OpenSAF', 'safSi=NoRed3,safApp=OpenSAF', ha:1, err:1, single:0
~~~

  The admin unlock of the nodegroup is issued:

~~~
May 22 14:34:02.989761 osafamfd [11068:11068:../../opensaf/src/amf/amfd/nodegroup.cc:1100] >> ng_admin_op_cb: 'safAmfNodeGroup=smfLockAdmNg2,safAmfCluster=myAmfCluster', inv:'115964117001', op:'1'
~~~
 
- When the NoRed OpenSAF SU of PL-3 becomes ENABLED, its assignment starts:

~~~
May 22 14:34:10.096324 osafamfd [11068:11068:../../opensaf/src/amf/amfd/sgproc.cc:0725] >> avd_su_oper_state_evh: id:29, node:2030f, 'safSu=PL-3,safSg=NoRed,safApp=OpenSAF' state:1
May 22 14:34:10.097537 osafamfd [11068:11068:../../opensaf/src/amf/amfd/sg_nored_fsm.cc:0305] >> su_insvc: 'safSu=PL-3,safSg=NoRed,safApp=OpenSAF', 0
May 22 14:34:10.097549 osafamfd [11068:11068:../../opensaf/src/amf/amfd/sgproc.cc:0111] >> avd_new_assgn_susi: 'safSu=PL-3,safSg=NoRed,safApp=OpenSAF' 'safSi=a6b0d555f4,safApp=OpenSAF' state=1
May 22 14:34:10.097552 osafamfd [11068:11068:../../opensaf/src/amf/amfd/siass.cc:0440] >> avd_susi_create: safSu=PL-3,safSg=NoRed,safApp=OpenSAF safSi=a6b0d555f4,safApp=OpenSAF state=1
~~~

  The su_cnt_admin_oper is increased for the NoRed OpenSAF SU:

~~~
May 22 14:34:10.098839 osafamfd [11068:11068:../../opensaf/src/amf/amfd/util.cc:0978] << avd_snd_susi_msg
May 22 14:34:10.098841 osafamfd [11068:11068:../../opensaf/src/amf/amfd/sgproc.cc:0268] TR node:'safAmfNode=PL-3,safAmfCluster=myAmfCluster', su_cnt_admin_oper:1
~~~

- When the NoRed OpenSAF SU gets assigned:

~~~
May 22 14:34:10.105283 osafamfd [11068:11068:../../opensaf/src/amf/amfd/sgproc.cc:1171] >> avd_su_si_assign_evh: id:30, node:2030f, act:2, 'safSu=PL-3,safSg=NoRed,safApp=OpenSAF', 'safSi=a6b0d555f4,safApp=OpenSAF', ha:1, err:1, single:0
~~~

  but su_cnt_admin_oper is not decreased:

~~~
May 22 14:34:10.108143 osafamfd [11068:11068:../../opensaf/src/amf/amfd/sg_nored_fsm.cc:] << susi_success
May 22 14:34:10.108148 osafamfd [11068:11068:../../opensaf/src/amf/amfd/sgproc.cc:1579] TR Node_state: 2 adest: 2010f203defc2 node not ready for assignments
May 22 14:34:10.108153 osafamfd [11068:11068:../../opensaf/src/amf/amfd/sgproc.cc:1579] TR Node_state: 2 adest: 2020fc2b319b5 node not ready for assignments
May 22 14:34:10.108157 osafamfd [11068:11068:../../opensaf/src/amf/amfd/ndfsm.cc:0621] >> avd_nd_ncs_su_assigned
May 22 14:34:10.108162 osafamfd [11068:11068:../../opensaf/src/amf/amfd/node.cc:0461] >> avd_node_state_set: 'safAmfNode=PL-3,safAmfCluster=myAmfCluster' NCS_INIT => PRESENT
~~~

  At the end, su_cnt_admin_oper still remains 1.

  When an application SU gets assigned, the counter is always decreased:

~~~
May 22 14:34:10.444624 osafamfd [11068:11068:../../opensaf/src/amf/amfd/sg_2n_fsm.cc:2648] << susi_success: rc:1
May 22 14:34:10.444629 osafamfd [11068:11068:../../opensaf/src/amf/amfd/sgproc.cc:1681] TR node:'safAmfNode=PL-3,safAmfCluster=myAmfCluster', su_cnt_admin_oper:2
May 22 14:34:10.444632 osafamfd [11068:11068:../../opensaf/src/amf/amfd/sgproc.cc:0358] >> process_su_si_response_for_ng: 'safSu=PL-3,safSg=2N,safApp=ERIC-sv.SVScsvStreamer'
May 22 14:34:10.444640 osafamfd [11068:11068:../../opensaf/src/amf/amfd/sgproc.cc:0457] << process_su_si_response_for_ng
~~~

There is a check in avd_su_si_assign_evh() that seems to exclude the OpenSAF SU 
when the counter is decreased:
...
  /* else admin oper still not complete */
} else if ((su->sg_of_su->sg_ncs_spec == false) &&
   ((su->su_on_node->admin_ng != nullptr) ||
(su->sg_of_su->ng_us
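
The imbalance described in this ticket can be illustrated with a small model. This is a sketch under stated assumptions, not amfd code: `Node`, `Su`, `on_assignment_sent`, `on_assignment_response` and `admin_op_complete` are hypothetical names, and only `su_cnt_admin_oper` and `sg_ncs_spec` are taken from the ticket text.

```cpp
#include <cassert>

// Hypothetical, simplified model of the per-node admin operation counter.
struct Node {
  int su_cnt_admin_oper = 0;
};

struct Su {
  bool sg_ncs_spec;  // true for OpenSAF middleware (NCS) SUs
  Node* node;
};

// An assignment is sent while the nodegroup operation is in progress:
// in this model the counter is incremented for every SU.
void on_assignment_sent(const Su& su) { ++su.node->su_cnt_admin_oper; }

// Response handling as the ticket describes it: only non-middleware SUs
// decrement the counter, so a middleware SU leaves it unbalanced.
void on_assignment_response(const Su& su) {
  if (!su.sg_ncs_spec) {
    assert(su.node->su_cnt_admin_oper > 0);
    --su.node->su_cnt_admin_oper;
  }
}

// The nodegroup operation can only be answered once the counter is zero.
bool admin_op_complete(const Node& node) {
  return node.su_cnt_admin_oper == 0;
}
```

With one middleware SU and one application SU on the same node, the counter goes 1, 2, and then back to 1 only, which matches the values in the traces and explains why the admin operation never completes.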

[tickets] [opensaf:tickets] #2466 AMF: NodeGroup Admin UNLOCK timeout during cluster start up

2017-05-22 Thread Praveen

Hi Minh,

I will go through it today.

Thanks
Praveen


---

** [tickets:#2466] AMF: NodeGroup Admin UNLOCK timeout during cluster start up**

**Status:** unassigned
**Milestone:** 5.17.06
**Created:** Tue May 23, 2017 01:19 AM UTC by Minh Hon Chau
**Last Updated:** Tue May 23, 2017 05:13 AM UTC
**Owner:** nobody


When the cluster is coming up, if a nodegroup admin UNLOCK operation is issued 
(by SMF in this case), the operation can time out because the 
su_cnt_admin_oper of one of the PLs remains 1 forever.

There is a check in avd_su_si_assign_evh() that seems to exclude the OpenSAF SU 
when the counter is decreased:
...
  /* else admin oper still not complete */
} else if ((su->sg_of_su->sg_ncs_spec == false) &&
   ((su->su_on_node->admin_ng != nullptr) ||
(su->sg_of_su->ng_using_saAmfSGAdminState == true))) {
  AVD_AMF_NG *ng = su->su_on_node->admin_ng;
  // Got resp

[tickets] [opensaf:tickets] #2105 AMF : SG is unstable, if app responds during node link loss detection time period

2017-05-17 Thread Praveen

Attached are the traces from when AMFD drops the message from AMFND.


Attachments:

- 
[traces.zip](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/09f00ae8/1ae0/attachment/traces.zip)
 (195.0 kB; application/x-zip-compressed)


---

** [tickets:#2105] AMF : SG is unstable, if app responds during node link loss 
detection time period**

**Status:** review
**Milestone:** 5.17.06
**Created:** Sun Oct 09, 2016 07:12 AM UTC by Srikanth R
**Last Updated:** Mon May 15, 2017 07:38 AM UTC
**Owner:** Minh Hon Chau


Setup:
Changeset: 8190
5-node SLES setup with 2 controllers and 3 payloads (TIPC, headless enabled).
2N application deployed on 2 payloads.

Issue:

 -> Perform an admin operation on an AMF entity.
 -> Do not respond to the callback and invoke the headless scenario.
 -> On a VM with a TIPC setup, it takes 3 seconds to detect that the node is down.
 -> If the application responds to the callback of the admin operation during 
this period, when the last controller is already down, the message does not 
reach any controller. Amfnd on the payload sends the "Assigned" message but 
does not store it.
 
  In this scenario the SG moves to an unstable state. Below is a snippet from 
the syslog, where the application responded at 15:48:28 and the payloads 
detected at 15:48:31 that the last controller was down.
  
Oct  7 15:48:28 SYSTEST-PLD-1 osafamfnd[9976]: NO Assigned 'safSi=TestApp_SI1,safApp=TestApp_TwoN' ACTIVE to 'safSu=TestApp_SU1,safSg=TestApp_SG1,safApp=TestApp_TwoN'
Oct  7 15:48:31 SYSTEST-PLD-1 osafamfnd[9976]: WA AMF director unexpectedly crashed
Oct  7 15:48:31 SYSTEST-PLD-1 osafamfnd[9976]: NO Checking 'safSu=PL-3,safSg=NoRed,safApp=OpenSAF' for pending messages
Oct  7 15:48:31 SYSTEST-PLD-1 osafamfnd[9976]: NO Checking 'safSu=TestApp_SU1,safSg=TestApp_SG1,safApp=TestApp_TwoN' for pending messages
Oct  7 15:48:31 SYSTEST-PLD-1 osafimmnd[9957]: WA SC Absence IS allowed:900 IMMD service is DOWN
Oct  7 15:48:31 SYSTEST-PLD-1 osafimmnd[9957]: NO IMMD SERVICE IS DOWN, HYDRA IS CONFIGURED => UNREGISTERING IMMND form MDS


-> Below is the scenario in which the payload detected at 18:31:34 that there 
is no controller, so amfnd defers avnd_di_susi_resp_send() until the 
controllers rejoin the cluster. The application responded at 18:31:41.

Oct  7 18:31:34 SYSTEST-PLD-1 osafimmnd[12448]: WA SC Absence IS allowed:900 IMMD service is DOWN
Oct  7 18:31:34 SYSTEST-PLD-1 osafimmnd[12448]: NO IMMD SERVICE IS DOWN, HYDRA IS CONFIGURED => UNREGISTERING IMMND form MDS
Oct  7 18:31:41 SYSTEST-PLD-1 osafamfnd[12467]: NO Assigned 'safSi=TestApp_SI4,safApp=TestApp_TwoN' ACTIVE to 'safSu=TestApp_SU1,safSg=TestApp_SG1,safApp=TestApp_TwoN'
Oct  7 18:31:41 SYSTEST-PLD-1 osafamfnd[12467]: NO avnd_di_susi_resp_send() deferred as AMF director is offline
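
The difference between the two scenarios can be sketched as a sender that defers only once it knows the director is down. This is an illustration under assumptions, not amfnd code: `ResponseSender`, its methods and the `transport` callback are hypothetical names.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <string>
#include <utility>

// Hypothetical sketch of "defer while the director is known to be offline".
class ResponseSender {
 public:
  explicit ResponseSender(std::function<void(const std::string&)> transport)
      : transport_(std::move(transport)) {}

  // Hand the message to the transport if the director is believed to be up,
  // otherwise keep it for later.
  void send(const std::string& msg) {
    if (director_up_) {
      transport_(msg);
    } else {
      deferred_.push(msg);
    }
  }

  // Called once the loss of the director has been detected.
  void on_director_down() { director_up_ = false; }

  // Called when a director is reachable again: flush in arrival order.
  void on_director_up() {
    director_up_ = true;
    while (!deferred_.empty()) {
      transport_(deferred_.front());
      deferred_.pop();
    }
  }

  std::size_t pending() const { return deferred_.size(); }

 private:
  std::function<void(const std::string&)> transport_;
  std::queue<std::string> deferred_;
  bool director_up_ = true;
};
```

In this model a response sent before `on_director_down()` is called goes straight to the transport and is never queued, which corresponds to the window between the controller going down and the payload detecting it.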


---


[tickets] [opensaf:tickets] #2394 clm: add clm tool commands for admin op and state check.

2017-05-15 Thread Praveen

- **status**: review --> fixed
- **Blocker**:  --> False
- **Comment**:

git default:
commit a7bb655d2e8b50bf22b168f7492eab9970a98849
Author: Praveen <praveen.malv...@oracle.com>
Date:   Fri May 12 15:09:30 2017 +0530

clm: add tool commands clm-adm, clm-state, clm-find [#2394]


hg default:
changeset:   8798:cf45b604af4b
tag:         tip
user:        Praveen Malviya <praveen.malv...@oracle.com>
date:        Tue May 16 10:40:40 2017 +0530
summary:     clm: add tool commands clm-adm, clm-state, clm-find [#2394]




---

** [tickets:#2394] clm: add clm tool commands for admin op and state check.**

**Status:** fixed
**Milestone:** 5.17.08
**Created:** Thu Mar 23, 2017 06:17 AM UTC by Praveen
**Last Updated:** Fri May 12, 2017 09:56 AM UTC
**Owner:** Praveen


The intention is to add CLM tool commands:
- to perform an admin operation on a node or on the cluster, something like 
clm-adm <lock|shutdown|unlock|reset> 

- to check the admin state and membership status of CLM nodes, like
clm-state  <membership|adminstate>
- to find the CLM cluster and nodes, like
clm-find  <members|non-member>
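
As an illustration of the proposed `clm-adm` syntax, the operation word could be validated as below. This is a sketch only: `ClmAdminOp` and `parse_clm_admin_op` are hypothetical names, and the enumerators are not the SAF admin operation IDs.

```cpp
#include <string>

// Hypothetical enum for the operations listed in the ticket.
enum class ClmAdminOp { kLock, kShutdown, kUnlock, kReset };

// Map a command-line word to an operation. Returns false and leaves *out
// untouched when the word is not one of the four proposed operations.
bool parse_clm_admin_op(const std::string& word, ClmAdminOp* out) {
  if (word == "lock") {
    *out = ClmAdminOp::kLock;
  } else if (word == "shutdown") {
    *out = ClmAdminOp::kShutdown;
  } else if (word == "unlock") {
    *out = ClmAdminOp::kUnlock;
  } else if (word == "reset") {
    *out = ClmAdminOp::kReset;
  } else {
    return false;
  }
  return true;
}
```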


---


[tickets] [opensaf:tickets] #2443 amf: amf gets stuck after headless while processing node_up messages

2017-05-15 Thread Praveen

Hi Long,

If amfd and amfnd traces are available from both the controllers, please upload 
here.

Thanks,
Praveen


---

** [tickets:#2443] amf: amf gets stuck after headless while processing node_up 
messages**

**Status:** review
**Milestone:** 5.17.06
**Created:** Thu Apr 27, 2017 11:59 AM UTC by Long H Buu Nguyen
**Last Updated:** Fri Apr 28, 2017 04:13 AM UTC
**Owner:** Long H Buu Nguyen


Description:
After headless, the SCs come up. If the active SC is rebooted during that time, 
while the other SC is still initialising, amfd on the other SC can get stuck 
processing node_up messages. As a result, opensafd fails to start.

Observation:
Infinite node_up from syslog:
2017-04-18 14:17:36 SC-1 osafamfd[478]: NO Received node_up from 2040f: msg_id 1
2017-04-18 14:17:37 SC-1 osafamfd[478]: NO Received node_up from 2020f: msg_id 1
2017-04-18 14:17:37 SC-1 osafamfd[478]: NO Received node_up from 2030f: msg_id 1
...

Steps to reproduce:
1) Start a cluster.
2) Turn off SCs.
3) Turn on SCs.
4) After a SC becomes ACTIVE, while amfnd on the other SC is initialising NCS 
SU, restart the active SC.
5) Amfnd on the other SC receives NEW_ACTIVE and then gets stuck with node_up 
messages.

Investigation:
Assume that after headless, SC-1 becomes ACTIVE. Amfnd on SC-2 sends a node_up 
message to amfd-SC-1.
amfnd-SC-2 instantiates the NCS SUs on SC-2 as soon as amfd-SC-1 receives the 
node_up message.
If SC-1 is rebooted once the NCS SUs on SC-2 are INSTANTIATED, amfnd-SC-2 
receives NEW_ACTIVE because RDE sets amfd-SC-2 to ACTIVE.
amfnd-SC-2 sends a node_up message to amfd-SC-2. Later, amfnd-SC-2 continues to 
instantiate the NCS SUs on SC-2, but they are already INSTANTIATED.
amfnd-SC-2 does not send an oper_state message to amfd-SC-2 because the NCS SU 
presence states do not change:
Apr 18 14:35:36.869223 osafamfnd [486:486:src/amf/amfnd/susm.cc:1563] >> avnd_su_pres_fsm_run: 'safSu=SC-1,safSg=2N,safApp=OpenSAF'
Apr 18 14:35:36.869240 osafamfnd [486:486:src/amf/amfnd/susm.cc:1570] T1 Entering SU presence state FSM: current state: 3, event: 1, su name:safSu=SC-1,safSg=2N,safApp=OpenSAF
Apr 18 14:35:36.869257 osafamfnd [486:486:src/amf/amfnd/susm.cc:1581] T1 Exited SU presence state FSM: New State = 3
Apr 18 14:35:36.869273 osafamfnd [486:486:src/amf/amfnd/susm.cc:1614] << avnd_su_pres_fsm_run: 1
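
The trace (current state 3, new state 3) can be reduced to the following sketch. It is not the amfnd state machine: `instantiate_su` and the callback are hypothetical, the intermediate INSTANTIATING step is skipped, and the numeric values are assumed to follow SaAmfPresenceStateT.

```cpp
#include <functional>

// Presence states; values assumed to match SaAmfPresenceStateT.
enum PresenceState { UNINSTANTIATED = 1, INSTANTIATING = 2, INSTANTIATED = 3 };

// Hypothetical reduction of the behaviour described above: the operational
// state is reported only when the instantiate event causes a transition.
// Returns true if a message was sent.
bool instantiate_su(PresenceState& state,
                    const std::function<void()>& send_oper_state) {
  const PresenceState before = state;
  state = INSTANTIATED;  // a no-op when the SU is already INSTANTIATED
  if (state != before) {
    send_oper_state();
    return true;
  }
  return false;  // no transition, so the new active amfd hears nothing
}
```

The second call for an already instantiated SU sends nothing, which is the situation in which amfd-SC-2 keeps waiting while node_up messages repeat.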


---


[tickets] [opensaf:tickets] #2394 clm: add clm tool commands for admin op and state check.

2017-05-12 Thread Praveen

Published V2 after incorporating comments.


---

** [tickets:#2394] clm: add clm tool commands for admin op and state check.**

**Status:** review
**Milestone:** 5.17.08
**Created:** Thu Mar 23, 2017 06:17 AM UTC by Praveen
**Last Updated:** Mon Apr 10, 2017 01:40 PM UTC
**Owner:** Praveen


The intention is to add CLM tool commands:
- to perform an admin operation on a node or on the cluster, something like 
clm-adm <lock|shutdown|unlock|reset> 

- to check the admin state and membership status of CLM nodes, like
clm-state  <membership|adminstate>
- to find the CLM cluster and nodes, like
clm-find  <members|non-member>


---


[tickets] [opensaf:tickets] #2354 amf: support amf tool command to know AMF cluster/nodes status.

2017-05-11 Thread Praveen

- **status**: review --> fixed
- **Blocker**:  --> True
- **Comment**:

5.17.08:
commit aaf6c29d6e9dc59f44f37d720c043dd6c8dad4a4
Author: Praveen <praveen.malv...@oracle.com>
Date:   Thu May 11 13:59:17 2017 +0530

amf: support amf tool command to know AMF cluster/nodes status [#2354]

default (hg):
changeset:   8792:db2ba23d2963
tag:         tip
user:        Praveen Malviya <praveen.malv...@oracle.com>
date:        Thu May 11 14:30:23 2017 +0530
summary:     amf: support amf tool command to know AMF cluster/nodes status [#2354]






---

** [tickets:#2354] amf: support amf tool command to know AMF cluster/nodes 
status.**

**Status:** fixed
**Milestone:** 5.17.08
**Created:** Wed Mar 08, 2017 07:28 AM UTC by Praveen
**Last Updated:** Fri Apr 14, 2017 11:57 AM UTC
**Owner:** Praveen


This discussion ticket is being raised based on a user list query dated March 
1st, 2017.
The query says:
 "We have enabled the new feature "SC Absence" of OpenSAF 5.x in our product, 
it works good so far.
 
 Now we need to make some actions when PLD go in/out "SC Absence" mode, we have 
to find a way in PLD to detect if it is being in "SC Absent" mode or not.
 So, does anyone knows how to make it by a utility/tool and C code(i.e. OpenSAF 
API) as well?
 "
 I think we do not have any API that can be used to query OpenSAF for the SC 
absence state.
MDS up and down events of the directors can be used to determine the SC absence 
state, as some agents and node directors already do, but this would add a lot 
of code to the application.

Please update this ticket for a known or proposed solution. 
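
To make the MDS-based idea concrete, here is a minimal sketch of the bookkeeping an application would need. It does not use the MDS API: `ScAbsenceTracker` and its methods are hypothetical, and "SC absent" is taken here to mean that no director is currently reachable.

```cpp
#include <set>
#include <string>

// Hypothetical tracker fed by up/down notifications for the directors.
class ScAbsenceTracker {
 public:
  void on_director_up(const std::string& director) { up_.insert(director); }
  void on_director_down(const std::string& director) { up_.erase(director); }

  // True while no director is known to be up.
  bool sc_absent() const { return up_.empty(); }

 private:
  std::set<std::string> up_;
};
```

The tracker itself is small; the code that the comment above worries about is the subscription and event plumbing needed to feed it, which is not shown.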



---


[tickets] [opensaf:tickets] #2421 amfd: is_swbdl_delete_ok_for_node should also check for SG and Node admin state

2017-04-24 Thread Praveen

- **status**: review --> fixed
- **Blocker**:  --> True
- **Comment**:

develop:
commit 20970cf5e21496bfef532f62cdb860388660ef62
Author: Praveen <praveen.malv...@oracle.com>
Date:   Mon Apr 24 11:24:18 2017 +0530

amfd: allow nodeswbundle deletion if anyone of Node, SU or SG is locked_in 
[#2421]

5.17.06:
commit 6c613964e1ce56ef5798b3d426d40aae1c5068ef
Author: Praveen <praveen.malv...@oracle.com>
Date:   Mon Apr 24 11:24:18 2017 +0530

amfd: allow nodeswbundle deletion if anyone of Node, SU or SG is locked_in 
[#2421]




---

** [tickets:#2421] amfd: is_swbdl_delete_ok_for_node should also check for SG 
and Node admin state**

**Status:** fixed
**Milestone:** 5.17.06
**Created:** Tue Apr 11, 2017 09:33 AM UTC by Tai Dinh
**Last Updated:** Fri Apr 21, 2017 08:50 AM UTC
**Owner:** Praveen


When deleting a NodeSwBundle object, AMF only checks whether the SU's admin 
state is LOCKED_INSTANTIATION. This means that the deletion is rejected even 
when the SG or the Node is in LOCKED_INSTANTIATION state, which implicitly 
means that the SU is UNINSTANTIATED.
This currently blocks the rollback of an SMF campaign in some situations.

The admin state of the SU's node and of the SU's SG should also be checked, and 
the deletion should be allowed if any of these states is LOCKED_INSTANTIATION.

/Tai
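
The requested rule can be written as a single predicate. This is a sketch of the proposal, not the body of is_swbdl_delete_ok_for_node: `AdminState` and `swbundle_delete_ok` are hypothetical names.

```cpp
// Admin states relevant to the check; the names are illustrative.
enum class AdminState { kUnlocked, kLocked, kLockedInstantiation, kShuttingDown };

// Deletion of the NodeSwBundle is acceptable when the SU, its SG, or its
// hosting node is in LOCKED_INSTANTIATION, since any of the three implies
// that the SU is uninstantiated.
bool swbundle_delete_ok(AdminState su, AdminState sg, AdminState node) {
  return su == AdminState::kLockedInstantiation ||
         sg == AdminState::kLockedInstantiation ||
         node == AdminState::kLockedInstantiation;
}
```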


---


[tickets] [opensaf:tickets] #2381 clmd: clm admin operation returns BAD_OP after rebooting node

2017-04-23 Thread Praveen

- **Blocker**:  --> False
- **Milestone**: 5.17.08 --> 5.17.06
- **Comment**:

Pushed to the release branch with revision:
commit a79f101873dffd145aa70d9cb4eb3c99b8ffd4ca
Author: Praveen <praveen.malv...@oracle.com>
Date:   Fri Apr 21 14:31:19 2017 +0530

clms: return TIME_OUT for unlock op if CLMS update to CLM agent fails 
[#2381]





---

** [tickets:#2381] clmd: clm admin operation returns BAD_OP after rebooting 
node**

**Status:** fixed
**Milestone:** 5.17.06
**Created:** Thu Mar 16, 2017 07:30 AM UTC by Ritu Raj
**Last Updated:** Fri Apr 21, 2017 11:21 AM UTC
**Owner:** Praveen
**Attachments:**

- 
[active_clmd.tgz](https://sourceforge.net/p/opensaf/tickets/2381/attachment/active_clmd.tgz)
 (1.3 MB; application/x-compressed-tar)
- 
[messages](https://sourceforge.net/p/opensaf/tickets/2381/attachment/messages) 
(1.9 MB; application/octet-stream)


###Environment details
OS : Suse 64bit
Changeset : 8701 ( 5.2.RC1)
4-node setup (2 controllers and 2 payloads)

###Summary
clm admin operation returns SA_AIS_ERR_BAD_OPERATION after rebooting node 

###Steps followed & Observed behaviour
1. Initially performed a clm_lock operation on the payload (PL-3) and immediately 
restarted the same payload (PL-3):
> init 6; exit
2. Later, performed a clm_unlock operation on PL-3. The unlock operation was 
reported as timed out, but the node still joined the cluster:

> SLES-SLOT1:~ # amf-adm unlock safNode=PL-3,safCluster=myClmCluster 
> Mar 15 14:35:20 SLES-SLOT1 osafclmd[2763]: ER clms_imm_node_unlock failed
> Mar 15 14:35:20 SLES-SLOT1 osafamfd[2773]: NO Received node_up from 2030f: 
> msg_id 1
> Mar 15 14:35:20 SLES-SLOT1 osafamfd[2773]: NO Node 'PL-3' joined the cluster
> Mar 15 14:35:20 SLES-SLOT1 osafimmnd[2733]: NO Implementer connected: 197 
> (MsgQueueService131855) <0, 2030f>
> error - command timed out (alarm)

3. After that, any clm_lock or clm_unlock operation returns 
'SA_AIS_ERR_BAD_OPERATION':

> SLES-SLOT1:~ # amf-adm lock safNode=PL-3,safCluster=myClmCluster
> error - saImmOmAdminOperationInvoke_2 admin-op RETURNED: 
> SA_AIS_ERR_BAD_OPERATION (20)
> 
> SLES-SLOT1:~ # amf-adm unlock safNode=PL-3,safCluster=myClmCluster
> error - saImmOmAdminOperationInvoke_2 admin-op RETURNED: 
> SA_AIS_ERR_BAD_OPERATION (20)
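
The syslog excerpt in step 2 shows what appears to be the same node ID in two notations: `2030f` in the `node_up` message and `131855` as the suffix of the implementer name. The short sketch below only checks that arithmetic. The helper names are hypothetical, and reading the implementer suffix as a node ID is an assumption based on the matching values.

```python
def node_id_from_hex(text: str) -> int:
    """Parse a node ID printed in hexadecimal, such as '2030f'."""
    return int(text, 16)


def node_id_to_hex(node_id: int) -> str:
    """Format a decimal node ID as lowercase hexadecimal."""
    return format(node_id, "x")


# Values taken from the syslog excerpt above.
hex_id = "2030f"                        # "Received node_up from 2030f"
implementer = "MsgQueueService131855"   # IMM implementer name

decimal_id = node_id_from_hex(hex_id)
print(decimal_id)                              # 131855
print(implementer.endswith(str(decimal_id)))   # True
print(node_id_to_hex(131855))                  # 2030f
```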


Traces:
From the traces:
Node PL-3 joined the cluster 
~~~
Mar 15 14:35:20.373997 osafclmd [2763:src/clm/clmd/clms_imm.c:0939] >> 
clms_imm_admin_op_callback: Admin callback for 
nodename:safNode=PL-3,safCluster=myClmCluster, opId:1
Mar 15 14:35:20.374002 osafclmd [2763:src/clm/clmd/clms_util.c:0038] >> 
clms_node_get_by_name: name input safNode=PL-3,safCluster=myClmCluster length 36
Mar 15 14:35:20.374006 osafclmd [2763:src/clm/clmd/clms_util.c:0046] TR 
nodename after patricia tree get safNode=PL-3,safCluster=myClmCluster
Mar 15 14:35:20.374009 osafclmd [2763:src/clm/clmd/clms_util.c:0049] << 
clms_node_get_by_name
Mar 15 14:35:20.374012 osafclmd [2763:src/clm/clmd/clms_imm.c:2223] >> 
clms_imm_node_unlock: Node name safNode=PL-3,safCluster=myClmCluster to unlock
Mar 15 14:35:20.374015 osafclmd [2763:src/clm/clmd/clms_imm.c:0579] >> 
clms_admin_state_update_rattr: Admin state 1 update for node 
safNode=PL-3,safCluster=myClmCluster
Mar 15 14:35:20.374018 osafclmd [2763:src/clm/clmd/clms_util.c:0038] >> 
clms_node_get_by_name: name input safNode=PL-3,safCluster=myClmCluster length 36
Mar 15 14:35:20.374021 osafclmd [2763:src/clm/clmd/clms_util.c:0046] TR 
nodename after patricia tree get safNode=PL-3,safCluster=myClmCluster
~~~
..
..
*but Sending track callback failed for SA_CLM_CHANGE_COMPLETED*
~~~
Mar 15 14:35:20.380860 osafclmd [2763:src/clm/clmd/clms_imm.c:1439] TR callback 
msg send to clma  failed
Mar 15 14:35:20.380869 osafclmd [2763:src/clm/clmd/clms_imm.c:1447] << 
clms_prep_and_send_track
Mar 15 14:35:20.380872 osafclmd [2763:src/clm/clmd/clms_imm.c:1220] TR Sending 
track callback failed for SA_CLM_CHANGE_COMPLETED
Mar 15 14:35:20.380875 osafclmd [2763:src/clm/clmd/clms_imm.c:1380] >> 
clms_prep_and_send_track
~~~
--

A later admin operation then failed with 'Another Admin operation 
already in progress':
~~~
Mar 15 14:51:21.878688 osafclmd [2763:src/clm/clmd/clms_imm.c:0939] >> 
clms_imm_admin_op_callback: Admin callback for 
nodename:safNode=PL-3,safCluster=myClmCluster, opId:2
Mar 15 14:51:21.878700 osafclmd [2763:src/clm/clmd/clms_util.c:0038] >> 
clms_node_get_by_name: name input safNode=PL-3,safCluster=myClmCluster length 36
Mar 15 14:51:21.878712 osafclmd [2763:src/clm/clmd/clms_util.c:0046] TR 
nodename after patricia tree get safNode=PL-3,safCluster=myClmCluster
Mar 15 14:51:21.878720 osafclmd [2763:src/clm/clmd/clms_util.c:0049] << 
clms_node_get_by_name
Mar 15 14:51:21.878726 osafclmd [2763:src/clm/clmd/clms_imm.c:0982] TR Another 
Admin operation already in progress: 4
~~~


Notes:
1. Syslog of Active controller attached
2. os

[tickets] [opensaf:tickets] #2381 clmd: clm admin operation returns BAD_OP after rebooting node

2017-04-21 Thread Praveen

- **status**: review --> fixed
- **Comment**:

commit 66970f59421f9d4338ee6d13134afca9082c1e91
Author: Praveen <praveen.malv...@oracle.com>
Date:   Fri Apr 21 14:31:19 2017 +0530
clms: return TIME_OUT for unlock op if CLMS update to CLM agent fails 
[#2381]


changeset:   8775:10bbd3156a40
tag:         tip
user:        Praveen Malviya <praveen.malv...@oracle.com>
date:        Fri Apr 21 14:45:15 2017 +0530
summary:     clms: return TIME_OUT for unlock op if CLMS update to CLM agent 
fails [#2381]




---

** [tickets:#2381] clmd: clm admin operation returns BAD_OP after rebooting 
node**

**Status:** fixed
**Milestone:** 5.17.08
**Created:** Thu Mar 16, 2017 07:30 AM UTC by Ritu Raj
**Last Updated:** Mon Apr 10, 2017 01:40 PM UTC
**Owner:** Praveen
**Attachments:**

- 
[active_clmd.tgz](https://sourceforge.net/p/opensaf/tickets/2381/attachment/active_clmd.tgz)
 (1.3 MB; application/x-compressed-tar)
- 
[messages](https://sourceforge.net/p/opensaf/tickets/2381/attachment/messages) 
(1.9 MB; application/octet-stream)



[tickets] [opensaf:tickets] #2421 amfd: is_swbdl_delete_ok_for_node should also check for SG and Node admin state

2017-04-20 Thread Praveen

- **status**: assigned --> accepted



---

** [tickets:#2421] amfd: is_swbdl_delete_ok_for_node should also check for SG 
and Node admin state**

**Status:** accepted
**Milestone:** 5.17.06
**Created:** Tue Apr 11, 2017 09:33 AM UTC by Tai Dinh
**Last Updated:** Thu Apr 20, 2017 04:07 AM UTC
**Owner:** Praveen





[tickets] [opensaf:tickets] #2331 CLM: : saClmNodeCurrAddress and saClmNodeCurrAddressFamily are not exposed to IMM even that TPC mode is using

2017-04-19 Thread Praveen

- **status**: unassigned --> accepted
- **assigned_to**: Praveen
- **Milestone**: future --> 5.17.08



---

** [tickets:#2331] CLM: : saClmNodeCurrAddress and saClmNodeCurrAddressFamily 
are not exposed to IMM even that TPC mode is using**

**Status:** accepted
**Milestone:** 5.17.08
**Created:** Thu Mar 02, 2017 10:10 AM UTC by Tai Dinh
**Last Updated:** Thu Mar 23, 2017 07:11 AM UTC
**Owner:** Praveen


The saClmNodeCurrAddress and saClmNodeCurrAddressFamily attributes of a cluster 
node are not exposed to IMM even when TCP mode is configured.
This kind of information is sometimes needed by applications.



[tickets] [opensaf:tickets] #2421 amfd: is_swbdl_delete_ok_for_node should also check for SG and Node admin state

2017-04-19 Thread Praveen

- **status**: unassigned --> assigned
- **assigned_to**: Praveen
- **Milestone**: 5.17.08 --> 5.17.06



---

** [tickets:#2421] amfd: is_swbdl_delete_ok_for_node should also check for SG 
and Node admin state**

**Status:** assigned
**Milestone:** 5.17.06
**Created:** Tue Apr 11, 2017 09:33 AM UTC by Tai Dinh
**Last Updated:** Tue Apr 11, 2017 09:33 AM UTC
**Owner:** Praveen





[tickets] [opensaf:tickets] #2354 amf: support amf tool command to know AMF cluster/nodes status.

2017-04-14 Thread Praveen

- **status**: accepted --> review
- **Milestone**: future --> 5.17.08



---

** [tickets:#2354] amf: support amf tool command to know AMF cluster/nodes 
status.**

**Status:** review
**Milestone:** 5.17.08
**Created:** Wed Mar 08, 2017 07:28 AM UTC by Praveen
**Last Updated:** Tue Apr 11, 2017 06:20 AM UTC
**Owner:** Praveen


This discussion ticket is being raised based on a user list query dated March 
1st, 2017.
The query says:
 "We have enabled the new feature "SC Absence" of OpenSAF 5.x in our product, 
it works good so far.
 
 Now we need to make some actions when PLD go in/out "SC Absence" mode, we have 
to find a way in PLD to detect if it is being in "SC Absent" mode or not.
 So, does anyone knows how to make it by a utility/tool and C code(i.e. OpenSAF 
API) as well?
 "
I think we do not have any API that can be used to query OpenSAF for the SC 
absence state.
MDS up and down events of the directors can be used to determine the SC absence 
state, as some agents and node directors already do. But this will add a lot of 
code to the application.

Please update this ticket with a known or proposed solution.
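
The idea of deriving the SC absence state from director up and down events can be sketched as below. This is only an illustration of the bookkeeping an application would have to carry. `ScPresenceTracker` and its methods are hypothetical names, the node IDs are made up, and how the up/down events would actually be delivered (for example by MDS) is outside the sketch.

```python
class ScPresenceTracker:
    """Tracks which directors are currently up (hypothetical helper)."""

    def __init__(self) -> None:
        self._directors_up = set()

    def on_director_up(self, node_id: int) -> None:
        self._directors_up.add(node_id)

    def on_director_down(self, node_id: int) -> None:
        # discard() tolerates a down event for a director never seen up.
        self._directors_up.discard(node_id)

    @property
    def sc_absent(self) -> bool:
        # No director reachable: treat the payload as being in SC Absence mode.
        return not self._directors_up


tracker = ScPresenceTracker()
tracker.on_director_up(1)      # illustrative node IDs
tracker.on_director_up(2)
tracker.on_director_down(1)
print(tracker.sc_absent)       # False
tracker.on_director_down(2)
print(tracker.sc_absent)       # True
```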




[tickets] [opensaf:tickets] #2354 amf: support amf tool command to know AMF cluster/nodes status.

2017-04-11 Thread Praveen

- **summary**: osaf: How to detect if payload is being in "SC Absence" mode. 
--> amf: support amf tool command to know AMF cluster/nodes status.
- **status**: unassigned --> accepted
- **assigned_to**: Praveen
- **Type**: discussion --> enhancement
- **Component**: osaf --> amf
- **Part**: - --> tools
- **Priority**: minor --> major
- **Comment**:

I think a tool command and a callback are both needed.

With a tool command, a user can check the status of the nodes in the cluster at 
any time. Since both CLM and AMF have the notion of nodes and cluster, a user 
may want to know the status of the CLM or AMF cluster. But this status is more 
than a mere listing of nodes, which the currently supported utilities already 
provide. The command should also consider the OpenSAF status.
For example:
During SC absence, amf-state siass also lists the SISUs for the controllers, so 
a user cannot tell from it whether the controllers are up.

As for the callback, a user cannot run a tool command continuously to check 
whether the controllers exist. Calling some SAF API on the payload from an 
application and inferring from its return status whether the host payload is in 
SC Absence mode is also not a proper solution, as the return code of an API can 
have multiple interpretations. So there should also be a callback to inform the 
application that this host payload has entered SC Absence mode or has returned 
to SC Presence mode. The application will subscribe to this callback.

I will send out a patch for the AMF cluster status and will look into the 
possibility of a callback in either CLM or AMF.
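
The callback half of the proposal could take a shape like the sketch below: the application subscribes once and is told only about transitions between the two modes. This is a hypothetical illustration, not an existing CLM or AMF API. `ScMode`, `ScModeNotifier`, `subscribe` and `update` are all invented names.

```python
from enum import Enum
from typing import Callable, List


class ScMode(Enum):
    PRESENT = "present"
    ABSENT = "absent"


class ScModeNotifier:
    """Invokes subscribers when the SC mode changes (hypothetical)."""

    def __init__(self, initial: ScMode = ScMode.PRESENT) -> None:
        self._mode = initial
        self._subscribers: List[Callable[[ScMode], None]] = []

    def subscribe(self, callback: Callable[[ScMode], None]) -> None:
        self._subscribers.append(callback)

    def update(self, new_mode: ScMode) -> None:
        # Notify only on a real transition, not on repeated reports.
        if new_mode is self._mode:
            return
        self._mode = new_mode
        for callback in list(self._subscribers):
            callback(new_mode)


notifier = ScModeNotifier()
notifier.subscribe(lambda mode: print("SC mode is now", mode.value))
notifier.update(ScMode.ABSENT)    # prints "SC mode is now absent"
notifier.update(ScMode.ABSENT)    # no output, mode unchanged
notifier.update(ScMode.PRESENT)   # prints "SC mode is now present"
```

Reporting transitions rather than return codes avoids the ambiguity the comment describes, since the application never has to interpret an API error to guess the mode.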



---

** [tickets:#2354] amf: support amf tool command to know AMF cluster/nodes 
status.**

**Status:** accepted
**Milestone:** future
**Created:** Wed Mar 08, 2017 07:28 AM UTC by Praveen
**Last Updated:** Wed Mar 08, 2017 07:28 AM UTC
**Owner:** Praveen






[tickets] [opensaf:tickets] #2417 amf: support for si-swap in N+M model when Standbys are in different SUs.

2017-04-10 Thread Praveen




---

** [tickets:#2417] amf: support for si-swap in N+M model when Standbys are in 
different SUs.**

**Status:** accepted
**Milestone:** next
**Created:** Mon Apr 10, 2017 06:17 AM UTC by Praveen
**Last Updated:** Mon Apr 10, 2017 06:17 AM UTC
**Owner:** Praveen


This is a continuation of ticket #2259.
This new ticket will consider the more general case:
"When SIs in designated SUs have their standbys distributed in different SUs."

Will update with an example configuration.




[tickets] [opensaf:tickets] #2381 clmd: clm admin operation returns BAD_OP after rebooting node

2017-04-03 Thread Praveen

- **status**: accepted --> review



---

** [tickets:#2381] clmd: clm admin operation returns BAD_OP after rebooting 
node**

**Status:** review
**Milestone:** next
**Created:** Thu Mar 16, 2017 07:30 AM UTC by Ritu Raj
**Last Updated:** Mon Apr 03, 2017 06:05 AM UTC
**Owner:** Praveen
**Attachments:**

- 
[active_clmd.tgz](https://sourceforge.net/p/opensaf/tickets/2381/attachment/active_clmd.tgz)
 (1.3 MB; application/x-compressed-tar)
- 
[messages](https://sourceforge.net/p/opensaf/tickets/2381/attachment/messages) 
(1.9 MB; application/octet-stream)



Notes:
1. Syslog of Active controller attached
2. osafclmd of Active controller attached



[tickets] [opensaf:tickets] #2392 amf: PR doc updates for 5.2 release.

2017-04-02 Thread Praveen

- **status**: review --> fixed
- **Comment**:

changeset:   213:388a98e1ce37
tag:         tip
user:        Praveen Malviya <praveen.malv...@oracle.com>
date:        Mon Apr 03 10:24:13 2017 +0530
summary:     amf: AMF PR doc updates for #1190, #2259, #2144, #2065 and #2233 
[#2392].





---

** [tickets:#2392] amf: PR doc updates for 5.2 release.**

**Status:** fixed
**Milestone:** 5.2.RC2
**Created:** Thu Mar 23, 2017 05:36 AM UTC by Praveen
**Last Updated:** Tue Mar 28, 2017 09:53 AM UTC
**Owner:** Praveen


Updates to be done for:
- Enhancements: #1190, #2259, #2144, #2252
- Defect: #2233



[tickets] [opensaf:tickets] #2381 clmd: clm admin operation returns BAD_OP after rebooting node

2017-03-31 Thread Praveen

- **status**: assigned --> accepted
- **Milestone**: 5.0.2 --> next
- **Comment**:

Analysis:
There seems to be a problem with at least one CLM client that resides on PL-3. 
CLM is unable to send any message to this client.
1) CLMD creates this client as standby:
Mar 15 13:47:47.080883 osafclmd [2763:src/clm/clmd/clms_evt.c:0140] TR 
client_id: 63 lookup failed
Mar 15 13:47:47.080886 osafclmd [2763:src/clm/clmd/clms_evt.c:0250] >> 
clms_client_new: MDS dest 2030feebec01a
Mar 15 13:47:47.080888 osafclmd [2763:src/clm/clmd/clms_evt.c:0277] << 
clms_client_new: client_id 63

2) When the user performs an admin operation, CLMD tries to send the track 
callback for the completed step to this client, but MDS returns failure:
Mar 15 14:32:10.759655 osafclmd [2763:src/clm/clmd/clms_util.c:1095] TR 
Client ID 63 ,track_flags=3
Mar 15 14:32:10.759658 osafclmd [2763:src/clm/clmd/clms_imm.c:1380] >> 
clms_prep_and_send_track
Mar 15 14:32:10.759661 osafclmd [2763:src/clm/clmd/clms_util.c:0352] >> 
clms_nodedb_lookup
Mar 15 14:32:10.759664 osafclmd [2763:src/clm/clmd/clms_util.c:0354] TR 
patricia tree size 4
Mar 15 14:32:10.759667 osafclmd [2763:src/clm/clmd/clms_util.c:0149] TR Node 
found 131343
Mar 15 14:32:10.759670 osafclmd [2763:src/clm/clmd/clms_util.c:0149] TR Node 
found 131599
Mar 15 14:32:10.759673 osafclmd [2763:src/clm/clmd/clms_util.c:0149] TR Node 
found 131855
Mar 15 14:32:10.759676 osafclmd [2763:src/clm/clmd/clms_util.c:0149] TR Node 
found 132111
Mar 15 14:32:10.759687 osafclmd [2763:src/clm/clmd/clms_util.c:0375] TR 
num_nd_changes 4
Mar 15 14:32:10.759689 osafclmd [2763:src/clm/clmd/clms_util.c:0376] << 
clms_nodedb_lookup
Mar 15 14:32:10.759693 osafclmd [2763:src/clm/clmd/clms_mds.c:1494] >> 
clms_mds_msg_send
Mar 15 14:32:10.759728 osafclmd [2763:src/clm/clmd/clms_mds.c:1525] IN mds send 
returned: 2
Mar 15 14:32:10.759732 osafclmd [2763:src/clm/clmd/clms_mds.c:1527] << 
clms_mds_msg_send
Mar 15 14:32:10.759735 osafclmd [2763:src/clm/clmd/clms_imm.c:1439] TR callback 
msg send to clma  failed

3) Before the admin operation on PL-3, this node was restarted. There is no 
evidence in the clmd traces of this client going down.
4) When the unlock operation was performed, CLMD again could not send the 
membership status to this client and did not reply to IMM. The admin operation 
parameters were also not reset. Because they are not reset, no further admin 
operations are allowed, and the pending operation times out.
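
The failure mode described in the last point of the analysis can be illustrated with a small model. This is only a sketch, assuming a single in-progress marker per node. The class and method names are hypothetical and are not taken from clms_imm.c. The `reset_on_failure=True` path follows the wording of the fix summary for this ticket ("return TIME_OUT for unlock op if CLMS update to CLM agent fails"), not the actual patch.

```python
from typing import Callable, Optional


class NodeAdminOp:
    """Models the per-node 'admin operation in progress' marker."""

    def __init__(self) -> None:
        self.op_in_progress: Optional[int] = None

    def handle(self, op_id: int, send_track: Callable[[], bool],
               reset_on_failure: bool) -> Optional[str]:
        """Return the reply sent to IMM, or None if no reply is sent."""
        if self.op_in_progress is not None:
            # "Another Admin operation already in progress"
            return "SA_AIS_ERR_BAD_OPERATION"
        self.op_in_progress = op_id
        if not send_track():
            if reset_on_failure:
                # Assumed fixed behaviour: clear the marker and reply.
                self.op_in_progress = None
                return "SA_AIS_ERR_TIMEOUT"
            # Reported behaviour: marker stays set and IMM gets no reply.
            return None
        self.op_in_progress = None
        return "SA_AIS_OK"


node = NodeAdminOp()
print(node.handle(1, lambda: False, reset_on_failure=False))  # None
print(node.handle(2, lambda: True, reset_on_failure=False))
# SA_AIS_ERR_BAD_OPERATION
```

With `reset_on_failure=True`, the first call returns `SA_AIS_ERR_TIMEOUT` and a later operation can proceed normally.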







---

** [tickets:#2381] clmd: clm admin operation returns BAD_OP after rebooting 
node**

**Status:** accepted
**Milestone:** next
**Created:** Thu Mar 16, 2017 07:30 AM UTC by Ritu Raj
**Last Updated:** Thu Mar 16, 2017 09:01 AM UTC
**Owner:** Praveen
**Attachments:**

- 
[active_clmd.tgz](https://sourceforge.net/p/opensaf/tickets/2381/attachment/active_clmd.tgz)
 (1.3 MB; application/x-compressed-tar)
- 
[messages](https://sourceforge.net/p/opensaf/tickets/2381/attachment/messages) 
(1.9 MB; application/octet-stream)



[tickets] [opensaf:tickets] #2403 amf: cores were generated for amfd and amfnd on different controllers.

2017-03-31 Thread Praveen


The log file is not NFS mounted. But these cores were generated on different nodes: 
for amfd on the active controller and for amfnd on the standby controller. I had 
run some tests with AMFD and AMFND traces enabled on a four-node system. When 
this happened, the trace file was huge (gigabytes in size).
I will probably rerun these tests next week. Since the issue is not 
reproducible, we can hold this ticket until next week.


---

** [tickets:#2403] amf: cores were generated for amfd and amfnd on different 
controllers.**

**Status:** unassigned
**Milestone:** 5.2.RC2
**Created:** Thu Mar 30, 2017 08:39 AM UTC by Praveen
**Last Updated:** Fri Mar 31, 2017 06:09 AM UTC
**Owner:** nobody
**Attachments:**

- 
[amfd_bt_threads.log](https://sourceforge.net/p/opensaf/tickets/2403/attachment/amfd_bt_threads.log)
 (3.0 kB; application/octet-stream)
- 
[amfd_bt_threads_full.log](https://sourceforge.net/p/opensaf/tickets/2403/attachment/amfd_bt_threads_full.log)
 (22.1 kB; application/octet-stream)
- 
[amfnd_bt_threads.log](https://sourceforge.net/p/opensaf/tickets/2403/attachment/amfnd_bt_threads.log)
 (2.9 kB; application/octet-stream)
- 
[amfnd_bt_threads_full.log](https://sourceforge.net/p/opensaf/tickets/2403/attachment/amfnd_bt_threads_full.log)
 (23.0 kB; application/octet-stream)


Observed AMFD and AMFND crashes when calling TRACE() API. 

amfd:
\#0  0x7f44834db70d in write () from /lib64/libpthread.so.0
\#1  0x7f4483eb9af9 in output_ () from /usr/local/lib/libopensaf_core.so.0
\#2  0x7f4484dbc714 in Trace::trace(char const*, char const*, unsigned int, 
unsigned int, char const*, ...) ()
at ./src/base/logtrace.h:166
\#3  0x7f4484e7b3cb in avd_stop_tmr(cl_cb_tag*, avd_tmr_tag*) () at 
src/amf/amfd/timer.cc:113
\#4  0x7f4484e03556 in avd_tmr_snd_hb_evh(cl_cb_tag*, AVD_EVT*) () at 
src/amf/amfd/ndfsm.cc:1066
\#5  0x7f4484e005b4 in process_event(cl_cb_tag*, AVD_EVT*) () at 
src/amf/amfd/main.cc:792
\#6  0x7f4484db9a1e in main () at src/amf/amfd/main.cc:693


amfnd:
(gdb) bt
\#0  0x7fdd0f3f270d in write () from /lib64/libpthread.so.0
\#1  0x7fdd0fb57af9 in output_ () from /usr/local/lib/libopensaf_core.so.0
\#2  0x7fdd10844a64 in Trace::trace(char const*, char const*, unsigned int, 
unsigned int, char const*, ...) ()
at ./src/base/logtrace.h:166
\#3  0x7fdd1086e695 in avnd_main_process() () at src/amf/amfnd/main.cc:646
\#4  0x7fdd1084342f in main () at src/amf/amfnd/main.cc:207




---

Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is 
subscribed to https://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change settings at 
https://sourceforge.net/p/opensaf/admin/tickets/options.  Or, if this is a 
mailing list, you can unsubscribe from the mailing list.--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot___
Opensaf-tickets mailing list
Opensaf-tickets@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets

[tickets] [opensaf:tickets] #2403 amf: cores were generated for amfd and amfnd on different controllers.

2017-03-30 Thread Praveen

- **summary**: amf: amfd and amfnd crashes while calling TRACE() API. --> amf: 
cores were generated for amfd and amfnd on different controllers.
- Description has changed:

Diff:



--- old
+++ new
@@ -1,4 +1,3 @@
-
 Observed AMFD and AMFND crashes when calling TRACE() API. 
 
 amfd:



- Attachments has changed:

Diff:



--- old
+++ new
@@ -0,0 +1,4 @@
+amfd_bt_threads.log (3.0 kB; application/octet-stream)
+amfd_bt_threads_full.log (22.1 kB; application/octet-stream)
+amfnd_bt_threads.log (2.9 kB; application/octet-stream)
+amfnd_bt_threads_full.log (23.0 kB; application/octet-stream)



- **Comment**:

I mentioned it in the comment but forgot to change the title. It seems AMFD 
got stuck and became unresponsive, so amfnd generated its core after the 
heartbeat timeout. Similarly, on the other controller amfnd got stuck and 
became unresponsive, so the watchdog generated its core. 
The full backtrace is attached as a file.



---

** [tickets:#2403] amf: cores were generated for amfd and amfnd on different 
controllers.**

**Status:** unassigned
**Milestone:** 5.2.RC2
**Created:** Thu Mar 30, 2017 08:39 AM UTC by Praveen
**Last Updated:** Thu Mar 30, 2017 01:23 PM UTC
**Owner:** nobody
**Attachments:**

- 
[amfd_bt_threads.log](https://sourceforge.net/p/opensaf/tickets/2403/attachment/amfd_bt_threads.log)
 (3.0 kB; application/octet-stream)
- 
[amfd_bt_threads_full.log](https://sourceforge.net/p/opensaf/tickets/2403/attachment/amfd_bt_threads_full.log)
 (22.1 kB; application/octet-stream)
- 
[amfnd_bt_threads.log](https://sourceforge.net/p/opensaf/tickets/2403/attachment/amfnd_bt_threads.log)
 (2.9 kB; application/octet-stream)
- 
[amfnd_bt_threads_full.log](https://sourceforge.net/p/opensaf/tickets/2403/attachment/amfnd_bt_threads_full.log)
 (23.0 kB; application/octet-stream)


Observed AMFD and AMFND crashes when calling TRACE() API. 

amfd:
\#0  0x7f44834db70d in write () from /lib64/libpthread.so.0
\#1  0x7f4483eb9af9 in output_ () from /usr/local/lib/libopensaf_core.so.0
\#2  0x7f4484dbc714 in Trace::trace(char const*, char const*, unsigned int, 
unsigned int, char const*, ...) ()
at ./src/base/logtrace.h:166
\#3  0x7f4484e7b3cb in avd_stop_tmr(cl_cb_tag*, avd_tmr_tag*) () at 
src/amf/amfd/timer.cc:113
\#4  0x7f4484e03556 in avd_tmr_snd_hb_evh(cl_cb_tag*, AVD_EVT*) () at 
src/amf/amfd/ndfsm.cc:1066
\#5  0x7f4484e005b4 in process_event(cl_cb_tag*, AVD_EVT*) () at 
src/amf/amfd/main.cc:792
\#6  0x7f4484db9a1e in main () at src/amf/amfd/main.cc:693


amfnd:
(gdb) bt
\#0  0x7fdd0f3f270d in write () from /lib64/libpthread.so.0
\#1  0x7fdd0fb57af9 in output_ () from /usr/local/lib/libopensaf_core.so.0
\#2  0x7fdd10844a64 in Trace::trace(char const*, char const*, unsigned int, 
unsigned int, char const*, ...) ()
at ./src/base/logtrace.h:166
\#3  0x7fdd1086e695 in avnd_main_process() () at src/amf/amfnd/main.cc:646
\#4  0x7fdd1084342f in main () at src/amf/amfnd/main.cc:207





[tickets] [opensaf:tickets] #2403 amf: amfd and amfnd crashes while calling TRACE() API.

2017-03-30 Thread Praveen

AMFD and AMFND traces around this time are given below. In these two cases the 
cores were generated by AMFND and the watchdog, respectively. In both cases a 
timer was stopped and tracing was done around that time.

For AMFD crash:
AMFD traces:
Mar 28 17:12:43.075338 osafamfd [6614:6614:src/mbc/mbcsv_act.c:0412] << 
ncs_mbscv_rcv_decode
Mar 28 17:12:43.075345 osafamfd [6614:6614:src/mbc/mbcsv_util.c:0929] >> 
mbcsv_send_msg: event type: 12
Mar 28 17:12:43.075352 osafamfd [6614:6614:src/mbc/mbcsv_util.c:0954] TR 
NCS_MBCSV_MSG_SYNC_SEND_RSP event
Mar 28 17:12:43.075399 osafamfd [6614:6614:src/mbc/mbcsv_mds.c:0185] >> 
mbcsv_mds_send_msg: sending to vdest:1
Mar 28 17:12:43.075407 osafamfd [6614:6614:src/mbc/mbcsv_mds.c:0218] TR send 
type MDS_SENDTYPE_RRSP:
Mar 28 17:12:43.075576 osafamfd [6614:6614:src/mbc/mbcsv_mds.c:0244] << 
mbcsv_mds_send_msg: success
Mar 28 17:12:43.075599 osafamfd [6614:6614:src/mbc/mbcsv_util.c:0999] << 
mbcsv_send_msg
Mar 28 17:12:43.075606 osafamfd [6614:6614:src/mbc/mbcsv_act.c:0452] << 
ncs_mbcsv_rcv_async_update
Mar 28 17:12:43.075615 osafamfd [6614:6614:src/mbc/mbcsv_pr_evts.c:0222] << 
mbcsv_process_events
Mar 28 17:12:43.075625 osafamfd [6614:6614:src/mbc/mbcsv_pr_evts.c:0278] << 
mbcsv_hdl_dispatch_all
Mar 28 17:12:43.075633 osafamfd [6614:6614:src/mbc/mbcsv_api.c:0435] << 
mbcsv_process_dispatch_request: retval: 1
Mar 28 17:12:52.871798 osafamfd [6614:6616:src/mbc/mbcsv_tmr.c:0250] TR Timer 
expired. my role:2, svc_id:10, pwe_hdl:65537, peer_anchor:565213973364764, tmr 
type:NCS_MBCSV_TMR_SEND_WARM_SYNC
Mar 28 17:13:43.342772 osafamfd [6614:6616:src/amf/amfd/timer.cc:0154] >> 
avd_tmr_exp
Mar 28 17:13:43.342824 osafamfd [6614:6616:src/amf/amfd/timer.cc:0175] << 
avd_tmr_exp
Mar 28 17:13:43.348644 osafamfd [6614:6614:src/amf/amfd/main.cc:0774] >> 
process_event: evt->rcv_evt 14
Mar 28 17:13:43.348673 osafamfd [6614:6614:src/amf/amfd/ndfsm.cc:1058] >> 
avd_tmr_snd_hb_evh: seq_id=1212
Mar 28 17:13:43.349443 osafamfd [6614:6614:src/amf/amfd/timer.cc:0113] >> 
avd_stop_tmr: 0

messages:
Mar 28 17:13:43 PM_SC-1 osafamfnd[6629]: ER AMF director heart beat timeout, 
generating core for amfd
Mar 28 17:13:43 PM_SC-1 kernel: [17482.341638] ata1.00: device reported invalid 
CHS sector 0
Mar 28 17:13:43 PM_SC-1 osaffmd[6546]: NO AMFND down on: 2020f
Mar 28 17:13:43 PM_SC-1 kernel: [17482.341647] ata1: EH complete
Mar 28 17:13:44 PM_SC-1 osafamfnd[6629]: Rebooting OpenSAF NodeId = 131343 EE 
Name = , Reason: AMF director heart beat timeout, OwnNodeId = 131343, 
SupervisionTime = 60
Mar 28 17:13:44 PM_SC-1 osaffmd[6546]: NO FM down on: 2020f


For amfnd:
amfnd:
Mar 28 17:12:27.530364 osafamfnd [6502:6502:src/amf/amfnd/cbq.cc:0242] >> 
avnd_evt_ava_resp_evh
Mar 28 17:12:27.530373 osafamfnd [6502:6502:src/amf/amfnd/proxy.cc:0509] TR 
safComp=AMFWDOG,safSu=SC-2,safSg=NoRed,safApp=OpenSAF: Type=15
Mar 28 17:12:27.530382 osafamfnd [6502:6502:src/amf/amfnd/proxy.cc:0612] >> 
avnd_int_ext_comp_val: safComp=AMFWDOG,safSu=SC-2,safSg=NoRed,safApp=OpenSAF
Mar 28 17:12:27.530390 osafamfnd [6502:6502:src/amf/amfnd/proxy.cc:] << 
avnd_int_ext_comp_val
Mar 28 17:12:27.530403 osafamfnd [6502:6502:src/amf/amfnd/tmr.cc:0126] TR 
callback response timer stopped
Mar 28 17:12:27.530412 osafamfnd [6502:6502:src/amf/amfnd/cbq.cc:0543] << 
avnd_evt_ava_resp_evh
Mar 28 17:12:27.530419 osafamfnd [6502:6502:src/amf/amfnd/main.cc:0669] TR Evt 
Type:33 success
Mar 28 17:12:27.530427 osafamfnd [6502:6502:src/amf/amfnd/main.cc:0674] << 
avnd_evt_process
Mar 28 17:12:33.631398 osafamfnd [6502:6502:src/amf/amfnd/main.cc:0646] >> 
avnd_evt_process

messages:
Mar 28 17:13:27 PM_SC-2 osafamfwd[6518]: TIMEOUT receiving AMF health check 
request, generating core for amfnd
Mar 28 17:13:35 PM_SC-2 kernel: [16964.152949] ata1.00: qc timeout (cmd 0xe7)
Mar 28 17:13:35 PM_SC-2 kernel: [16964.152994] ata1.00: FLUSH failed Emask 0x4
Mar 28 17:13:35 PM_SC-2 kernel: [16964.153005] ata1: hard resetting link
Mar 28 17:13:35 PM_SC-2 kernel: [16964.472367] ata1: SATA link up 3.0 Gbps 
(SStatus 123 SControl 300)
Mar 28 17:13:35 PM_SC-2 kernel: [16964.473051] ata1.00: configured for UDMA/133
Mar 28 17:13:35 PM_SC-2 kernel: [16964.473055] ata1.00: retrying FLUSH 0xe7 
Emask 0x4
Mar 28 17:13:43 PM_SC-2 kernel: [16972.985932] ata1.00: device reported invalid 
CHS sector 0
Mar 28 17:13:44 PM_SC-2 osafamfwd[6518]: Last received healthcheck cnt=1208 at 
Tue Mar 28 17:12:27 2017
Mar 28 17:13:44 PM_SC-2 osafamfwd[6518]: Rebooting OpenSAF NodeId = 0 EE Name = 
No EE Mapped, Reason: AMF unexpectedly crashed, OwnNodeId = 131599, 
SupervisionTime = 60
Mar 28 17:13:44 PM_SC-2 osafclmd[6482]: AL AMF Node Director is down, terminate 
this process
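
One way to locate such a stall in a large trace file is to look for long silences between consecutive trace timestamps. The following is only a rough sketch and not part of OpenSAF: the helper names `parse_ts` and `find_gaps` are hypothetical, the year is assumed because trace lines do not log it, and the sample lines are copied from the AMFD trace above.

```python
from datetime import datetime

def parse_ts(line, year=2017):
    """Return the timestamp of a trace line, or None for wrapped/other lines."""
    try:
        # Trace lines start with e.g. "Mar 28 17:12:43.075338" (22 characters).
        # The year is not logged, so an assumed year is prepended.
        return datetime.strptime(f"{year} {line[:22]}", "%Y %b %d %H:%M:%S.%f")
    except ValueError:
        return None

def find_gaps(lines, threshold_s=5.0, year=2017):
    """Return (previous, current, seconds) for every silence >= threshold_s."""
    gaps = []
    prev = None
    for line in lines:
        ts = parse_ts(line, year)
        if ts is None:
            continue  # continuation of a wrapped trace line
        if prev is not None:
            delta = (ts - prev).total_seconds()
            if delta >= threshold_s:
                gaps.append((prev, ts, delta))
        prev = ts
    return gaps

if __name__ == "__main__":
    sample = [
        "Mar 28 17:12:43.075633 osafamfd [6614:6614:src/mbc/mbcsv_api.c:0435] <<",
        "mbcsv_process_dispatch_request: retval: 1",
        "Mar 28 17:12:52.871798 osafamfd [6614:6616:src/mbc/mbcsv_tmr.c:0250] TR Timer",
        "Mar 28 17:13:43.342772 osafamfd [6614:6616:src/amf/amfd/timer.cc:0154] >>",
    ]
    for before, after, seconds in find_gaps(sample, threshold_s=30.0):
        print(f"{seconds:.3f}s of silence between {before.time()} and {after.time()}")
```

A silence close to the supervision time (60 seconds in the messages above) would be consistent with the process being blocked, but the threshold is a judgment call and has to be tuned to the trace volume.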



---

** [tickets:#2403] amf: amfd and amfnd crashes while calling TRACE() API.**

**Status:** unassigned
**Milestone:** 5.2.RC2
**Created:** Thu Mar 30, 2017 08:39 AM UTC by Praveen
**Last Updated:

[tickets] [opensaf:tickets] #2403 amf: amfd and amfnd crashes while calling TRACE() API.

2017-03-30 Thread Praveen




---

** [tickets:#2403] amf: amfd and amfnd crashes while calling TRACE() API.**

**Status:** unassigned
**Milestone:** 5.2.RC2
**Created:** Thu Mar 30, 2017 08:39 AM UTC by Praveen
**Last Updated:** Thu Mar 30, 2017 08:39 AM UTC
**Owner:** nobody



Observed AMFD and AMFND crashes when calling TRACE() API. 

amfd:
\#0  0x7f44834db70d in write () from /lib64/libpthread.so.0
\#1  0x7f4483eb9af9 in output_ () from /usr/local/lib/libopensaf_core.so.0
\#2  0x7f4484dbc714 in Trace::trace(char const*, char const*, unsigned int, 
unsigned int, char const*, ...) ()
at ./src/base/logtrace.h:166
\#3  0x7f4484e7b3cb in avd_stop_tmr(cl_cb_tag*, avd_tmr_tag*) () at 
src/amf/amfd/timer.cc:113
\#4  0x7f4484e03556 in avd_tmr_snd_hb_evh(cl_cb_tag*, AVD_EVT*) () at 
src/amf/amfd/ndfsm.cc:1066
\#5  0x7f4484e005b4 in process_event(cl_cb_tag*, AVD_EVT*) () at 
src/amf/amfd/main.cc:792
\#6  0x7f4484db9a1e in main () at src/amf/amfd/main.cc:693


amfnd:
(gdb) bt
\#0  0x7fdd0f3f270d in write () from /lib64/libpthread.so.0
\#1  0x7fdd0fb57af9 in output_ () from /usr/local/lib/libopensaf_core.so.0
\#2  0x7fdd10844a64 in Trace::trace(char const*, char const*, unsigned int, 
unsigned int, char const*, ...) ()
at ./src/base/logtrace.h:166
\#3  0x7fdd1086e695 in avnd_main_process() () at src/amf/amfnd/main.cc:646
\#4  0x7fdd1084342f in main () at src/amf/amfnd/main.cc:207





[tickets] [opensaf:tickets] #2100 Standby should not be rebooted, for SC absence configuration mismatch

2017-03-29 Thread Praveen

- **Milestone**: 5.2.RC2 --> next



---

** [tickets:#2100]  Standby should not be rebooted, for  SC absence 
configuration mismatch**

**Status:** unassigned
**Milestone:** next
**Created:** Fri Oct 07, 2016 07:11 AM UTC by Srikanth R
**Last Updated:** Wed Mar 01, 2017 05:33 AM UTC
**Owner:** nobody


Changeset : 8190 5.1.GA

-> Initially brought up opensaf on SC-1 with the "SC ABSENCE" feature enabled in 
immd.conf.

-> On SC-2, the "SC ABSENCE" feature is not enabled in immd.conf. When opensafd is 
started on SC-2, the node reboots.

Oct  7 17:58:27 SLES-SLOT2 osafimmd[3615]: ER SC absence allowed in not the 
same as on active IMMD. Active: 900, Standby: 0. Exiting.
Oct  7 17:58:27 SLES-SLOT2 osafamfnd[3676]: NO 
'safComp=IMMD,safSu=SC-2,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : 
Recovery is 'nodeFailfast'
Oct  7 17:58:27 SLES-SLOT2 osafamfnd[3676]: ER 
safComp=IMMD,safSu=SC-2,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery 
is:nodeFailfast
Oct  7 17:58:27 SLES-SLOT2 osafamfnd[3676]: Rebooting OpenSAF NodeId = 131599 
EE Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 
131599, SupervisionTime = 60

   Here the user had configured the two controllers inconsistently, because of 
which the standby rebooted. Opensafd is enabled in the runlevel as part of 
installation, so the standby will reboot continuously until opensafd is stopped on SC-1.
   
  Suggested behavior:
   
   Opensafd should not start on the standby, instead of rebooting immediately. 
   
   Also, cluster-level attributes like IMMSV_SC_ABSENCE_ALLOWED can be 
moved to imm.xml. Node-level attributes, such as trace enabling, can be retained in 
configuration files.
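
The suggested behavior can be illustrated with a small sketch. This is not OpenSAF code: the function names are hypothetical, and it assumes that immd.conf sets the value with a shell-style `export IMMSV_SC_ABSENCE_ALLOWED=<seconds>` line and that an unset value counts as 0 (as in the "Standby: 0" log line above).

```python
import re

# Assumed immd.conf syntax: "export IMMSV_SC_ABSENCE_ALLOWED=900" (export optional).
_ASSIGN = re.compile(r"^(?:export\s+)?IMMSV_SC_ABSENCE_ALLOWED=(\d+)\s*$")

def sc_absence_allowed(conf_text):
    """Return the configured value in seconds, or 0 if the setting is absent."""
    value = 0
    for raw in conf_text.splitlines():
        line = raw.strip()
        if line.startswith("#"):
            continue  # commented out means the feature is not enabled
        match = _ASSIGN.match(line)
        if match:
            value = int(match.group(1))
    return value

def standby_start_action(active_value, standby_value):
    """Hypothetical policy: refuse to start on a mismatch instead of rebooting."""
    if active_value == standby_value:
        return "start"
    return "refuse-start"

if __name__ == "__main__":
    active = sc_absence_allowed("export IMMSV_SC_ABSENCE_ALLOWED=900\n")
    standby = sc_absence_allowed("#export IMMSV_SC_ABSENCE_ALLOWED=900\n")
    print(active, standby, standby_start_action(active, standby))
```

With such a check the standby would stay down with a clear error until the configuration is corrected, rather than entering a reboot loop.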



[tickets] [opensaf:tickets] #2392 amf: PR doc updates for 5.2 release.

2017-03-28 Thread Praveen

- **status**: accepted --> review



---

** [tickets:#2392] amf: PR doc updates for 5.2 release.**

**Status:** review
**Milestone:** 5.2.RC2
**Created:** Thu Mar 23, 2017 05:36 AM UTC by Praveen
**Last Updated:** Tue Mar 28, 2017 09:52 AM UTC
**Owner:** Praveen


Updates to be done for:
-Enhancements: #1190, #2259, #2144, #2252
-Defect:2233



[tickets] [opensaf:tickets] #2392 amf: PR doc updates for 5.2 release.

2017-03-28 Thread Praveen

AMF PR doc for review for #1190, #2259, #2144, #2065 and #2233.


Attachments:

- 
[OpenSAF_AMF_PR_5.2.odt](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/1d1e1df6/f63c/attachment/OpenSAF_AMF_PR_5.2.odt)
 (133.4 kB; application/vnd.oasis.opendocument.text)


---

** [tickets:#2392] amf: PR doc updates for 5.2 release.**

**Status:** accepted
**Milestone:** 5.2.RC2
**Created:** Thu Mar 23, 2017 05:36 AM UTC by Praveen
**Last Updated:** Thu Mar 23, 2017 05:36 AM UTC
**Owner:** Praveen


Updates to be done for:
-Enhancements: #1190, #2259, #2144, #2252
-Defect:2233



[tickets] [opensaf:tickets] #2269 amf: saAmfSGNumPrefAssignedSUs is not honored in N-Way Active model.

2017-03-28 Thread Praveen

- **status**: review --> assigned
- **Milestone**: 5.0.2 --> next



---

** [tickets:#2269] amf: saAmfSGNumPrefAssignedSUs is not honored in N-Way 
Active model.**

**Status:** assigned
**Milestone:** next
**Created:** Wed Jan 18, 2017 06:08 AM UTC by Praveen
**Last Updated:** Fri Mar 10, 2017 10:44 AM UTC
**Owner:** Praveen
**Attachments:**

- 
[AppConfig-nwayactive_3SUs_1SIs.xml](https://sourceforge.net/p/opensaf/tickets/2269/attachment/AppConfig-nwayactive_3SUs_1SIs.xml)
 (13.7 kB; text/xml)


AMF assigns more SUs than the configured value of saAmfSGNumPrefAssignedSUs in 
the N-Way Active model.
The issue can be reproduced by bringing up the attached configuration.
In the application, saAmfSGNumPrefAssignedSUs is set to 2:
 immlist safSg=NWay_Active\,safApp=NWay_Active | grep -i prefass
saAmfSGNumPrefAssignedSUs  SA_UINT32_T  2 (0x2)

But AMF is giving assignments to all three SUs:
safSISU=safSu=SU2\,safSg=NWay_Active\,safApp=NWay_Active,safSi=NWay_Active,safApp=NWay_Active
saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=SU1\,safSg=NWay_Active\,safApp=NWay_Active,safSi=NWay_Active,safApp=NWay_Active
saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=SU3\,safSg=NWay_Active\,safApp=NWay_Active,safSi=NWay_Active,safApp=NWay_Active
saAmfSISUHAState=ACTIVE(1)

Since this attribute is also valid for the N-Way model, the issue applies to 
the N-Way model as well.
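
The check described above can be scripted against the SISU listing. The following is a minimal sketch with hypothetical helper names; it assumes the listing alternates an escaped SISU DN line with its saAmfSISUHAState line, exactly as in the output pasted in this ticket.

```python
import re

# Extracts the SU name from an escaped SISU DN such as
# safSISU=safSu=SU2\,safSg=...; the name ends at the first backslash or comma.
_SU_IN_SISU = re.compile(r"safSISU=safSu=([^\\,]+)")

def active_sus(listing_lines):
    """Return the set of SU names that hold an ACTIVE assignment."""
    active = set()
    current_su = None
    for line in listing_lines:
        match = _SU_IN_SISU.search(line)
        if match:
            current_su = match.group(1)
        elif "saAmfSISUHAState=ACTIVE" in line and current_su is not None:
            active.add(current_su)
            current_su = None
    return active

def excess_assigned_sus(listing_lines, num_pref_assigned_sus):
    """Number of assigned SUs beyond the configured preference (0 if honored)."""
    return max(0, len(active_sus(listing_lines)) - num_pref_assigned_sus)

if __name__ == "__main__":
    sample = [
        r"safSISU=safSu=SU2\,safSg=NWay_Active\,safApp=NWay_Active,safSi=NWay_Active,safApp=NWay_Active",
        "saAmfSISUHAState=ACTIVE(1)",
        r"safSISU=safSu=SU1\,safSg=NWay_Active\,safApp=NWay_Active,safSi=NWay_Active,safApp=NWay_Active",
        "saAmfSISUHAState=ACTIVE(1)",
        r"safSISU=safSu=SU3\,safSg=NWay_Active\,safApp=NWay_Active,safSi=NWay_Active,safApp=NWay_Active",
        "saAmfSISUHAState=ACTIVE(1)",
    ]
    # saAmfSGNumPrefAssignedSUs is 2 in the attached configuration.
    print(excess_assigned_sus(sample, 2))
```

A non-zero result means more SUs are assigned than saAmfSGNumPrefAssignedSUs allows.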




[tickets] [opensaf:tickets] #316 SI Assignments are not removed for a SU in Nway redundancy model

2017-03-28 Thread Praveen

- **status**: review --> assigned
- **Milestone**: 5.0.2 --> next



---

** [tickets:#316] SI Assignments are not removed for a SU in Nway redundancy 
model**

**Status:** assigned
**Milestone:** next
**Created:** Fri May 24, 2013 08:39 AM UTC by Nagendra Kumar
**Last Updated:** Thu Jan 05, 2017 06:31 AM UTC
**Owner:** Praveen
**Attachments:**

- [logs.tar](https://sourceforge.net/p/opensaf/tickets/316/attachment/logs.tar) 
(2.5 MB; application/x-gzip-compressed)
- [osafamfd](https://sourceforge.net/p/opensaf/tickets/316/attachment/osafamfd) 
(228.2 kB; application/octet-stream)
- 
[osafamfnd](https://sourceforge.net/p/opensaf/tickets/316/attachment/osafamfnd) 
(122.8 kB; application/octet-stream)
- 
[pl_logs.tar](https://sourceforge.net/p/opensaf/tickets/316/attachment/pl_logs.tar)
 (1.3 MB; application/x-gzip-compressed)


Migrated from http://devel.opensaf.org/ticket/2987

changeset : 3855
Model : NWay
configuration : 1App,1SG,5SU with 3comps each, 5SIs with 3csi each.
si-si deps configured as SI1<-SI2<-SI3<-SI4
SIrankedSus not configured. 
Node mapping : SU1 on SC-1, SU2 on SC-2, SU3 on PL-3, SU4,SU5 on PL-4.


While running the campaign, SMF performs lock and lock-in of the activation units, 
i.e. the SUs. The SIs for SU3 are not removed even though SU3 is in the locked state. 
The subsequent unlock-in and unlock of SU3 fail. 


/var/log/messages of active ctrl- SC-1 shows

Feb 3 22:45:14 linux-xc76 osafamfd[20055]: WA SIs still assigned to this SU
Feb 3 22:45:16 linux-xc76 osafamfd[20055]: WA SIs still assigned to this SU
Feb 3 22:45:18 linux-xc76 osafamfd[20055]: WA SIs still assigned to this SU
Feb 3 22:45:20 linux-xc76 osafamfd[20055]: WA SIs still assigned to this SU
Feb 3 22:45:23 linux-xc76 osafamfd[20055]: WA SIs still assigned to this SU
Feb 3 22:45:23 linux-xc76 osafsmfd[20081]: ER Fail to invoke admin operation, 
too many SA_AIS_ERR_TRY_AGAIN, giving up. 
dn=[safSu=SU3,safSg=SGONE,safApp=NWAYAPP], opId=[3]
Feb 3 22:45:23 linux-xc76 osafsmfd[20081]: ER Failed to call admin operation 3 
on safSu=SU3,safSg=SGONE,safApp=NWAYAPP
Feb 3 22:45:23 linux-xc76 osafsmfd[20081]: ER Failed to Terminate activation 
units in step=safSmfStep=0003
Feb 3 22:45:23 linux-xc76 osafsmfd[20081]: ER Step undoing failed
Feb 3 22:45:23 linux-xc76 osafsmfd[20081]: ER Step safSmfStep=0003 in procedure 
safSmfProc=amfClusterProc-1 failed, step result 5
Feb 3 22:45:23 linux-xc76 osafsmfd[20081]: NO CAMP: Procedure 
safSmfProc=amfClusterProc-1 returned FAILED


SU Assignments brief:
===
safSISU=safSu=SU1\,safSg=SGONE\,safApp=NWAYAPP,safSi=NWAYSI3,safApp=NWAYAPP
saAmfSISUHAState=ACTIVE(1)

safSISU=safSu=SU1\,safSg=SGONE\,safApp=NWAYAPP,safSi=NWAYSI2,safApp=NWAYAPP
saAmfSISUHAState=STANDBY(2)

safSISU=safSu=SU3\,safSg=SGONE\,safApp=NWAYAPP,safSi=NWAYSI5,safApp=NWAYAPP
saAmfSISUHAState=QUIESCED(3)

safSISU=safSu=SU4\,safSg=SGONE\,safApp=NWAYAPP,safSi=NWAYSI5,safApp=NWAYAPP
saAmfSISUHAState=ACTIVE(1)

safSISU=safSu=SU2\,safSg=SGONE\,safApp=NWAYAPP,safSi=NWAYSI1,safApp=NWAYAPP
saAmfSISUHAState=ACTIVE(1)


SU States:
==
safSu=SU3,safSg=SGONE,safApp=NWAYAPP
saAmfSUAdminState=LOCKED(2)
saAmfSUOperState=ENABLED(1)
saAmfSUPresenceState=INSTANTIATED(3)
saAmfSUReadinessState=OUT-OF-SERVICE(1)


changed 4 months ago by bertil
- owner changed from ingber to ravisekhar
- component changed from saf/smfsv to saf/avsv

I believe this is an AMF problem. SMF only uses the AMF admin ops (lock, unlock, 
etc.).







[tickets] [opensaf:tickets] #2372 amf: CLM lock of two more nodes returns REPAIR_PENDING for first node.

2017-03-28 Thread Praveen

- **status**: review --> fixed
- **Comment**:

changeset:   8727:9a1452dcd190
branch:      opensaf-5.0.x
parent:      8721:b2e2a9162664
user:        Praveen Malviya <praveen.malv...@oracle.com>
date:        Tue Mar 28 12:19:02 2017 +0530
summary:     amf: fix track callback when multiple CLM nodes leaves 
membership[#2372].

changeset:   8728:bdd9cdb1ced9
branch:      opensaf-5.1.x
parent:      8722:9c295151f262
user:        Praveen Malviya <praveen.malv...@oracle.com>
date:        Tue Mar 28 12:19:40 2017 +0530
summary:     amf: fix track callback when multiple CLM nodes leaves 
membership[#2372].

changeset:   8729:a8fa805d5765
tag:         tip
parent:      8726:cad103f14b48
user:        Praveen Malviya <praveen.malv...@oracle.com>
date:        Tue Mar 28 12:20:28 2017 +0530
summary:     amf: fix track callback when multiple CLM nodes leaves 
membership[#2372].

[staging:9a1452]
[staging:bdd9cd]
[staging:a8fa80]




---

** [tickets:#2372] amf: CLM lock of two more nodes returns REPAIR_PENDING for 
first node.**

**Status:** fixed
**Milestone:** 5.0.2
**Created:** Tue Mar 14, 2017 09:29 AM UTC by Praveen
**Last Updated:** Mon Mar 27, 2017 09:59 AM UTC
**Owner:** Praveen
**Attachments:**

- 
[osafamfd](https://sourceforge.net/p/opensaf/tickets/2372/attachment/osafamfd) 
(3.4 MB; application/octet-stream)
- 
[osafclmd](https://sourceforge.net/p/opensaf/tickets/2372/attachment/osafclmd) 
(860.9 kB; application/octet-stream)


Steps to reproduce:
1) Bring a 4-node cluster up.
2) Deploy the AMF demo on PL-3 and PL-4.
3) LOCK amfd nodes PL-3 and PL-4.
4) Arrange for the termination of amf_demo on PL-3 to take more time 
than on PL-4.
5) From one terminal, issue a CLM lock of PL-3 first and, right after it, a CLM lock 
of PL-4.

CLM and AMF traces are attached.  
Analysis:
When AMFD gets the CLM track callback for PL-3, it starts terminating amf_demo on 
PL-3. While the termination of amf_demo is still going on, AMF gets another track 
callback with rootCauseEntity as PL-4. However, this callback also contains 
information about PL-3. AMFD starts terminating amf_demo on PL-4, but at the same time it 
responds for PL-3 with the invocation id of the PL-4 callback. CLM assumes that the PL-4 
change_started step has completed and sends the completion callback for PL-4. In this 
callback, AMF clears the internal flags which monitor the graceful removal of 
nodes. Since AMF never responded to the PL-3 callback, the callback timer expires in 
CLMD and it sends the completed callback to AMF. AMF treats this as a case of 
node failover and tries to fail over PL-3.

Note: In all these stages, CLM sends the track callback with information about all the 
nodes. The flags AMF registers with are:
 
SA_TRACK_CURRENT|SA_TRACK_CHANGES_ONLY|SA_TRACK_VALIDATE_STEP|SA_TRACK_START_STEP.
  I am still evaluating whether the issue is in CLM or AMF. Since AMF registers for 
**|SA_TRACK_CHANGES_ONLY|**, should CLM give information about all the nodes in all 
subsequent callbacks?
 Also, AMF should respond to a callback once it has completed termination of the comps.
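
The bookkeeping implied by the analysis can be sketched as follows. This is only an illustrative model in Python, not the AMFD code and not necessarily what the committed fix does: the class `TrackResponder` and its methods are hypothetical, and `respond` stands in for whatever call answers a CLM track callback. The idea is to remember the invocation id per root-cause node and to answer each callback with its own id only after that node's components have terminated.

```python
class TrackResponder:
    """Toy model: one pending invocation id per node being removed."""

    def __init__(self, respond):
        self._respond = respond   # callable taking an invocation id
        self._pending = {}        # node name -> invocation id of its callback

    def on_start_step(self, root_cause_node, invocation):
        # Record the invocation against the root-cause node only, even if the
        # callback buffer also carries information about other nodes.
        self._pending[root_cause_node] = invocation

    def on_termination_done(self, node):
        # Answer with the invocation that belongs to this node, exactly once.
        invocation = self._pending.pop(node, None)
        if invocation is not None:
            self._respond(invocation)


if __name__ == "__main__":
    sent = []
    responder = TrackResponder(sent.append)
    responder.on_start_step("PL-3", 101)   # CLM lock of PL-3
    responder.on_start_step("PL-4", 102)   # CLM lock of PL-4 arrives meanwhile
    responder.on_termination_done("PL-4")  # PL-4 finishes first
    responder.on_termination_done("PL-3")
    print(sent)
```

In this model the PL-3 callback is never answered with the PL-4 invocation id, so CLM would not see a premature completion for either node. The invocation values 101 and 102 are made up for the example.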



[tickets] [opensaf:tickets] #2371 AMF: NPM app went into unstable state while expanding cluster

2017-03-27 Thread Praveen

- **status**: assigned --> duplicate
- **Milestone**: 5.2.RC2 --> future



---

** [tickets:#2371] AMF: NPM app went into unstable state while expanding 
cluster**

**Status:** duplicate
**Milestone:** future
**Created:** Tue Mar 14, 2017 08:03 AM UTC by Chani Srivastava
**Last Updated:** Fri Mar 17, 2017 08:29 AM UTC
**Owner:** Praveen
**Attachments:**

- 
[messages](https://sourceforge.net/p/opensaf/tickets/2371/attachment/messages) 
(86.9 kB; application/octet-stream)
- 
[osafamfd](https://sourceforge.net/p/opensaf/tickets/2371/attachment/osafamfd) 
(13.0 MB; application/octet-stream)


Environment details

OS : Suse 64bit
Changeset : 8603( 5.2.MO-1)
4 node cluster without PBE

Summary - Application went into unstable state and campaign execution could not 
complete while expanding the cluster using campaign

Steps:
1. Brought up an NPM application with 5 SUs
2. Using campaign add a 3rd payload PL-5 to the cluster

App went into bad state

Mar 17 04:38:13 NewSC1 osafamfd[2562]: NO 'safSg=SGONE,safApp=NPMAPP' is in 
unstable/transition state
Mar 17 04:38:15 NewSC1 osafamfd[2562]: NO 'safSg=SGONE,safApp=NPMAPP' is in 
unstable/transition state
Mar 17 04:38:17 NewSC1 osafamfd[2562]: NO 'safSg=SGONE,safApp=NPMAPP' is in 
unstable/transition state
Mar 17 04:38:19 NewSC1 osafamfd[2562]: NO 'safSg=SGONE,safApp=NPMAPP' is in 
unstable/transition state
Mar 17 04:38:21 NewSC1 osafamfd[2562]: NO 'safSg=SGONE,safApp=NPMAPP' is in 
unstable/transition state
Mar 17 04:38:23 NewSC1 osafamfd[2562]: NO 'safSg=SGONE,safApp=NPMAPP' is in 
unstable/transition state
Mar 17 04:38:25 NewSC1 osafamfd[2562]: NO 'safSg=SGONE,safApp=NPMAPP' is in 
unstable/transition state
Mar 17 04:38:27 NewSC1 osafamfd[2562]: NO 'safSg=SGONE,safApp=NPMAPP' is in 
unstable/transition state
Mar 17 04:38:29 NewSC1 osafamfd[2562]: NO 'safSg=SGONE,safApp=NPMAPP' is in 
unstable/transition state




[tickets] [opensaf:tickets] #2320 clm: standby clmd crashes due to missing node information

2017-03-27 Thread Praveen

- **status**: assigned --> duplicate
- **Part**: - --> d
- **Milestone**: 5.2.RC2 --> 5.0.2
- **Comment**:


Problem is fixed in #2325. Please raise a new ticket with traces and logs if 
still observed.




---

** [tickets:#2320] clm: standby clmd crashes due to missing node information**

**Status:** duplicate
**Milestone:** 5.0.2
**Created:** Wed Feb 22, 2017 12:55 PM UTC by Zoran Milinkovic
**Last Updated:** Mon Mar 27, 2017 10:59 AM UTC
**Owner:** Praveen


The standby CLMD service crashed due to missing PL-3 information.

syslog from SC-2:
~~~
Feb 13 00:43:31 SC-2-2 osafamfd[5082]: NO Cold sync complete!
Feb 13 00:43:31 SC-2-2 osafimmd[5009]: NO SBY: Ruling epoch noted as:5
Feb 13 00:43:31 SC-2-2 osafimmd[5009]: NO IMMND coord at 2010f
Feb 13 00:43:31 SC-2-2 osafimmd[5009]: NO SBY: SaImmRepositoryInitModeT changed 
and noted as 'SA_IMM_KEEP_REPOSITORY'
Feb 13 00:43:31 SC-2-2 osafimmnd[5024]: NO NODE STATE-> IMM_NODE_R_AVAILABLE
Feb 13 00:43:31 SC-2-2 osafimmnd[5024]: NO NODE STATE-> 
IMM_NODE_FULLY_AVAILABLE 19082
Feb 13 00:43:31 SC-2-2 osafimmnd[5024]: NO Epoch set to 5 in ImmModel
Feb 13 00:43:31 SC-2-2 osafimmd[5009]: NO SBY: New Epoch for IMMND process at 
node 2020f old epoch: 4  new epoch:5
Feb 13 00:43:31 SC-2-2 osafimmd[5009]: NO SBY: New Epoch for IMMND process at 
node 2010f old epoch: 4  new epoch:5
Feb 13 00:43:31 SC-2-2 osafimmd[5009]: NO IMMND coord at 2010f
Feb 13 00:43:31 SC-2-2 osafimmd[5009]: NO SBY: New Epoch for IMMND process at 
node 2030f old epoch: 0  new epoch:5
Feb 13 00:43:31 SC-2-2 osafclmd[5066]: ER Node is NULL,problem with the 
database.
Feb 13 00:43:31 SC-2-2 osafclmd[5066]: 
../../opensaf/src/clm/clmd/clms_mbcsv.c:468: ckpt_proc_node_rec: Assertion '0' 
failed.
Feb 13 00:43:32 SC-2-2 osafamfnd[5096]: NO 
'safComp=CLM,safSu=SC-2,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : 
Recovery is 'nodeFailfast'
Feb 13 00:43:32 SC-2-2 osafamfnd[5096]: ER 
safComp=CLM,safSu=SC-2,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery 
is:nodeFailfast
Feb 13 00:43:32 SC-2-2 osafamfnd[5096]: Rebooting OpenSAF NodeId = 131599 EE 
Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 
131599, SupervisionTime = 60
Feb 13 00:43:32 SC-2-2 opensaf_reboot: Rebooting local node; timeout=60
~~~

Coredump:
~~~
[New LWP 5066]
[New LWP 5069]
[New LWP 5068]
[New LWP 5070]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/lib64/opensaf/osafclmd'.
Program terminated with signal SIGABRT, Aborted.
#0  0x7fbc7be880c7 in raise () from /lib64/libc.so.6
### BT ###
#0  0x7fbc7be880c7 in raise () from /lib64/libc.so.6
#1  0x7fbc7be89478 in abort () from /lib64/libc.so.6
#2  0x7fbc7c85202e in __osafassert_fail (__file=__file@entry=0x7fbc7e1b7d50 
"../../opensaf/src/clm/clmd/clms_mbcsv.c", __line=__line@entry=468, 
__func=__func@entry=0x7fbc7e1b8820 <__FUNCTION__.12739> "ckpt_proc_node_rec", 
__assertion=__assertion@entry=0x7fbc7e1b78ea "0") at 
../../opensaf/src/base/sysf_def.c:281
#3  0x7fbc7e1aa016 in ckpt_proc_node_rec (cb=<optimized out>, 
data=0x7fbc7f218a50) at ../../opensaf/src/clm/clmd/clms_mbcsv.c:468
#4  0x7fbc7e1ae044 in ckpt_decode_async_update (cbk_arg=<optimized out>, 
cb=0x7fbc7e3be100 <_clms_cb>) at ../../opensaf/src/clm/clmd/clms_mbcsv.c:2310
#5  ckpt_decode_cbk_handler (cbk_arg=0x7fff27b6b1a0) at 
../../opensaf/src/clm/clmd/clms_mbcsv.c:1997
#6  mbcsv_callback (arg=0x7fff27b6b1a0) at 
../../opensaf/src/clm/clmd/clms_mbcsv.c:719
#7  0x7fbc7c856f76 in ncs_mbscv_rcv_decode (peer=peer@entry=0x7fbc7f217a60, 
evt=evt@entry=0x7fbc740036e0) at ../../opensaf/src/mbc/mbcsv_act.c:393
#8  0x7fbc7c857146 in ncs_mbcsv_rcv_async_update (peer=0x7fbc7f217a60, 
evt=0x7fbc740036e0) at ../../opensaf/src/mbc/mbcsv_act.c:440
#9  0x7fbc7c85dd30 in mbcsv_process_events (rcvd_evt=0x7fbc740036e0, 
mbcsv_hdl=mbcsv_hdl@entry=4293918753) at 
../../opensaf/src/mbc/mbcsv_pr_evts.c:168
#10 0x7fbc7c85de9b in mbcsv_hdl_dispatch_all (mbcsv_hdl=4293918753, 
mbx=mbx@entry=4283432961) at ../../opensaf/src/mbc/mbcsv_pr_evts.c:272
#11 0x7fbc7c8586c2 in mbcsv_process_dispatch_request (arg=0x7fff27b6b310) 
at ../../opensaf/src/mbc/mbcsv_api.c:423
#12 0x7fbc7e1aa7be in clms_mbcsv_dispatch (mbcsv_hdl=<optimized out>) at 
../../opensaf/src/clm/clmd/clms_mbcsv.c:687
#13 0x7fbc7e19e4e4 in main (argc=<optimized out>, argv=<optimized out>) at 
../../opensaf/src/clm/clmd/clms_main.c:535
### BT FULL ###
#0  0x7fbc7be880c7 in raise () from /lib64/libc.so.6
No symbol table info available.
#1  0x7fbc7be89478 in abort () from /lib64/libc.so.6
No symbol table info available.
#2  0x7fbc7c85202e in __osafassert_fail (__file=__file@entry=0x7fbc7e1b7d50 
"../../opensaf/src/clm/clmd/clms_mbcsv.c", __line=__line@entry=468, 
__func=__func@entry=0x7fbc7e1b8820 <__FUNCTION__.12739> "ckpt_proc_node_rec", 
__assertion=__assertion@entry=0x7fbc7e1b78ea "0"

[tickets] [opensaf:tickets] #2320 clm: standby clmd crashes due to missing node information

2017-03-27 Thread Praveen

- **status**: unassigned --> assigned
- **assigned_to**: Praveen



---

** [tickets:#2320] clm: standby clmd crashes due to missing node information**

**Status:** assigned
**Milestone:** 5.2.RC2
**Created:** Wed Feb 22, 2017 12:55 PM UTC by Zoran Milinkovic
**Last Updated:** Fri Feb 24, 2017 10:06 AM UTC
**Owner:** Praveen


The standby CLMD service crashed due to missing PL-3 information.

syslog from SC-2:
~~~
Feb 13 00:43:31 SC-2-2 osafamfd[5082]: NO Cold sync complete!
Feb 13 00:43:31 SC-2-2 osafimmd[5009]: NO SBY: Ruling epoch noted as:5
Feb 13 00:43:31 SC-2-2 osafimmd[5009]: NO IMMND coord at 2010f
Feb 13 00:43:31 SC-2-2 osafimmd[5009]: NO SBY: SaImmRepositoryInitModeT changed 
and noted as 'SA_IMM_KEEP_REPOSITORY'
Feb 13 00:43:31 SC-2-2 osafimmnd[5024]: NO NODE STATE-> IMM_NODE_R_AVAILABLE
Feb 13 00:43:31 SC-2-2 osafimmnd[5024]: NO NODE STATE-> 
IMM_NODE_FULLY_AVAILABLE 19082
Feb 13 00:43:31 SC-2-2 osafimmnd[5024]: NO Epoch set to 5 in ImmModel
Feb 13 00:43:31 SC-2-2 osafimmd[5009]: NO SBY: New Epoch for IMMND process at 
node 2020f old epoch: 4  new epoch:5
Feb 13 00:43:31 SC-2-2 osafimmd[5009]: NO SBY: New Epoch for IMMND process at 
node 2010f old epoch: 4  new epoch:5
Feb 13 00:43:31 SC-2-2 osafimmd[5009]: NO IMMND coord at 2010f
Feb 13 00:43:31 SC-2-2 osafimmd[5009]: NO SBY: New Epoch for IMMND process at 
node 2030f old epoch: 0  new epoch:5
Feb 13 00:43:31 SC-2-2 osafclmd[5066]: ER Node is NULL,problem with the 
database.
Feb 13 00:43:31 SC-2-2 osafclmd[5066]: 
../../opensaf/src/clm/clmd/clms_mbcsv.c:468: ckpt_proc_node_rec: Assertion '0' 
failed.
Feb 13 00:43:32 SC-2-2 osafamfnd[5096]: NO 
'safComp=CLM,safSu=SC-2,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : 
Recovery is 'nodeFailfast'
Feb 13 00:43:32 SC-2-2 osafamfnd[5096]: ER 
safComp=CLM,safSu=SC-2,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery 
is:nodeFailfast
Feb 13 00:43:32 SC-2-2 osafamfnd[5096]: Rebooting OpenSAF NodeId = 131599 EE 
Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 
131599, SupervisionTime = 60
Feb 13 00:43:32 SC-2-2 opensaf_reboot: Rebooting local node; timeout=60
~~~

Coredump:
~~~
[New LWP 5066]
[New LWP 5069]
[New LWP 5068]
[New LWP 5070]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/lib64/opensaf/osafclmd'.
Program terminated with signal SIGABRT, Aborted.
#0  0x7fbc7be880c7 in raise () from /lib64/libc.so.6
### BT ###
#0  0x7fbc7be880c7 in raise () from /lib64/libc.so.6
#1  0x7fbc7be89478 in abort () from /lib64/libc.so.6
#2  0x7fbc7c85202e in __osafassert_fail (__file=__file@entry=0x7fbc7e1b7d50 
"../../opensaf/src/clm/clmd/clms_mbcsv.c", __line=__line@entry=468, 
__func=__func@entry=0x7fbc7e1b8820 <__FUNCTION__.12739> "ckpt_proc_node_rec", 
__assertion=__assertion@entry=0x7fbc7e1b78ea "0") at 
../../opensaf/src/base/sysf_def.c:281
#3  0x7fbc7e1aa016 in ckpt_proc_node_rec (cb=<optimized out>, 
data=0x7fbc7f218a50) at ../../opensaf/src/clm/clmd/clms_mbcsv.c:468
#4  0x7fbc7e1ae044 in ckpt_decode_async_update (cbk_arg=<optimized out>, 
cb=0x7fbc7e3be100 <_clms_cb>) at ../../opensaf/src/clm/clmd/clms_mbcsv.c:2310
#5  ckpt_decode_cbk_handler (cbk_arg=0x7fff27b6b1a0) at 
../../opensaf/src/clm/clmd/clms_mbcsv.c:1997
#6  mbcsv_callback (arg=0x7fff27b6b1a0) at 
../../opensaf/src/clm/clmd/clms_mbcsv.c:719
#7  0x7fbc7c856f76 in ncs_mbscv_rcv_decode (peer=peer@entry=0x7fbc7f217a60, 
evt=evt@entry=0x7fbc740036e0) at ../../opensaf/src/mbc/mbcsv_act.c:393
#8  0x7fbc7c857146 in ncs_mbcsv_rcv_async_update (peer=0x7fbc7f217a60, 
evt=0x7fbc740036e0) at ../../opensaf/src/mbc/mbcsv_act.c:440
#9  0x7fbc7c85dd30 in mbcsv_process_events (rcvd_evt=0x7fbc740036e0, 
mbcsv_hdl=mbcsv_hdl@entry=4293918753) at 
../../opensaf/src/mbc/mbcsv_pr_evts.c:168
#10 0x7fbc7c85de9b in mbcsv_hdl_dispatch_all (mbcsv_hdl=4293918753, 
mbx=mbx@entry=4283432961) at ../../opensaf/src/mbc/mbcsv_pr_evts.c:272
#11 0x7fbc7c8586c2 in mbcsv_process_dispatch_request (arg=0x7fff27b6b310) 
at ../../opensaf/src/mbc/mbcsv_api.c:423
#12 0x7fbc7e1aa7be in clms_mbcsv_dispatch (mbcsv_hdl=<optimized out>) at 
../../opensaf/src/clm/clmd/clms_mbcsv.c:687
#13 0x7fbc7e19e4e4 in main (argc=<optimized out>, argv=<optimized out>) at 
../../opensaf/src/clm/clmd/clms_main.c:535
### BT FULL ###
#0  0x7fbc7be880c7 in raise () from /lib64/libc.so.6
No symbol table info available.
#1  0x7fbc7be89478 in abort () from /lib64/libc.so.6
No symbol table info available.
#2  0x7fbc7c85202e in __osafassert_fail (__file=__file@entry=0x7fbc7e1b7d50 
"../../opensaf/src/clm/clmd/clms_mbcsv.c", __line=__line@entry=468, 
__func=__func@entry=0x7fbc7e1b8820 <__FUNCTION__.12739> "ckpt_proc_node_rec", 
__assertion=__assertion@entry=0x7fbc7e1b78ea "0") at 
../../opensaf/src/base/sysf_def.c:281
No locals.
#3  0x7fbc7e1aa016 in ckpt_proc_node_rec (cb=<optimized out>, 
data=0x7fbc7f218a50) at ../../opensaf

[tickets] [opensaf:tickets] #2372 amf: CLM lock of two more nodes returns REPAIR_PENDING for first node.

2017-03-27 Thread Praveen

- **summary**: amf/clm: CLM lock of two more nodes returns REPAIR_PENDING for 
first node. --> amf: CLM lock of two more nodes returns REPAIR_PENDING for 
first node.



---

** [tickets:#2372] amf: CLM lock of two more nodes returns REPAIR_PENDING for 
first node.**

**Status:** review
**Milestone:** 5.0.2
**Created:** Tue Mar 14, 2017 09:29 AM UTC by Praveen
**Last Updated:** Mon Mar 27, 2017 09:59 AM UTC
**Owner:** Praveen
**Attachments:**

- 
[osafamfd](https://sourceforge.net/p/opensaf/tickets/2372/attachment/osafamfd) 
(3.4 MB; application/octet-stream)
- 
[osafclmd](https://sourceforge.net/p/opensaf/tickets/2372/attachment/osafclmd) 
(860.9 kB; application/octet-stream)


Steps to reproduce:
1) Bring up a 4-node cluster.
2) Deploy the AMF demo on PL-3 and PL-4.
3) LOCK amfd nodes PL-3 and PL-4.
4) Make arrangements so that termination of amf_demo on PL-3 takes more time 
than on PL-4.
5) From one terminal, issue a CLM lock of PL-3 first and immediately afterwards 
issue a CLM lock of PL-4.

CLM and AMF traces are attached.  
Analysis:
When AMFD gets the CLM track callback for PL-3, it starts terminating amf_demo 
on PL-3. While termination of amf_demo is still in progress, AMF gets another 
track callback with rootCauseEntity set to PL-4. However, this callback also 
contains information about PL-3. AMFD starts terminating amf_demo on PL-4, but 
at the same time it responds for PL-3 with the invocation id of the PL-4 
callback. CLM assumes that the PL-4 change_started step has completed and sends 
the completion callback for PL-4. In this callback, AMF clears the internal 
flags that monitor the graceful removal of nodes. Since AMF never responded to 
the PL-3 callback, the callback timer expires in CLMD and it sends the 
completed callback to AMF. AMF treats this as a node failover and tries to fail 
over PL-3.

Note: In all these stages, CLM sends the track callback with information about 
all the nodes. The track flags AMF registers with are:
 
SA_TRACK_CURRENT|SA_TRACK_CHANGES_ONLY|SA_TRACK_VALIDATE_STEP|SA_TRACK_START_STEP.
I am still evaluating whether the issue is in CLM or AMF. Since AMF registers 
for **|SA_TRACK_CHANGES_ONLY|**, should CLM give information about all the 
nodes in all subsequent callbacks?
Also, AMF should respond to a callback once it has completed termination of 
comps.
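
The bookkeeping described in the analysis can be illustrated with a small model. This is only a sketch and not AMFD code: the class name `TrackResponder`, its methods, and the invocation ids 101 and 102 are hypothetical. It records the invocation id against the root cause node of each START-step callback and answers with that id only after termination on that node has finished.

```python
from dataclasses import dataclass, field


@dataclass
class TrackResponder:
    """Toy model of answering CLM START-step callbacks per node (hypothetical)."""

    pending: dict = field(default_factory=dict)    # node name -> invocation id
    responses: list = field(default_factory=list)  # (invocation id, node) sent back

    def on_start_step(self, invocation, root_cause, changed_nodes):
        # The callback may list several nodes; only the root cause entity
        # is tied to this invocation, so record just that one.
        if root_cause in changed_nodes and root_cause not in self.pending:
            self.pending[root_cause] = invocation

    def on_termination_done(self, node):
        # Respond only after components on the node have terminated,
        # using the invocation id that was stored for that node.
        invocation = self.pending.pop(node, None)
        if invocation is not None:
            self.responses.append((invocation, node))
        return invocation


responder = TrackResponder()
responder.on_start_step(101, "PL-3", ["PL-3"])          # lock of PL-3
responder.on_start_step(102, "PL-4", ["PL-3", "PL-4"])  # lock of PL-4, lists both
responder.on_termination_done("PL-4")                   # PL-4 finishes first
responder.on_termination_done("PL-3")
print(responder.responses)
```

In this model the second callback does not overwrite or answer the PL-3 entry, so PL-3 is answered with its own invocation id even though PL-4 finishes earlier.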



[tickets] [opensaf:tickets] #2372 amf/clm: CLM lock of two more nodes returns REPAIR_PENDING for first node.

2017-03-27 Thread Praveen

- **status**: accepted --> review



---

** [tickets:#2372] amf/clm: CLM lock of two more nodes returns REPAIR_PENDING 
for first node.**

**Status:** review
**Milestone:** 5.0.2
**Created:** Tue Mar 14, 2017 09:29 AM UTC by Praveen
**Last Updated:** Thu Mar 23, 2017 09:16 AM UTC
**Owner:** Praveen
**Attachments:**

- 
[osafamfd](https://sourceforge.net/p/opensaf/tickets/2372/attachment/osafamfd) 
(3.4 MB; application/octet-stream)
- 
[osafclmd](https://sourceforge.net/p/opensaf/tickets/2372/attachment/osafclmd) 
(860.9 kB; application/octet-stream)





[tickets] [opensaf:tickets] #2372 amf/clm: CLM lock of two more nodes returns REPAIR_PENDING for first node.

2017-03-23 Thread Praveen

Hi,

I think there is no problem from the CLM perspective. I have checked that in 
both of the cases above initialViewNumber is passed correctly at all stages, 
and an application always distinguishes based on the passed initialViewNumber.
So the fix is needed in AMF.
I will send out a patch.

Thanks,
Praveen


---

** [tickets:#2372] amf/clm: CLM lock of two more nodes returns REPAIR_PENDING 
for first node.**

**Status:** accepted
**Milestone:** 5.0.2
**Created:** Tue Mar 14, 2017 09:29 AM UTC by Praveen
**Last Updated:** Thu Mar 16, 2017 07:08 AM UTC
**Owner:** Praveen
**Attachments:**

- 
[osafamfd](https://sourceforge.net/p/opensaf/tickets/2372/attachment/osafamfd) 
(3.4 MB; application/octet-stream)
- 
[osafclmd](https://sourceforge.net/p/opensaf/tickets/2372/attachment/osafclmd) 
(860.9 kB; application/octet-stream)





[tickets] [opensaf:tickets] #2331 CLM: : saClmNodeCurrAddress and saClmNodeCurrAddressFamily are not exposed to IMM even that TPC mode is using

2017-03-23 Thread Praveen

- **assigned_to**: Praveen -->  nobody 



---

** [tickets:#2331] CLM: : saClmNodeCurrAddress and saClmNodeCurrAddressFamily 
are not exposed to IMM even that TPC mode is using**

**Status:** unassigned
**Milestone:** next
**Created:** Thu Mar 02, 2017 10:10 AM UTC by Tai Dinh
**Last Updated:** Thu Mar 23, 2017 05:19 AM UTC
**Owner:** nobody


saClmNodeCurrAddress and saClmNodeCurrAddressFamily of a cluster node are not 
exposed to IMM even though TCP mode is configured.
This kind of information is sometimes needed by applications.



[tickets] [opensaf:tickets] #2268 amf: assignment from higher ranked SU is removed in N-Way Active model.

2017-03-23 Thread Praveen

- **status**: review --> fixed
- **Milestone**: 5.2.RC2 --> 5.0.2
- **Comment**:

changeset:   8718:8d305dff2257
branch:      opensaf-5.0.x
parent:      8715:dae6b6197639
user:        Praveen Malviya <praveen.malv...@oracle.com>
date:        Thu Mar 23 12:34:04 2017 +0530
summary:     amfd: remove assignments from lower ranked SU while adjusting SI assignments [#2268]

changeset:   8719:263af6bf5c65
branch:      opensaf-5.1.x
parent:      8716:8d149783d95a
user:        Praveen Malviya <praveen.malv...@oracle.com>
date:        Thu Mar 23 12:35:03 2017 +0530
summary:     amfd: remove assignments from lower ranked SU while adjusting SI assignments [#2268]

changeset:   8720:057a8a4b1a99
tag:         tip
parent:      8717:6cffd8965ae4
user:        Praveen Malviya <praveen.malv...@oracle.com>
date:        Thu Mar 23 12:36:49 2017 +0530
summary:     amfd: remove assignments from lower ranked SU while adjusting SI assignments [#2268]

[staging:8d305d]
[staging:263af6]
[staging:057a8a]




---

** [tickets:#2268] amf: assignment from higher ranked SU is removed in N-Way 
Active model.**

**Status:** fixed
**Milestone:** 5.0.2
**Created:** Wed Jan 18, 2017 05:41 AM UTC by Praveen
**Last Updated:** Fri Mar 17, 2017 09:24 AM UTC
**Owner:** Praveen
**Attachments:**

- 
[AppConfig-nwayactive_3SUs_1SIs.xml](https://sourceforge.net/p/opensaf/tickets/2268/attachment/AppConfig-nwayactive_3SUs_1SIs.xml)
 (13.7 kB; text/xml)


When saAmfSIPrefActiveAssignments is reduced, AMFD removes assignments from a 
higher ranked SU when SIRankedSU is not configured.
Steps to reproduce:
1) Bring the attached application up on one controller.
2) The only SI is assigned to three SUs. The three SUs have different SU ranks. 
The preferred number of active assignments for the SI is 3.
3) Reduce the preferred active assignments for the SI by running the following 
command:
   immcfg -a saAmfSIPrefActiveAssignments=2 safSi=NWay_Active,safApp=NWay_Active
4) Since the preferred active assignments are reduced by 1, AMFD sends quiesced 
and removal of assignment to SU2.
5) SU2 has rank 2. Assignments should instead be removed from SU3, which has 
rank 3.
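
The expected selection can be sketched as follows. This is an illustration, not the AMFD implementation: the function name `sus_to_unassign` is hypothetical, and it assumes, as the steps above imply, that a larger saAmfSURank number means a lower rank.

```python
def sus_to_unassign(assigned_ranks, pref_active):
    """Return SU names that should lose the assignment, lowest rank first.

    assigned_ranks maps SU name -> rank number; a larger number is assumed
    to mean a lower rank (SU3 with rank 3 is the lowest in the ticket).
    """
    excess = len(assigned_ranks) - pref_active
    if excess <= 0:
        return []
    # Sort by rank number, largest (lowest ranked) first, and take the excess.
    by_rank_desc = sorted(assigned_ranks, key=lambda su: assigned_ranks[su],
                          reverse=True)
    return by_rank_desc[:excess]


ranks = {"SU1": 1, "SU2": 2, "SU3": 3}
print(sus_to_unassign(ranks, 2))
```

With the three SUs from the attached configuration and a preferred count of 2, the sketch selects SU3 rather than SU2.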


Assignments before reducing pref active assignments:
safSISU=safSu=SU1\,safSg=NWay_Active\,safApp=NWay_Active,safSi=NWay_Active,safApp=NWay_Active
saAmfSISUHAState=ACTIVE(1)
saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
safSISU=safSu=SU2\,safSg=NWay_Active\,safApp=NWay_Active,safSi=NWay_Active,safApp=NWay_Active
saAmfSISUHAState=ACTIVE(1)
saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
safSISU=safSu=SU3\,safSg=NWay_Active\,safApp=NWay_Active,safSi=NWay_Active,safApp=NWay_Active
saAmfSISUHAState=ACTIVE(1)
saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)

Assignments after reducing pref active assignments:
safSISU=safSu=SU1\,safSg=NWay_Active\,safApp=NWay_Active,safSi=NWay_Active,safApp=NWay_Active
saAmfSISUHAState=ACTIVE(1)
saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
safSISU=safSu=SU3\,safSg=NWay_Active\,safApp=NWay_Active,safSi=NWay_Active,safApp=NWay_Active
saAmfSISUHAState=ACTIVE(1)
saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)






[tickets] [opensaf:tickets] #2394 clm: add clm tool commands for admin op and state check.

2017-03-23 Thread Praveen




---

** [tickets:#2394] clm: add clm tool commands for admin op and state check.**

**Status:** accepted
**Milestone:** next
**Created:** Thu Mar 23, 2017 06:17 AM UTC by Praveen
**Last Updated:** Thu Mar 23, 2017 06:17 AM UTC
**Owner:** Praveen


The intention is to add CLM tool commands:
- to perform an admin operation on a node or on the cluster, something like:
clm-adm <lock|shutdown|unlock|reset> 

- to check the admin state and membership status of CLM nodes, like:
clm-state  <membership|adminstate>
- to find the CLM cluster and nodes, like:
clm-find  <members|non-member>
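
One way such a clm-adm wrapper might map onto the existing immadm tool is sketched below. This is an assumption, not part of the ticket: the helper name `clm_adm_argv` is hypothetical, the operation ids are assumed from SaClmAdminOperationIdT in saClm.h and should be verified against the release in use, and `reset` is left unmapped because its id is not given here. The sketch only builds the command line and does not run it.

```python
# Admin operation ids assumed from SaClmAdminOperationIdT in saClm.h;
# verify them against the headers of the OpenSAF release in use.
CLM_ADMIN_OP_IDS = {"unlock": 1, "lock": 2, "shutdown": 3}


def clm_adm_argv(operation, node_dn):
    """Build (but do not run) an immadm command line for a CLM node."""
    if operation not in CLM_ADMIN_OP_IDS:
        raise ValueError(f"unsupported operation: {operation}")
    return ["immadm", "-o", str(CLM_ADMIN_OP_IDS[operation]), node_dn]


print(" ".join(clm_adm_argv("lock", "safNode=PL-3,safCluster=myClmCluster")))
```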



[tickets] [opensaf:tickets] #2387 amf: choose CLM unlocked spare controller for standby role in failover situation

2017-03-23 Thread Praveen

- **status**: review --> fixed
- **Comment**:

https://sourceforge.net/p/opensaf/mailman/message/35738800/


changeset:   8712:a3ba6212ecf6
branch:      opensaf-5.1.x
parent:      8707:4e47c66382f3
user:        Praveen Malviya <praveen.malv...@oracle.com>
date:        Thu Mar 23 11:34:31 2017 +0530
summary:     amfd: choose CLM unlocked spare controller for standby role in failover situation[#2387]

changeset:   8713:3a718e40acec
branch:      opensaf-5.0.x
parent:      8708:9073359c83b4
user:        Praveen Malviya <praveen.malv...@oracle.com>
date:        Thu Mar 23 11:35:07 2017 +0530
summary:     amfd: choose CLM unlocked spare controller for standby role in failover situation[#2387]

changeset:   8714:ffb6233abe8b
tag:         tip
parent:      8711:262d1f2132ca
user:        Praveen Malviya <praveen.malv...@oracle.com>
date:        Thu Mar 23 11:36:00 2017 +0530
summary:     amfd: choose CLM unlocked spare controller for standby role in failover situation[#2387]




---

** [tickets:#2387] amf: choose CLM unlocked spare controller for standby role 
in failover situation**

**Status:** fixed
**Milestone:** 5.0.2
**Created:** Fri Mar 17, 2017 12:13 PM UTC by Ritu Raj
**Last Updated:** Tue Mar 21, 2017 09:52 AM UTC
**Owner:** Praveen
**Attachments:**

- 
[SC-1.tar.bz2](https://sourceforge.net/p/opensaf/tickets/2387/attachment/SC-1.tar.bz2)
 (873.4 kB; application/x-bzip)
- 
[SC-2.tar.bz2](https://sourceforge.net/p/opensaf/tickets/2387/attachment/SC-2.tar.bz2)
 (762.0 kB; application/x-bzip)
- 
[SC-3.tar.bz2](https://sourceforge.net/p/opensaf/tickets/2387/attachment/SC-3.tar.bz2)
 (724.5 kB; application/x-bzip)


###Environment details
OS : Suse 64bit
Changeset : 8701 ( 5.2.RC1)
6 nodes setup(3 controller and 3 payload,  with SC_ABSENCE enabled)

###Summary
choose CLM unlocked spare controller for standby role in failover situation

###Steps followed & Observed behaviour
1. Initially SC-1 (ACTIVE), SC-2 (QUIESCED), SC-3 (STANDBY) role
2. Performed clm_lock operation on the SC-2 (QUIESCED) controller
3. After that, performed a failover on the active controller (SC-1) by killing 
one director
4. Observed that SC-3 got the Active role while SC-2 got the Standby role, 
which is not expected as node SC-2 is in clm_locked state 
5. Later, SC-1 joined as QUIESCED controller (after recovery from failover)

**Expected**:
The clm_locked node should not get the standby role as it is in locked state, 
and SC-1 should join as Standby after recovery from failover.
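
The expected behaviour can be modelled with a short sketch. It is not the AMFD change itself: the function name `pick_standby` and the state strings are hypothetical, and the node names come from the steps above.

```python
def pick_standby(spares, clm_admin_state):
    """Return the first spare controller whose CLM admin state is UNLOCKED."""
    for node in spares:
        if clm_admin_state.get(node) == "UNLOCKED":
            return node
    return None  # no eligible spare; the standby role stays unassigned


state = {"SC-1": "UNLOCKED", "SC-2": "LOCKED"}
# Right after the failover only the CLM-locked SC-2 is available as a spare.
print(pick_standby(["SC-2"], state))
# Once SC-1 has recovered and rejoined, it is the eligible candidate.
print(pick_standby(["SC-2", "SC-1"], state))
```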
   
 Syslog:
Mar 17 17:56:59 suseR2-S2 osafimmnd[21809]: NO Implementer (applier) connected: 
28 (@safSmf_applier1) <0, 2030f>
Mar 17 17:56:59 suseR2-S2 osafamfnd[21859]: NO Assigning 
'safSi=SC-2N,safApp=OpenSAF' STANDBY to 'safSu=SC-2,safSg=2N,safApp=OpenSAF'
Mar 17 17:56:59 suseR2-S2 osafrded[21779]: NO RDE role set to STANDBY
Mar 17 17:56:59 suseR2-S2 osafrded[21779]: NO Peer up on node 0x2030f
Mar 17 17:56:59 suseR2-S2 osafrded[21779]: NO Got peer info request from node 
0x2030f with role ACTIVE
Mar 17 17:56:59 suseR2-S2 osafrded[21779]: NO Got peer info response from node 
0x2030f with role ACTIVE
Mar 17 17:56:59 suseR2-S2 osafimmd[21798]: NO MDS event from svc_id 24 
(change:3, dest:13)
Mar 17 17:56:59 suseR2-S2 osafimmd[21798]: NO MDS event from svc_id 24 
(change:5, dest:13)
Mar 17 17:56:59 suseR2-S2 osafimmd[21798]: NO MDS event from svc_id 24 
(change:5, dest:13)
Mar 17 17:56:59 suseR2-S2 osafimmd[21798]: NO MDS event from svc_id 25 
(change:3, dest:566317113647120)
Mar 17 17:56:59 suseR2-S2 osafimmd[21798]: NO MDS event from svc_id 25 
(change:3, dest:565213543063568)
Mar 17 17:56:59 suseR2-S2 osafimmd[21798]: IN AMF HA STANDBY request
Mar 17 17:56:59 suseR2-S2 osafimmd[21798]: IN Added IMMND node with dest 
566317113647120
Mar 17 17:56:59 suseR2-S2 osafimmd[21798]: IN Added IMMND node with dest 
565213543063568
Mar 17 17:56:59 suseR2-S2 osafsmfd[21878]: WA saClmClusterNodeGet failed, 
rc=SA_AIS_ERR_UNAVAILABLE (31)
Mar 17 17:56:59 suseR2-S2 osafsmfd[21878]: WA proc_mds_info: SMFND UP failed
Mar 17 17:56:59 suseR2-S2 osafsmfd[21878]: WA saClmClusterNodeGet failed, 
rc=SA_AIS_ERR_UNAVAILABLE (31)
Mar 17 17:56:59 suseR2-S2 osafsmfd[21878]: WA proc_mds_info: SMFND UP failed


 From Traces:
 
 SC-2 left the cluster when the clm lock operation was performed, and later 
SC-1 left the cluster when the failover was performed:
 
~~~
SC-2:::
 Mar 17 17:54:24.123134 osafamfnd [6773:src/amf/amfnd/clm.cc:0196] >> 
clm_track_cb: '0' '4' '1'
Mar 17 17:54:24.123142 osafamfnd [6773:src/amf/amfnd/clm.cc:0217] TR Node has 
left the cluster 'safNode=SC-2,safCluster=myClmCluster', avnd_cb->first_time_up 
0,notifItem->clusterNode.nodeId 131599, avnd_cb->node_info.nodeId 131343
-
-
SC-1:::
 Mar 17 17:57:03.514477 osafamfnd [9266:src/amf/amfnd/clm.cc:0196] >> 
clm_track_cb: '0' '4' '1'
Mar 17 17:57:03.514484 osafamfnd [9266:src/amf/amfnd/clm.cc:0217] TR Node has 
left the cluster 'safNode=SC-1,safCluster=myClmCluster', avnd_cb->first_t

[tickets] [opensaf:tickets] #2331 CLM: : saClmNodeCurrAddress and saClmNodeCurrAddressFamily are not exposed to IMM even that TPC mode is using

2017-03-22 Thread Praveen

- **status**: assigned --> unassigned
- **Type**: defect --> enhancement
- **Milestone**: 5.0.2 --> next
- **Comment**:

As per the CLM PR doc, section "3.2.2  Compliance Report", 
saClmNodeCurrAddressFamily and saClmNodeCurrAddress are not supported. So I am 
converting this ticket into an enhancement. I may plan it for the next release.



---

** [tickets:#2331] CLM: : saClmNodeCurrAddress and saClmNodeCurrAddressFamily 
are not exposed to IMM even that TPC mode is using**

**Status:** unassigned
**Milestone:** next
**Created:** Thu Mar 02, 2017 10:10 AM UTC by Tai Dinh
**Last Updated:** Tue Mar 07, 2017 11:54 AM UTC
**Owner:** Praveen


saClmNodeCurrAddress and saClmNodeCurrAddressFamily of a cluster node are not 
exposed to IMM even though TCP mode is configured.
This kind of information is sometimes needed by applications.



[tickets] [opensaf:tickets] #2387 clm_locked spare controller got standby role after failover

2017-03-19 Thread Praveen

- **status**: unassigned --> assigned
- **assigned_to**: Praveen
- **Part**: - --> d
- **Milestone**: 5.2.RC2 --> 5.0.2



---

** [tickets:#2387] clm_locked spare controller got standby role after failover**

**Status:** assigned
**Milestone:** 5.0.2
**Created:** Fri Mar 17, 2017 12:13 PM UTC by Ritu Raj
**Last Updated:** Fri Mar 17, 2017 12:13 PM UTC
**Owner:** Praveen
**Attachments:**

- 
[SC-1.tar.bz2](https://sourceforge.net/p/opensaf/tickets/2387/attachment/SC-1.tar.bz2)
 (873.4 kB; application/x-bzip)
- 
[SC-2.tar.bz2](https://sourceforge.net/p/opensaf/tickets/2387/attachment/SC-2.tar.bz2)
 (762.0 kB; application/x-bzip)
- 
[SC-3.tar.bz2](https://sourceforge.net/p/opensaf/tickets/2387/attachment/SC-3.tar.bz2)
 (724.5 kB; application/x-bzip)


###Environment details
OS : Suse 64bit
Changeset : 8701 ( 5.2.RC1)
6 nodes setup(3 controller and 3 payload,  with SC_ABSENCE enabled)

###Summary
clm_locked spare controller got standby role after failover

###Steps followed & Observed behaviour
1. Initially the roles were SC-1 (ACTIVE), SC-2 (QUIESCED) and SC-3 (STANDBY).
2. Performed a clm_lock operation on the SC-2 (QUIESCED) controller.
3. After that, performed a failover on the active controller (SC-1) by killing
one director.
4. Observed that SC-3 got the Active role while SC-2 got the Standby role, which
is not expected because node SC-2 is in the clm_locked state.
5. Later, SC-1 joined as a QUIESCED controller (after recovery from the failover).

**Expected**:
The clm_locked node should not get the Standby role because it is in the locked
state, and SC-1 should join as Standby after recovery from the failover.
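
The expected behaviour can be sketched as a small model. This is only an illustration of the expectation stated above, not the actual role-election code: the `Controller` class, the `clm_member` field and `roles_after_failover` are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass
class Controller:
    name: str
    role: str         # "ACTIVE", "STANDBY" or "QUIESCED" (spare)
    clm_member: bool  # assumption: False once the node has been CLM-locked


def roles_after_failover(controllers, failed_name):
    """Return {name: role} after the active controller `failed_name` goes down."""
    roles = {c.name: c.role for c in controllers}
    roles[failed_name] = "DOWN"
    survivors = [c for c in controllers if c.name != failed_name]

    # The old standby takes over as active.
    old_standby = next((c for c in survivors if c.role == "STANDBY"), None)
    if old_standby is not None:
        roles[old_standby.name] = "ACTIVE"

    # Only a survivor that is still a CLM member may fill the standby slot.
    for c in survivors:
        if c is not old_standby and c.clm_member:
            roles[c.name] = "STANDBY"
            break
    return roles


cluster = [
    Controller("SC-1", "ACTIVE", True),
    Controller("SC-2", "QUIESCED", False),  # clm_locked in step 2
    Controller("SC-3", "STANDBY", True),
]
print(roles_after_failover(cluster, "SC-1"))
```

In this model SC-2 keeps its spare role while locked, so the standby slot stays free for SC-1 when it rejoins.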
   
 Syslog:
Mar 17 17:56:59 suseR2-S2 osafimmnd[21809]: NO Implementer (applier) connected: 
28 (@safSmf_applier1) <0, 2030f>
Mar 17 17:56:59 suseR2-S2 osafamfnd[21859]: NO Assigning 
'safSi=SC-2N,safApp=OpenSAF' STANDBY to 'safSu=SC-2,safSg=2N,safApp=OpenSAF'
Mar 17 17:56:59 suseR2-S2 osafrded[21779]: NO RDE role set to STANDBY
Mar 17 17:56:59 suseR2-S2 osafrded[21779]: NO Peer up on node 0x2030f
Mar 17 17:56:59 suseR2-S2 osafrded[21779]: NO Got peer info request from node 
0x2030f with role ACTIVE
Mar 17 17:56:59 suseR2-S2 osafrded[21779]: NO Got peer info response from node 
0x2030f with role ACTIVE
Mar 17 17:56:59 suseR2-S2 osafimmd[21798]: NO MDS event from svc_id 24 
(change:3, dest:13)
Mar 17 17:56:59 suseR2-S2 osafimmd[21798]: NO MDS event from svc_id 24 
(change:5, dest:13)
Mar 17 17:56:59 suseR2-S2 osafimmd[21798]: NO MDS event from svc_id 24 
(change:5, dest:13)
Mar 17 17:56:59 suseR2-S2 osafimmd[21798]: NO MDS event from svc_id 25 
(change:3, dest:566317113647120)
Mar 17 17:56:59 suseR2-S2 osafimmd[21798]: NO MDS event from svc_id 25 
(change:3, dest:565213543063568)
Mar 17 17:56:59 suseR2-S2 osafimmd[21798]: IN AMF HA STANDBY request
Mar 17 17:56:59 suseR2-S2 osafimmd[21798]: IN Added IMMND node with dest 
566317113647120
Mar 17 17:56:59 suseR2-S2 osafimmd[21798]: IN Added IMMND node with dest 
565213543063568
Mar 17 17:56:59 suseR2-S2 osafsmfd[21878]: WA saClmClusterNodeGet failed, 
rc=SA_AIS_ERR_UNAVAILABLE (31)
Mar 17 17:56:59 suseR2-S2 osafsmfd[21878]: WA proc_mds_info: SMFND UP failed
Mar 17 17:56:59 suseR2-S2 osafsmfd[21878]: WA saClmClusterNodeGet failed, 
rc=SA_AIS_ERR_UNAVAILABLE (31)
Mar 17 17:56:59 suseR2-S2 osafsmfd[21878]: WA proc_mds_info: SMFND UP failed


 From Traces:
 
 SC-2 left the cluster when the clm_lock operation was performed, and later SC-1
left the cluster when the failover was performed:
 
~~~
SC-2:::
 Mar 17 17:54:24.123134 osafamfnd [6773:src/amf/amfnd/clm.cc:0196] >> 
clm_track_cb: '0' '4' '1'
Mar 17 17:54:24.123142 osafamfnd [6773:src/amf/amfnd/clm.cc:0217] TR Node has 
left the cluster 'safNode=SC-2,safCluster=myClmCluster', avnd_cb->first_time_up 
0,notifItem->clusterNode.nodeId 131599, avnd_cb->node_info.nodeId 131343
-
-
SC-1:::
 Mar 17 17:57:03.514477 osafamfnd [9266:src/amf/amfnd/clm.cc:0196] >> 
clm_track_cb: '0' '4' '1'
Mar 17 17:57:03.514484 osafamfnd [9266:src/amf/amfnd/clm.cc:0217] TR Node has 
left the cluster 'safNode=SC-1,safCluster=myClmCluster', avnd_cb->first_time_up 
0,notifItem->clusterNode.nodeId 131343, avnd_cb->node_info.nodeId 131855
~~~

 After the failover, SC-2 got the Standby role and SC-3 the Active role:
~~~
SC::2
 Mar 17 17:56:59.941081 osafamfnd [21859:src/amf/amfnd/susm.cc:1043] NO 
Assigned 'safSi=SC-2N,safApp=OpenSAF' STANDBY to 
'safSu=SC-2,safSg=2N,safApp=OpenSAF'
Mar 17 17:56:59.941089 osafamfnd [21859:src/amf/amfnd/err.cc:1639] >> 
is_no_assignment_due_to_escalations
Mar 17 17:56:59.941097 osafamfnd [21859:src/amf/amfnd/err.cc:1651] << 
is_no_assignment_due_to_escalations: false
Mar 17 17:56:59.941104 osafamfnd [21859:src/amf/amfnd/di.cc:0829] >> 
avnd_di_susi_resp_send: Sending Resp su=safSu=SC-2,safSg=2N,safApp=OpenSAF, 
si=safSi=SC-2N,safApp=OpenSAF, curr_state=2, prv_state=0
Mar 17 17:56:59.941112 osafamfnd [21859:src/amf/amfnd/di.cc:0839] TR 
curr_assign_state '3



SC:::3
Mar 17 17:57:03.656105 osafamfnd [9266:src/amf/a

[tickets] [opensaf:tickets] #2268 amf: assignment from higher ranked SU is removed in N-Way Active model.

2017-03-17 Thread Praveen

- **status**: assigned --> review
- **Milestone**: 5.0.2 --> 5.2.RC2



---

** [tickets:#2268] amf: assignment from higher ranked SU is removed in N-Way 
Active model.**

**Status:** review
**Milestone:** 5.2.RC2
**Created:** Wed Jan 18, 2017 05:41 AM UTC by Praveen
**Last Updated:** Wed Mar 08, 2017 06:24 AM UTC
**Owner:** Praveen
**Attachments:**

- 
[AppConfig-nwayactive_3SUs_1SIs.xml](https://sourceforge.net/p/opensaf/tickets/2268/attachment/AppConfig-nwayactive_3SUs_1SIs.xml)
 (13.7 kB; text/xml)


When saAmfSIPrefActiveAssignments is reduced and no SI-ranked SU is configured,
AMFD removes the assignment from a higher-ranked SU.
Steps to reproduce:
1) Bring the attached application up on one controller.
2) The only SI is assigned to three SUs, each with a different SU rank. The
preferred number of active assignments for the SI is 3.
3) Reduce the preferred active assignments for the SI by running the following
command:
   immcfg -a saAmfSIPrefActiveAssignments=2 safSi=NWay_Active,safApp=NWay_Active
4) Since the preferred active assignments are reduced by 1, AMFD sends quiesced
and removal of assignment to SU2.
5) SU2 has rank 2. The assignment should instead be removed from SU3, which has
rank 3.


Assignments before reducing the preferred active assignments:
safSISU=safSu=SU1\,safSg=NWay_Active\,safApp=NWay_Active,safSi=NWay_Active,safApp=NWay_Active
saAmfSISUHAState=ACTIVE(1)
saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
safSISU=safSu=SU2\,safSg=NWay_Active\,safApp=NWay_Active,safSi=NWay_Active,safApp=NWay_Active
saAmfSISUHAState=ACTIVE(1)
saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
safSISU=safSu=SU3\,safSg=NWay_Active\,safApp=NWay_Active,safSi=NWay_Active,safApp=NWay_Active
saAmfSISUHAState=ACTIVE(1)
saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)

Assignments after reducing the preferred active assignments:
safSISU=safSu=SU1\,safSg=NWay_Active\,safApp=NWay_Active,safSi=NWay_Active,safApp=NWay_Active
saAmfSISUHAState=ACTIVE(1)
saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
safSISU=safSu=SU3\,safSg=NWay_Active\,safApp=NWay_Active,safSi=NWay_Active,safApp=NWay_Active
saAmfSISUHAState=ACTIVE(1)
saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
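
The expected selection can be sketched as follows. This is a minimal model of the behaviour the ticket asks for, not AMFD code: `sus_to_unassign` is a hypothetical helper, it assumes a lower rank number means a higher-ranked SU, and it ignores SI-ranked SUs and unranked SUs.

```python
def sus_to_unassign(assigned, new_pref_active):
    """Pick the SUs that should lose the assignment.

    assigned: list of (su_name, su_rank); a lower number means a higher rank.
    new_pref_active: the reduced saAmfSIPrefActiveAssignments value.
    """
    excess = len(assigned) - new_pref_active
    if excess <= 0:
        return []
    # Lowest-ranked SUs (largest rank number) go first.
    lowest_first = sorted(assigned, key=lambda su: su[1], reverse=True)
    return [name for name, _rank in lowest_first[:excess]]


current = [("SU1", 1), ("SU2", 2), ("SU3", 3)]
print(sus_to_unassign(current, 2))
```

With the configuration from the steps above, this model selects SU3 rather than SU2.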






[tickets] [opensaf:tickets] #2371 AMF: NPM app went into unstable state while expanding cluster

2017-03-17 Thread Praveen

SI dependency is configured among the SIs assigned to the same SU. This ticket
must be a duplicate of
\#92 AVSv: In NPM per SI level role failover needs to be implemented when SI-SI
dependency within SU is configured.

Analysis:
1) The NG was locked and AMF sent a quiesced assignment to the SU:
 Mar 17  4:37:51.067937 osafamfd [2562:src/amf/amfd/nodegroup.cc:1072] >> 
ng_admin_op_cb: 'safAmfNodeGroup=smfLockAdmNg13,safAmfCluster=myAmfCluster', 
inv:'936302870542', op:'2'
 
Mar 17  4:37:51.070545 osafamfd [2562:src/amf/amfd/sg_npm_fsm.cc:4454] >> 
ng_admin: 'safSu=SU1,safSg=SGONE,safApp=NPMAPP', sg_fsm_state:0
Mar 17  4:37:51.070553 osafamfd [2562:src/amf/amfd/sgproc.cc:2319] >> 
avd_sg_su_si_mod_snd: 'safSu=SU1,safSg=SGONE,safApp=NPMAPP', state 3
Mar 17  4:37:51.070560 osafamfd

2) When the response for the quiesced state arrives, AMFD tries to fail over the
SU but cannot, because both the sponsor and the dependent are in the same SU. So
it sends deletion of assignment to the SU:
Mar 17  4:37:51.229455 osafamfd [2562:src/amf/amfd/sgproc.cc:1104] >> 
avd_su_si_assign_evh: id:101, node:2010f, act:5, 
'safSu=SU1,safSg=SGONE,safApp=NPMAPP', '', ha:3, err:1, single:0
Mar 17  4:37:51.230494 osafamfd [2562:src/amf/amfd/sg_npm_fsm.cc:0162] >> 
avd_sg_npm_su_chk_snd
Mar 17  4:37:51.230507 osafamfd [2562:src/amf/amfd/si_dep.cc:1730] >> 
avd_sidep_is_su_failover_possible: SU:'safSu=SU1,safSg=SGONE,safApp=NPMAPP' 
node_state:2
Mar 17  4:37:51.230515 osafamfd [2562:src/amf/amfd/si_dep.cc:1734] TR 
:susi:safSi=NPMSI2,safApp=NPMAPP si_dep_state:3 state:3 fsm:3
Mar 17  4:37:51.230522 osafamfd [2562:src/amf/amfd/si_dep.cc:1573] >> 
avd_sidep_is_si_failover_possible: SI: 'safSi=NPMSI2,safApp=NPMAPP', SU 
safSu=SU1,safSg=SGONE,safApp=NPMAPP
Mar 17  4:37:51.230530 osafamfd [2562:src/amf/amfd/si_dep.cc:1712] << 
avd_sidep_is_si_failover_possible: return value: 0
Mar 17  4:37:51.230536 osafamfd [2562:src/amf/amfd/si_dep.cc:1745] TR Role 
failover is deferred as sponsors role failover is under going
Mar 17  4:37:51.230543 osafamfd [2562:src/amf/amfd/si_dep.cc:0205] TR 
'safSi=NPMSI2,safApp=NPMAPP' si_dep_state ASSIGNED => FAILOVER_UNDER_PROGRESS
Mar 17  4:37:51.230588 osafamfd [2562:src/amf/amfd/chkop.cc:0229] TR Async 
update
Mar 17  4:37:51.230757 osafamfd [2562:src/amf/amfd/si_dep.cc:1752] << 
avd_sidep_is_su_failover_possible: return value: 0
Mar 17  4:37:51.230764 osafamfd [2562:src/amf/amfd/sg_npm_fsm.cc:0169] TR role 
modification cannot be done now as Sponsor SI's are not yet assigned
Mar 17  4:37:51.230771 osafamfd [2562:src/amf/amfd/sg_npm_fsm.cc:0208] << 
avd_sg_npm_su_chk_snd: return value :2
Mar 17  4:37:51.230778 osafamfd [2562:src/amf/amfd/sgproc.cc:2434] >> 
avd_sg_su_si_del_snd: 'safSu=SU1,safSg=SGONE,safApp=NPMAPP'
Mar 17  4:37:51.230795 osafamfd [2562:src/amf/amfd/su.cc:2462] >> 
any_susi_fsm_in: SU:'safSu=SU1,safSg=SGONE,safApp=NPMAPP', check_fsm:1
Mar 17  4:37:51.230803 osafamfd [2562:src/amf/amfd/su.cc:2467] TR 
SUSI:'safSu=SU1,safSg=SGONE,safApp=NPMAPP,safSi=NPMSI1,safApp=NPMAPP', fsm:'3'
Mar 17  4:37:51.230809 osafamfd [2562:src/amf/amfd/su.cc:2467] TR 
SUSI:'safSu=SU1,safSg=SGONE,safApp=NPMAPP,safSi=NPMSI2,safApp=NPMAPP', fsm:'3'

3) After deletion of the assignment, AMF again tries to fail over the
assignments but fails for the same reason as above.
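
The deadlock in the analysis above can be illustrated with a small model. It is a sketch under assumptions, not the logic of avd_sidep_is_su_failover_possible: the function names are hypothetical, and it assumes NPMSI2 is the dependent and NPMSI1 its sponsor, which the traces suggest but do not state.

```python
from graphlib import TopologicalSorter


def su_failover_possible(su_sis, sponsors, failover_in_progress):
    """SU-level check: refuse if any sponsor is itself still failing over."""
    for si in su_sis:
        if any(s in failover_in_progress for s in sponsors.get(si, ())):
            return False
    return True


def per_si_failover_order(su_sis, sponsors):
    """Per-SI alternative: fail over sponsors before their dependents."""
    sorter = TopologicalSorter()
    for si in su_sis:
        sorter.add(si, *sponsors.get(si, ()))
    return list(sorter.static_order())


sis = ["NPMSI1", "NPMSI2"]
deps = {"NPMSI2": {"NPMSI1"}}  # assumed: NPMSI2 depends on NPMSI1

# Both SIs sit in the same quiesced SU, so both are failing over at once.
print(su_failover_possible(sis, deps, {"NPMSI1", "NPMSI2"}))
print(per_si_failover_order(sis, deps))
```

Because the sponsor can never finish before the SU-level check runs, the SU-level check keeps returning false in this model, whereas a per-SI ordering of the kind \#92 asks for would let the sponsor go first.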





---

** [tickets:#2371] AMF: NPM app went into unstable state while expanding 
cluster**

**Status:** assigned
**Milestone:** 5.2.RC2
**Created:** Tue Mar 14, 2017 08:03 AM UTC by Chani Srivastava
**Last Updated:** Wed Mar 15, 2017 05:37 AM UTC
**Owner:** Praveen
**Attachments:**

- 
[messages](https://sourceforge.net/p/opensaf/tickets/2371/attachment/messages) 
(86.9 kB; application/octet-stream)
- 
[osafamfd](https://sourceforge.net/p/opensaf/tickets/2371/attachment/osafamfd) 
(13.0 MB; application/octet-stream)


Environment details

OS : Suse 64bit
Changeset : 8603( 5.2.MO-1)
4 node cluster without PBE

Summary - The application went into an unstable state and campaign execution
could not complete while expanding the cluster using a campaign.

Steps:
1. Brought up an NPM application with 5 SUs.
2. Using a campaign, added a 3rd payload, PL-5, to the cluster.

The app went into a bad state:

Mar 17 04:38:13 NewSC1 osafamfd[2562]: NO 'safSg=SGONE,safApp=NPMAPP' is in 
unstable/transition state
Mar 17 04:38:15 NewSC1 osafamfd[2562]: NO 'safSg=SGONE,safApp=NPMAPP' is in 
unstable/transition state
Mar 17 04:38:17 NewSC1 osafamfd[2562]: NO 'safSg=SGONE,safApp=NPMAPP' is in 
unstable/transition state
Mar 17 04:38:19 NewSC1 osafamfd[2562]: NO 'safSg=SGONE,safApp=NPMAPP' is in 
unstable/transition state
Mar 17 04:38:21 NewSC1 osafamfd[2562]: NO 'safSg=SGONE,safApp=NPMAPP' is in 
unstable/transition state
Mar 17 04:38:23 NewSC1 osafamfd[2562]: NO 'safSg=SGONE,safApp=NPMAPP' is in 
unstable/transition state
Mar 17 04:38:25 NewSC1 osafamfd[2562]: NO 'safSg=SGONE,safApp=NPMAPP' is in 
unstable/transition state
Mar 17 04:38:27 NewSC1 osafamfd

[tickets] [opensaf:tickets] #2369 Java: consolidated clm java issue

2017-03-16 Thread Praveen

- **status**: unassigned --> assigned
- **assigned_to**: Praveen
- **Part**: - --> lib
- **Milestone**: 5.2.RC2 --> 5.0.2



---

** [tickets:#2369] Java: consolidated clm java issue**

**Status:** assigned
**Milestone:** 5.0.2
**Created:** Mon Mar 13, 2017 10:46 AM UTC by Ritu Raj
**Last Updated:** Mon Mar 13, 2017 10:46 AM UTC
**Owner:** Praveen


###Environment details
OS : Suse 64bit
Changeset : 8634 ( 5.2.FC)
4 nodes setup(2 controller and 2 payload)

###Summary
consolidated clm java issue

###Steps followed & Observed behaviour

JAVA_CLM issues:

(A).
1. Call CLM Initialize.
2. Call DispatchBlocking in one thread.
3. Invoke Finalize in the main thread; observed that the dispatch thread failed
to exit.
4. The thread should not keep waiting: once the handle is finalized, the
dispatch thread that was created should exit.

(B).
1. Call CLM Initialize, then Finalize.
2. Call getClusterMembershipManager with the already finalized handle. It should
throw a bad handle exception, but a proper handle is returned.

(C).
1. Initialize with version ['B', 1, 0], a minor version lower than the supported
minor version.
2. It returns an incompatible version parameter error instead of the expected
SA_AIS_OK.

(D).
1. Initialize with version ['B', 1, 8], a minor version greater than the
supported minor version.
2. It returns an incompatible version parameter error instead of the expected
SA_AIS_OK.
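
The expectation in (C) and (D) can be sketched as a version check that ignores the minor version. This is only a model of what the ticket expects, written in Python rather than Java for brevity: `check_version` is a hypothetical name and the `SUPPORTED` tuple is an assumed value, not the version the Java CLM binding actually implements.

```python
SUPPORTED = ("B", 4, 1)  # assumed (releaseCode, majorVersion, minorVersion)


def check_version(requested, supported=SUPPORTED):
    """Accept a request when release code and major version are supported.

    The minor version is deliberately not compared, which is what
    cases (C) and (D) expect.
    """
    release, major, _minor = requested
    s_release, s_major, _s_minor = supported
    if release == s_release and 1 <= major <= s_major:
        return "SA_AIS_OK"
    return "SA_AIS_ERR_VERSION"


print(check_version(("B", 1, 0)))  # case (C)
print(check_version(("B", 1, 8)))  # case (D)
```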




[tickets] [opensaf:tickets] #2381 clmd: clm admin operation returns BAD_OP after rebooting node

2017-03-16 Thread Praveen

- **status**: unassigned --> assigned
- **assigned_to**: Praveen
- **Part**: - --> d
- **Milestone**: 5.2.RC2 --> 5.0.2



---

** [tickets:#2381] clmd: clm admin operation returns BAD_OP after rebooting 
node**

**Status:** assigned
**Milestone:** 5.0.2
**Created:** Thu Mar 16, 2017 07:30 AM UTC by Ritu Raj
**Last Updated:** Thu Mar 16, 2017 08:33 AM UTC
**Owner:** Praveen
**Attachments:**

- 
[active_clmd.tgz](https://sourceforge.net/p/opensaf/tickets/2381/attachment/active_clmd.tgz)
 (1.3 MB; application/x-compressed-tar)
- 
[messages](https://sourceforge.net/p/opensaf/tickets/2381/attachment/messages) 
(1.9 MB; application/octet-stream)


###Environment details
OS : Suse 64bit
Changeset : 8701 ( 5.2.RC1)
4 nodes setup(2 controller and 2 payload)

###Summary
clm admin operation returns SA_AIS_ERR_BAD_OPERATION after rebooting node 

###Steps followed & Observed behaviour
1. Initially performed a clm_lock operation on the payload PL-3 and immediately
restarted the same payload:
> init 6; exit
2. Later, performed a clm_unlock operation on PL-3. The unlock operation timed
out, but the node still joined the cluster:

> SLES-SLOT1:~ # amf-adm unlock safNode=PL-3,safCluster=myClmCluster 
> Mar 15 14:35:20 SLES-SLOT1 osafclmd[2763]: ER clms_imm_node_unlock failed
> Mar 15 14:35:20 SLES-SLOT1 osafamfd[2773]: NO Received node_up from 2030f: 
> msg_id 1
> Mar 15 14:35:20 SLES-SLOT1 osafamfd[2773]: NO Node 'PL-3' joined the cluster
> Mar 15 14:35:20 SLES-SLOT1 osafimmnd[2733]: NO Implementer connected: 197 
> (MsgQueueService131855) <0, 2030f>
> error - command timed out (alarm)

3. After that, any clm_lock or unlock operation returns
'SA_AIS_ERR_BAD_OPERATION':

> SLES-SLOT1:~ # amf-adm lock safNode=PL-3,safCluster=myClmCluster
> error - saImmOmAdminOperationInvoke_2 admin-op RETURNED: 
> SA_AIS_ERR_BAD_OPERATION (20)
> 
> SLES-SLOT1:~ # amf-adm unlock safNode=PL-3,safCluster=myClmCluster
> error - saImmOmAdminOperationInvoke_2 admin-op RETURNED: 
> SA_AIS_ERR_BAD_OPERATION (20)


Traces:
From the traces:
Node PL-3 joined the cluster 
~~~
Mar 15 14:35:20.373997 osafclmd [2763:src/clm/clmd/clms_imm.c:0939] >> 
clms_imm_admin_op_callback: Admin callback for 
nodename:safNode=PL-3,safCluster=myClmCluster, opId:1
Mar 15 14:35:20.374002 osafclmd [2763:src/clm/clmd/clms_util.c:0038] >> 
clms_node_get_by_name: name input safNode=PL-3,safCluster=myClmCluster length 36
Mar 15 14:35:20.374006 osafclmd [2763:src/clm/clmd/clms_util.c:0046] TR 
nodename after patricia tree get safNode=PL-3,safCluster=myClmCluster
Mar 15 14:35:20.374009 osafclmd [2763:src/clm/clmd/clms_util.c:0049] << 
clms_node_get_by_name
Mar 15 14:35:20.374012 osafclmd [2763:src/clm/clmd/clms_imm.c:2223] >> 
clms_imm_node_unlock: Node name safNode=PL-3,safCluster=myClmCluster to unlock
Mar 15 14:35:20.374015 osafclmd [2763:src/clm/clmd/clms_imm.c:0579] >> 
clms_admin_state_update_rattr: Admin state 1 update for node 
safNode=PL-3,safCluster=myClmCluster
Mar 15 14:35:20.374018 osafclmd [2763:src/clm/clmd/clms_util.c:0038] >> 
clms_node_get_by_name: name input safNode=PL-3,safCluster=myClmCluster length 36
Mar 15 14:35:20.374021 osafclmd [2763:src/clm/clmd/clms_util.c:0046] TR 
nodename after patricia tree get safNode=PL-3,safCluster=myClmCluster
~~~
..
..
*but Sending track callback failed for SA_CLM_CHANGE_COMPLETED*
~~~
Mar 15 14:35:20.380860 osafclmd [2763:src/clm/clmd/clms_imm.c:1439] TR callback 
msg send to clma  failed
Mar 15 14:35:20.380869 osafclmd [2763:src/clm/clmd/clms_imm.c:1447] << 
clms_prep_and_send_track
Mar 15 14:35:20.380872 osafclmd [2763:src/clm/clmd/clms_imm.c:1220] TR Sending 
track callback failed for SA_CLM_CHANGE_COMPLETED
Mar 15 14:35:20.380875 osafclmd [2763:src/clm/clmd/clms_imm.c:1380] >> 
clms_prep_and_send_track
~~~
--

and the admin operation performed later failed with 'Another Admin operation
already in progress':
~~~
Mar 15 14:51:21.878688 osafclmd [2763:src/clm/clmd/clms_imm.c:0939] >> 
clms_imm_admin_op_callback: Admin callback for 
nodename:safNode=PL-3,safCluster=myClmCluster, opId:2
Mar 15 14:51:21.878700 osafclmd [2763:src/clm/clmd/clms_util.c:0038] >> 
clms_node_get_by_name: name input safNode=PL-3,safCluster=myClmCluster length 36
Mar 15 14:51:21.878712 osafclmd [2763:src/clm/clmd/clms_util.c:0046] TR 
nodename after patricia tree get safNode=PL-3,safCluster=myClmCluster
Mar 15 14:51:21.878720 osafclmd [2763:src/clm/clmd/clms_util.c:0049] << 
clms_node_get_by_name
Mar 15 14:51:21.878726 osafclmd [2763:src/clm/clmd/clms_imm.c:0982] TR Another 
Admin operation already in progress: 4
~~~
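
One possible reading of these traces is that the pending admin operation on the node is not cleared when sending the track callback fails. The sketch below models only that hypothesis. `Node`, `run_admin_op` and the `clear_on_failure` switch are hypothetical names, and the generic "FAILED" result stands in for whatever clmd actually returns on that path.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.pending_op = None  # admin operation in progress, None when idle


def run_admin_op(node, op, send_track, clear_on_failure):
    """Run an admin operation; `send_track(node)` returns True on success."""
    if node.pending_op is not None:
        # Mirrors "Another Admin operation already in progress".
        return "SA_AIS_ERR_BAD_OPERATION"
    node.pending_op = op
    if send_track(node):
        node.pending_op = None
        return "SA_AIS_OK"
    if clear_on_failure:
        node.pending_op = None  # the cleanup the hypothesis says is missing
    return "FAILED"


pl3 = Node("PL-3")
print(run_admin_op(pl3, "unlock", lambda n: False, clear_on_failure=False))
print(run_admin_op(pl3, "lock", lambda n: True, clear_on_failure=False))
```

In this model the second call is rejected even though nothing is really in progress; with `clear_on_failure=True` it would succeed.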


Notes:
1. Syslog of Active controller attached
2. osafclmd of Active controller attached



[tickets] [opensaf:tickets] #2381 clmd: clm admin operation returns BAD_OP after rebooting node

2017-03-16 Thread Praveen

- **status**: unassigned --> assigned
- **assigned_to**: Praveen
- **Part**: - --> d
- **Milestone**: 5.2.RC2 --> 5.0.2



---

** [tickets:#2381] clmd: clm admin operation returns BAD_OP after rebooting 
node**

**Status:** assigned
**Milestone:** 5.0.2
**Created:** Thu Mar 16, 2017 07:30 AM UTC by Ritu Raj
**Last Updated:** Thu Mar 16, 2017 07:30 AM UTC
**Owner:** Praveen
**Attachments:**

- 
[active_clmd.tgz](https://sourceforge.net/p/opensaf/tickets/2381/attachment/active_clmd.tgz)
 (1.3 MB; application/x-compressed-tar)
- 
[messages](https://sourceforge.net/p/opensaf/tickets/2381/attachment/messages) 
(1.9 MB; application/octet-stream)


###Environment details
OS : Suse 64bit
Changeset : 8701 ( 5.2.RC1)
4 nodes setup(2 controller and 2 payload)

###Summary
clm admin operation returns SA_AIS_ERR_BAD_OPERATION after rebooting node 

###Steps followed & Observed behaviour
1. Initially performed a clm_lock operation on the payload PL-3 and immediately
restarted the same payload:
> init 6; exit
2. Later, performed a clm_unlock operation on PL-3. The unlock operation timed
out, but the node still joined the cluster:

> SLES-SLOT1:~ # amf-adm unlock safNode=PL-3,safCluster=myClmCluster 
> Mar 15 14:35:20 SLES-SLOT1 osafclmd[2763]: ER clms_imm_node_unlock failed
> Mar 15 14:35:20 SLES-SLOT1 osafamfd[2773]: NO Received node_up from 2030f: 
> msg_id 1
> Mar 15 14:35:20 SLES-SLOT1 osafamfd[2773]: NO Node 'PL-3' joined the cluster
> Mar 15 14:35:20 SLES-SLOT1 osafimmnd[2733]: NO Implementer connected: 197 
> (MsgQueueService131855) <0, 2030f>
> error - command timed out (alarm)

3. After that, a clm_lock or unlock operation returns
'SA_AIS_ERR_NOT_SUPPORTED' or 'SA_AIS_ERR_BAD_OPERATION':

> SLES-SLOT1:~ # amf-adm lock safNode=SC-1,safCluster=myClmCluster
> Mar 15 14:50:47 SLES-SLOT1 osafclmd[2763]: NO Lock on active node not allowed
> Mar 15 14:50:47 SLES-SLOT1 osafclmd[2763]: NO clms_imm_node_lock failed
> error - saImmOmAdminOperationInvoke_2 admin-op RETURNED: 
> SA_AIS_ERR_NOT_SUPPORTED (19)
> 
> SLES-SLOT1:~ # amf-adm unlock safNode=PL-3,safCluster=myClmCluster
> error - saImmOmAdminOperationInvoke_2 admin-op RETURNED: 
> SA_AIS_ERR_BAD_OPERATION (20)


Traces:
From the traces:
Node PL-3 joined the cluster 
~~~
Mar 15 14:35:20.373997 osafclmd [2763:src/clm/clmd/clms_imm.c:0939] >> 
clms_imm_admin_op_callback: Admin callback for 
nodename:safNode=PL-3,safCluster=myClmCluster, opId:1
Mar 15 14:35:20.374002 osafclmd [2763:src/clm/clmd/clms_util.c:0038] >> 
clms_node_get_by_name: name input safNode=PL-3,safCluster=myClmCluster length 36
Mar 15 14:35:20.374006 osafclmd [2763:src/clm/clmd/clms_util.c:0046] TR 
nodename after patricia tree get safNode=PL-3,safCluster=myClmCluster
Mar 15 14:35:20.374009 osafclmd [2763:src/clm/clmd/clms_util.c:0049] << 
clms_node_get_by_name
Mar 15 14:35:20.374012 osafclmd [2763:src/clm/clmd/clms_imm.c:2223] >> 
clms_imm_node_unlock: Node name safNode=PL-3,safCluster=myClmCluster to unlock
Mar 15 14:35:20.374015 osafclmd [2763:src/clm/clmd/clms_imm.c:0579] >> 
clms_admin_state_update_rattr: Admin state 1 update for node 
safNode=PL-3,safCluster=myClmCluster
Mar 15 14:35:20.374018 osafclmd [2763:src/clm/clmd/clms_util.c:0038] >> 
clms_node_get_by_name: name input safNode=PL-3,safCluster=myClmCluster length 36
Mar 15 14:35:20.374021 osafclmd [2763:src/clm/clmd/clms_util.c:0046] TR 
nodename after patricia tree get safNode=PL-3,safCluster=myClmCluster
~~~
..
..
*but Sending track callback failed for SA_CLM_CHANGE_COMPLETED*
~~~
Mar 15 14:35:20.380860 osafclmd [2763:src/clm/clmd/clms_imm.c:1439] TR callback 
msg send to clma  failed
Mar 15 14:35:20.380869 osafclmd [2763:src/clm/clmd/clms_imm.c:1447] << 
clms_prep_and_send_track
Mar 15 14:35:20.380872 osafclmd [2763:src/clm/clmd/clms_imm.c:1220] TR Sending 
track callback failed for SA_CLM_CHANGE_COMPLETED
Mar 15 14:35:20.380875 osafclmd [2763:src/clm/clmd/clms_imm.c:1380] >> 
clms_prep_and_send_track
~~~
--

and the admin operation performed later failed with 'Another Admin operation
already in progress':
~~~
Mar 15 14:51:21.878688 osafclmd [2763:src/clm/clmd/clms_imm.c:0939] >> 
clms_imm_admin_op_callback: Admin callback for 
nodename:safNode=PL-3,safCluster=myClmCluster, opId:2
Mar 15 14:51:21.878700 osafclmd [2763:src/clm/clmd/clms_util.c:0038] >> 
clms_node_get_by_name: name input safNode=PL-3,safCluster=myClmCluster length 36
Mar 15 14:51:21.878712 osafclmd [2763:src/clm/clmd/clms_util.c:0046] TR 
nodename after patricia tree get safNode=PL-3,safCluster=myClmCluster
Mar 15 14:51:21.878720 osafclmd [2763:src/clm/clmd/clms_util.c:0049] << 
clms_node_get_by_name
Mar 15 14:51:21.878726 osafclmd [2763:src/clm/clmd/clms_imm.c:0982] TR Another 
Admin operation already in progress: 4
~~~


Notes:
1. Syslog of Active controller attached
2. osafclmd of Activ

[tickets] [opensaf:tickets] #2372 amf/clm: CLM lock of two more nodes returns REPAIR_PENDING for first node.

2017-03-16 Thread Praveen

Hi,

Srikanth: Thanks for the information.

I have analyzed the situation. The two issues are the same (in one case AMF
application comps are running on the locked payloads). The message " NO Pending
Response sent for CLM track callback::OK '7'" appears because AMF responds twice
for the same invocation id. For the case mentioned in the ticket description
this message is not observed; the applications installed on the locked nodes
make the difference. CLMS properly maintains the invocation id for all clients
per callback. So, to understand the problem, I considered a different case.

Suppose one payload node, PL-4, is locked while an application still has not
responded to the track callbacks, and another payload, PL-3, is stopped (OpenSAF
stop). The application is hosted on PL-5 and its track flags are the same as
AMFD's:
(SA_TRACK_CURRENT | SA_TRACK_CHANGES_ONLY | SA_TRACK_VALIDATE_STEP |
SA_TRACK_START_STEP).
In this case, when PL-4 is locked, both AMF and the app get the track callback
for CHANGE_START. AMF responds to the callback but the application does not.
Now PL-3 is stopped. CLM delivers the track callback for the COMPLETED step, but
it contains numberOfItems=2, covering both payloads PL-3 and PL-4. The
application receives the same callback.
The application never responds to the PL-4 callback, so the node lock timer
expires at CLMD and it again sends the completed callback to both AMFD and the
application. Since both AMFD and the application have registered for
SA_TRACK_CHANGES_ONLY, I really doubt that CLM should send the callback for both
PL-3 and PL-4. In the ticket description I have pointed out this problem for the
CHANGE_START case. The CLM spec says, in section 3.5.2
SaClmClusterTrackCallbackT_4, page 51:

The value of the numberOfItems attribute in the structure to which the
notificationBuffer parameter points might be greater than the value of the
numberOfMembers parameter if either the SA_TRACK_CHANGES flag or the
SA_TRACK_CHANGES_ONLY flags is set, and one or more member nodes have left
the cluster membership. In this case, the structure to which the
notificationBuffer parameter points might contain information about the current
members of the cluster and also about nodes that have recently left the cluster
membership.
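
The reading being questioned here can be sketched as a filter over the notification buffer. This is only a model of one interpretation of SA_TRACK_CHANGES_ONLY, not CLMD code and not a statement of what the spec requires: `notification_items` is a hypothetical helper, the change values are plain strings, and the flag values are written from memory of saAis.h and should be verified.

```python
# Track flag values as recalled from saAis.h (assumption: verify before use).
SA_TRACK_CURRENT = 0x01
SA_TRACK_CHANGES = 0x02
SA_TRACK_CHANGES_ONLY = 0x04
SA_TRACK_START_STEP = 0x10
SA_TRACK_VALIDATE_STEP = 0x20


def notification_items(nodes, track_flags):
    """Return the (name, change) entries a subscriber would receive.

    Under the interpretation modelled here, a CHANGES_ONLY subscriber sees
    only the nodes whose state changed in the current step.
    """
    if track_flags & SA_TRACK_CHANGES_ONLY:
        return [n for n in nodes if n[1] != "NO_CHANGE"]
    return list(nodes)


flags = (SA_TRACK_CURRENT | SA_TRACK_CHANGES_ONLY
         | SA_TRACK_VALIDATE_STEP | SA_TRACK_START_STEP)
# Assumed state when PL-3 is stopped: PL-4's lock is still pending.
step = [("PL-3", "LEFT"), ("PL-4", "NO_CHANGE")]
print(notification_items(step, flags))
```

Under this reading the COMPLETED callback for PL-3 would carry one item, whereas the traces show numberOfItems=2.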

I am going through the ticket list and the spec for more information on this.
Thanks,
Praveen

Attachments:

-
[node_lock_and_stop.tgz](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/6b54e875/538b/attachment/node_lock_and_stop.tgz)
(382.7 kB; application/x-compressed)
-
[two_nodes_lock.tgz](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/6b54e875/538b/attachment/two_nodes_lock.tgz)
(335.0 kB; application/x-compressed)

---

** [tickets:#2372] amf/clm: CLM lock of two more nodes returns REPAIR_PENDING
for first node.**

**Status:** accepted
**Milestone:** 5.0.2
**Created:** Tue Mar 14, 2017 09:29 AM UTC by Praveen
**Last Updated:** Wed Mar 15, 2017 06:27 AM UTC
**Owner:** Praveen
**Attachments:**

-
[osafamfd](https://sourceforge.net/p/opensaf/tickets/2372/attachment/osafamfd)
(3.4 MB; application/octet-stream)
-
[osafclmd](https://sourceforge.net/p/opensaf/tickets/2372/attachment/osafclmd)
(860.9 kB; application/octet-stream)

Steps to reproduce:
1) Bring a 4-node cluster up.
2) Deploy the AMF demo on PL-3 and PL-4.
3) LOCK amfd nodes PL-3 and PL-4.
4) Make arrangements so that termination of amf_demo on PL-3 takes more time
compared to PL-4.
5) From one terminal, issue a CLM lock of PL-3 first and, in no time, issue a
CLM lock of PL-4.

CLM and AMF traces are attached.
Analysis:
When AMFD gets the CLM track callback for PL-3, it starts terminating amf_demo
on PL-3. While termination of amf_demo is still going on, AMF gets another track
callback with rootCauseEntity as PL-4. However, the callback also contains
information about PL-3. AMFD starts terminating amf_demo on PL-4, but at the
same time it responds for PL-3 with the invocation id of the PL-4 callback. CLM
assumes that the PL-4 change_started step has completed and sends the completion
callback for PL-4. In this callback, AMF clears the internal flags which monitor
the graceful removal of nodes. Since AMF never responded to the PL-3 callback,
the callback timer expires in CLMD and it sends the completed callback to AMF.
AMF thinks this is a case of node failover and tries to fail over PL-3.

Note: In all these stages, CLM sends the track callback with information about
all the nodes. The params AMF registers with are:

SA_TRACK_CURRENT|SA_TRACK_CHANGES_ONLY|SA_TRACK_VALIDATE_STEP|SA_TRACK_START_STEP.
I am still evaluating whether the issue is in CLM or AMF. Since AMF registers for
**|SA_TRACK_CHANGES_ONLY|**, should CLM give information about all the nodes in
all subsequent callbacks?
Also, AMF should respond to the callback once it has completed termination of
comps.
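
The mix-up described in the analysis (responding for PL-3 with the invocation id of the PL-4 callback) could be avoided by remembering the invocation id per node. The sketch below shows that idea only; `PendingResponses` and its methods are hypothetical and do not correspond to any existing AMFD structure.

```python
class PendingResponses:
    """Remember which invocation id belongs to which node's START callback."""

    def __init__(self):
        self._invocation_by_node = {}

    def on_start_callback(self, root_cause_node, invocation):
        # Record only the root cause node, even if the buffer lists others.
        self._invocation_by_node[root_cause_node] = invocation

    def on_termination_done(self, node):
        """Return the invocation id to respond with, or None if none pending."""
        return self._invocation_by_node.pop(node, None)


pending = PendingResponses()
pending.on_start_callback("PL-3", 7)
pending.on_start_callback("PL-4", 8)  # arrives while PL-3 is still terminating

print(pending.on_termination_done("PL-4"))  # PL-4 finishes first
print(pending.on_termination_done("PL-3"))  # PL-3 answers with its own id
```

Each response is sent only after the components on that node have terminated, and always with the id of that node's own callback.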


[tickets] [opensaf:tickets] #2371 AMF: NPM app went into unstable state while expanding cluster

2017-03-14 Thread Praveen

- **status**: unassigned --> assigned
- **assigned_to**: Praveen
- **Part**: - --> d



---

** [tickets:#2371] AMF: NPM app went into unstable state while expanding 
cluster**

**Status:** assigned
**Milestone:** 5.2.RC2
**Created:** Tue Mar 14, 2017 08:03 AM UTC by Chani Srivastava
**Last Updated:** Tue Mar 14, 2017 08:03 AM UTC
**Owner:** Praveen
**Attachments:**

- 
[messages](https://sourceforge.net/p/opensaf/tickets/2371/attachment/messages) 
(86.9 kB; application/octet-stream)
- 
[osafamfd](https://sourceforge.net/p/opensaf/tickets/2371/attachment/osafamfd) 
(13.0 MB; application/octet-stream)


Environment details

OS : Suse 64bit
Changeset : 8603( 5.2.MO-1)
4 node cluster without PBE

Summary - The application went into an unstable state and campaign execution
could not complete while expanding the cluster using a campaign.

Steps:
1. Brought up an NPM application with 5 SUs.
2. Using a campaign, added a 3rd payload, PL-5, to the cluster.

The app went into a bad state:

Mar 17 04:38:13 NewSC1 osafamfd[2562]: NO 'safSg=SGONE,safApp=NPMAPP' is in 
unstable/transition state
Mar 17 04:38:15 NewSC1 osafamfd[2562]: NO 'safSg=SGONE,safApp=NPMAPP' is in 
unstable/transition state
Mar 17 04:38:17 NewSC1 osafamfd[2562]: NO 'safSg=SGONE,safApp=NPMAPP' is in 
unstable/transition state
Mar 17 04:38:19 NewSC1 osafamfd[2562]: NO 'safSg=SGONE,safApp=NPMAPP' is in 
unstable/transition state
Mar 17 04:38:21 NewSC1 osafamfd[2562]: NO 'safSg=SGONE,safApp=NPMAPP' is in 
unstable/transition state
Mar 17 04:38:23 NewSC1 osafamfd[2562]: NO 'safSg=SGONE,safApp=NPMAPP' is in 
unstable/transition state
Mar 17 04:38:25 NewSC1 osafamfd[2562]: NO 'safSg=SGONE,safApp=NPMAPP' is in 
unstable/transition state
Mar 17 04:38:27 NewSC1 osafamfd[2562]: NO 'safSg=SGONE,safApp=NPMAPP' is in 
unstable/transition state
Mar 17 04:38:29 NewSC1 osafamfd[2562]: NO 'safSg=SGONE,safApp=NPMAPP' is in 
unstable/transition state




[tickets] [opensaf:tickets] #2372 amf/clm: CLM lock of two more nodes returns REPAIR_PENDING for first node.

2017-03-14 Thread Praveen




---

** [tickets:#2372] amf/clm: CLM lock of two more nodes returns REPAIR_PENDING 
for first node.**

**Status:** accepted
**Milestone:** 5.0.2
**Created:** Tue Mar 14, 2017 09:29 AM UTC by Praveen
**Last Updated:** Tue Mar 14, 2017 09:29 AM UTC
**Owner:** Praveen
**Attachments:**

- 
[osafamfd](https://sourceforge.net/p/opensaf/tickets/2372/attachment/osafamfd) 
(3.4 MB; application/octet-stream)
- 
[osafclmd](https://sourceforge.net/p/opensaf/tickets/2372/attachment/osafclmd) 
(860.9 kB; application/octet-stream)


Steps to reproduce:
1) Bring a 4-node cluster up.
2) Deploy AMF demo on PL-3 and PL-4.
3) LOCK amfd nodes PL-3 and PL-4.
4) Make arrangements so that termination of amf_demo on PL-3 takes more time 
than on PL-4.
5) From one terminal, issue CLM lock of PL-3 first and immediately afterwards 
issue CLM lock of PL-4.

CLM and AMF traces are attached.  
Analysis:
When AMFD gets the CLM track callback for PL-3, it starts terminating amf_demo 
on PL-3. While termination of amf_demo is still going on, AMF gets another 
track callback with rootCauseEntity as PL-4. However, this callback also 
contains information about PL-3. AMFD starts terminating amf_demo on PL-4, but 
at the same time it responds for PL-3 with the invocation id of the PL-4 
callback. CLM assumes that the PL-4 change_started step has completed and sends 
the completion callback for PL-4. In this callback, AMF clears the internal 
flags which monitor the graceful removal of nodes. Since AMF never responded to 
the PL-3 callback, the callback timer expires in CLMD and it sends the 
completed callback to AMF. AMF treats this as a node failover and tries to 
fail over PL-3.

Note: In all these stages, CLM sends the track callback with information about 
all the nodes. The track flags AMF registers with are:
 
SA_TRACK_CURRENT|SA_TRACK_CHANGES_ONLY|SA_TRACK_VALIDATE_STEP|SA_TRACK_START_STEP.
I am still evaluating whether the issue is in CLM or AMF. Since AMF registers 
for **|SA_TRACK_CHANGES_ONLY|**, should CLM give information about all the 
nodes in all subsequent callbacks?
Also, AMF should respond to a callback once it has completed termination of 
the comps.
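
The bookkeeping suggested by the analysis above can be sketched as follows. This is a minimal Python model, not OpenSAF code: the class and method names are hypothetical, and it only illustrates keeping each invocation id tied to the node it was issued for, so that a response for PL-3 never reuses the PL-4 invocation.

```python
from dataclasses import dataclass, field


@dataclass
class PendingClmResponses:
    """Hypothetical bookkeeping: one pending invocation id per node."""
    pending: dict = field(default_factory=dict)  # node name -> invocation id
    sent: list = field(default_factory=list)     # (node, invocation) responses

    def on_start_step(self, root_cause_node, invocation):
        # Record the invocation against the root cause node only, even if the
        # callback buffer also lists other nodes that are still changing.
        self.pending.setdefault(root_cause_node, invocation)

    def on_termination_done(self, node):
        # Respond with the invocation that was issued for *this* node.
        invocation = self.pending.pop(node, None)
        if invocation is not None:
            self.sent.append((node, invocation))
        return invocation


tracker = PendingClmResponses()
tracker.on_start_step("PL-3", invocation=101)
tracker.on_start_step("PL-4", invocation=102)
# PL-4 finishes first, because PL-3 terminates slowly in the reported scenario.
first = tracker.on_termination_done("PL-4")
second = tracker.on_termination_done("PL-3")
```

With this kind of mapping, the response for PL-4 carries the PL-4 invocation and the PL-3 response is held back until PL-3 has really finished.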



[tickets] [opensaf:tickets] #2365 AMF: Active controller went for continuous reboots when an NPM app is upgraded with more SIs and CSIs

2017-03-13 Thread Praveen

- **status**: unassigned --> assigned
- **assigned_to**: Praveen



---

** [tickets:#2365] AMF: Active controller went for continuous reboots when an 
NPM app is upgraded with more SIs and CSIs**

**Status:** assigned
**Milestone:** 5.2.RC1
**Created:** Sat Mar 11, 2017 03:40 PM UTC by Chani Srivastava
**Last Updated:** Sat Mar 11, 2017 03:40 PM UTC
**Owner:** Praveen


**Environment details**

OS : Suse 64bit
Changeset : 8634 ( 5.2.FC)
Setup : 4 nodes ( 2 controllers and 2 payloads / no PBE )

**Steps followed & Observed behaviour**
1. Import attached xml
2. Bring up the attached NPM.sh application
3. Execute attached campaign22.xml to upgrade the application

Campaign22.xml adds more SIs and CSIs (i.e. work) and assigns them to SUs which 
can handle more work, and also to spare SUs.


Oct  2 18:03:43 OSAF-SC1 osafamfnd[6292]: NO Restarting a component of 
'safSu=SU1,safSg=SGONE,safApp=NPMAPP' (comp restart count: 1)
Oct  2 18:03:43 OSAF-SC1 osafamfnd[6292]: NO 
'safComp=COMP3SU1NPMAPP,safSu=SU1,safSg=SGONE,safApp=NPMAPP' faulted due to 
'avaDown' : Recovery is 'componentRestart'
Oct  2 18:03:43 OSAF-SC1 osafamfnd[6292]: NO Restarting a component of 
'safSu=SU1,safSg=SGONE,safApp=NPMAPP' (comp restart count: 2)
Oct  2 18:03:43 OSAF-SC1 osafamfnd[6292]: NO 
'safComp=COMP2SU1NPMAPP,safSu=SU1,safSg=SGONE,safApp=NPMAPP' faulted due to 
'avaDown' : Recovery is 'componentRestart'
|
Oct  2 18:03:43 OSAF-SC1 osafamfnd[6292]: NO Performing failover of 
'safSu=SU1,safSg=SGONE,safApp=NPMAPP' (SU failover count: 1)
Oct  2 18:03:43 OSAF-SC1 osafamfnd[6292]: NO 
'safComp=COMP1SU1NPMAPP,safSu=SU1,safSg=SGONE,safApp=NPMAPP' recovery action 
escalated from 'componentRestart' to 'suFailover'
|
Oct  2 18:03:47 OSAF-SC1 osafamfnd[6292]: NO 
'safComp=COMP3SU1NPMAPP,safSu=SU1,safSg=SGONE,safApp=NPMAPP' recovery action 
escalated from 'componentRestart' to 'nodeFailover'
Oct  2 18:03:47 OSAF-SC1 osafamfnd[6292]: NO 
'safComp=COMP3SU1NPMAPP,safSu=SU1,safSg=SGONE,safApp=NPMAPP' faulted due to 
'avaDown' : Recovery is 'nodeFailover'
|
Oct  2 18:03:49 OSAF-SC1 osafamfnd[6292]: NO Received reboot order, ordering 
reboot now!
Oct  2 18:03:49 OSAF-SC1 osafamfnd[6292]: Rebooting OpenSAF NodeId = 131343 EE 
Name = , Reason: Received reboot order, OwnNodeId = 131343, SupervisionTime = 60
Oct  2 18:03:49 OSAF-SC1 opensaf_reboot: Rebooting local node; timeout=60


I will share the logs and scripts offline as they are huge in size.



[tickets] [opensaf:tickets] #2269 amf: saAmfSGNumPrefAssignedSUs is not honored in N-Way Active model.

2017-03-10 Thread Praveen

- **status**: assigned --> review



---

** [tickets:#2269] amf: saAmfSGNumPrefAssignedSUs is not honored in N-Way 
Active model.**

**Status:** review
**Milestone:** 5.0.2
**Created:** Wed Jan 18, 2017 06:08 AM UTC by Praveen
**Last Updated:** Wed Jan 18, 2017 06:08 AM UTC
**Owner:** Praveen
**Attachments:**

- 
[AppConfig-nwayactive_3SUs_1SIs.xml](https://sourceforge.net/p/opensaf/tickets/2269/attachment/AppConfig-nwayactive_3SUs_1SIs.xml)
 (13.7 kB; text/xml)


AMF assigns more SUs than the configured value of saAmfSGNumPrefAssignedSUs in 
the N-Way Active model.
The issue can be reproduced by bringing up the attached configuration.
In the application, saAmfSGNumPrefAssignedSUs is set to 2:
 immlist safSg=NWay_Active\,safApp=NWay_Active | grep -i prefass
saAmfSGNumPrefAssignedSUs  SA_UINT32_T  2 (0x2)

But AMF is giving assignments to all three SUs:
safSISU=safSu=SU2\,safSg=NWay_Active\,safApp=NWay_Active,safSi=NWay_Active,safApp=NWay_Active
saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=SU1\,safSg=NWay_Active\,safApp=NWay_Active,safSi=NWay_Active,safApp=NWay_Active
saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=SU3\,safSg=NWay_Active\,safApp=NWay_Active,safSi=NWay_Active,safApp=NWay_Active
saAmfSISUHAState=ACTIVE(1)

Since this attribute is valid for N-Way model also, issue is applicable to 
N-Way model also.
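
The expected behaviour can be sketched as a small selection function. This is an illustrative Python model, not AMFD code: the function name is hypothetical, and the SU ranks used below (1, 2, 3, with a lower value meaning higher priority) are an assumption, since the ticket text does not state them.

```python
def select_sus_for_assignment(sus, num_pref_assigned):
    """Return the names of the SUs that should receive the SI.

    sus: list of (name, rank) tuples; a lower rank value is assumed to mean
         a higher priority.
    num_pref_assigned: configured saAmfSGNumPrefAssignedSUs.
    """
    by_rank = sorted(sus, key=lambda su: su[1])
    # Never hand out more assignments than the configured preference.
    return [name for name, _rank in by_rank[:num_pref_assigned]]


# Three in-service SUs, but saAmfSGNumPrefAssignedSUs is 2.
chosen = select_sus_for_assignment([("SU2", 2), ("SU1", 1), ("SU3", 3)], 2)
```

Under these assumptions only SU1 and SU2 would be assigned, and SU3 would stay unassigned.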




[tickets] [opensaf:tickets] #1190 AMF: saAmfSIPrefActiveAssignments has wrong default, stopping scaling nway active SGs

2017-03-09 Thread Praveen

Hi All,
I think the patch for this ticket can be pushed to other branches also, because 
we are retaining both the definitions. If a user sets it to 1, then the default 
value will remain 1. If a user sets it to 0, then the default value will be 
PrefAssignedSUs. A user always has the option to lock the SI for no 
assignments.

Thanks,
Praveen



---

** [tickets:#1190] AMF: saAmfSIPrefActiveAssignments has wrong default, 
stopping scaling nway active SGs**

**Status:** fixed
**Milestone:** 5.2.FC
**Created:** Thu Oct 23, 2014 01:10 PM UTC by Hans Feldt
**Last Updated:** Fri Feb 24, 2017 06:15 AM UTC
**Owner:** Praveen


Problem: In nway-active, SUs are not instantiated unless 
saAmfSIPrefActiveAssignments is configured.

saAmfSIPrefActiveAssignments is a configuration attribute only valid for the 
nway-active redundancy model.

According to the spec 3.6.5.3 it should have a default value of "the preferred 
number of assigned service units."

and 
"saAmfSGNumPrefAssignedSUs" should have a default value of "the preferred 
number of in-service service units"

and 

"saAmfSGNumPrefInserviceSUs" should have a default value of "the number of the 
service units configured for the service group."

The value of saAmfSIPrefActiveAssignments is currently set to one when not 
configured, instead it should be set to saAmfSGNumPrefAssignedSUs.


In order to avoid any backward compatibility issue, the choice of default value 
for the attribute is left to the user.
The default value of saAmfSIPrefActiveAssignments will be either 
saAmfSGNumPrefAssignedSUs or 1, based on user choice.
Following are the conditions in which the different default values will be 
honoured:
- If a user configures saAmfSIPrefActiveAssignments=1, then the SI will be 
  assigned to only one SU. This is to ensure backward compatibility.
- If a user does not configure the attribute saAmfSIPrefActiveAssignments in 
  the application, or deletes this attribute via a CCB operation, then AMFD 
  will still honor the default value as 1. This is again to ensure backward 
  compatibility.
- If a user sets saAmfSIPrefActiveAssignments=0 via CCB or in the application 
  conf, then AMFD will use the section 3.6.5 definition for the default value, 
  i.e. saAmfSGNumPrefAssignedSUs.
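
The three conditions above can be summarised in a few lines. This is only a sketch of the rule as described in this ticket, written in Python for brevity; the function name is hypothetical and `None` stands for "attribute not configured or deleted".

```python
def effective_pref_active_assignments(configured, num_pref_assigned_sus):
    """Resolve the value AMFD would use for saAmfSIPrefActiveAssignments.

    configured: None if the attribute is absent (never set, or deleted via
                CCB), otherwise the configured integer.
    num_pref_assigned_sus: value of saAmfSGNumPrefAssignedSUs for the SG.
    """
    if configured is None:
        return 1                      # legacy default, backward compatible
    if configured == 0:
        return num_pref_assigned_sus  # default per the section 3.6.5 definition
    return configured                 # explicit value, including 1


legacy = effective_pref_active_assignments(None, 3)
spec_default = effective_pref_active_assignments(0, 3)
explicit = effective_pref_active_assignments(2, 3)
```

Here `legacy` is 1, `spec_default` is 3 and `explicit` is 2.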




[tickets] [opensaf:tickets] #2325 clm: standby clmd crashed after failing to read node configuration from IMM.

2017-03-09 Thread Praveen

- **status**: review --> fixed
- **Comment**:

changeset:   8682:50a2033a8a8d
branch:      opensaf-5.0.x
parent:      8679:7ec6c15c249f
user:        Praveen Malviya <praveen.malv...@oracle.com>
date:        Fri Mar 10 10:48:17 2017 +0530
summary:     clmd: try to re-read node config from IMM if BAD_HANDLE is 
returned [#2325].

changeset:   8683:59e265654232
branch:      opensaf-5.1.x
parent:      8680:e02390320bbb
user:        Praveen Malviya <praveen.malv...@oracle.com>
date:        Fri Mar 10 10:49:06 2017 +0530
summary:     clmd: try to re-read node config from IMM if BAD_HANDLE is 
returned [#2325].

changeset:   8684:9338ad3cacc0
tag:         tip
parent:      8681:0e9c5da42416
user:        Praveen Malviya <praveen.malv...@oracle.com>
date:        Fri Mar 10 10:49:44 2017 +0530
summary:     clmd: try to re-read node config from IMM if BAD_HANDLE is 
returned [#2325].





---

** [tickets:#2325] clm: standby clmd crashed after failing to read node 
configuration from IMM.**

**Status:** fixed
**Milestone:** 5.0.2
**Created:** Fri Feb 24, 2017 09:32 AM UTC by Praveen
**Last Updated:** Fri Mar 03, 2017 10:40 AM UTC
**Owner:** Praveen


Issue is not reproducible.
While coming up as standby, CLMD successfully initializes with IMM. It 
successfully reads cluster related configuration. While reading node related 
configuration from IMM, CLMD makes a call to saImmOmSearchNext_2(). This API 
could not send any message to IMMND and failed:
Feb 15 06:32:17 SC-2-2 osafclmd[3972]: WA OpenSAF imm lib: Message loss 
detected for dest 565213425675031 service id:25
Feb 15 06:32:17 SC-2-2 osafimmnd[3930]: WA IMMND - Client Node Get Failed for 
cli_hdl:932008034831
Feb 15 06:32:17 SC-2-2 osafclmd[3972]: WA OpenSAF imm lib: Message loss 
detected for dest 565213425675031 service id:25
Feb 15 06:32:17 SC-2-2 osafclmd[3972]: WA marking handle as exposed

CLMD does not explicitly check whether the node config read was successful or 
not. It comes up and completes the cold sync. When a payload joins the cluster, 
the active CLMD checkpoints run-time data for the node. Since the node is not 
present on the standby CLMD, it crashes:

Feb 15 06:33:26 SC-2-2 osafimmd[3915]: NO SBY: New Epoch for IMMND process at 
node 2020f old epoch: 22  new epoch:23
Feb 15 06:33:26 SC-2-2 osafclmd[3972]: ER Node is NULL,problem with the 
database.
Feb 15 06:33:26 SC-2-2 osafclmd[3972]: 
../../opensaf/src/clm/clmd/clms_mbcsv.c:468: ckpt_proc_node_rec: Assertion '0' 
failed.
Feb 15 06:33:27 SC-2-2 osafamfnd[4002]: NO 
'safComp=CLM,safSu=SC-2,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : 
Recovery is 'nodeFailfast'
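
The idea in the fix summary (re-read the node configuration when BAD_HANDLE is returned) can be sketched as a retry loop. This is a Python model under stated assumptions, not the actual clmd patch: `read_once` and `reinitialize` are hypothetical callables standing in for the IMM search and the IMM handle re-initialization, and the retry limit of 3 is arbitrary.

```python
SA_AIS_OK = 1              # numeric values as defined in saAis.h
SA_AIS_ERR_BAD_HANDLE = 9


def read_node_config(read_once, reinitialize, max_attempts=3):
    """Call read_once(); on BAD_HANDLE re-initialize the handle and retry.

    Returns (rc, nodes) so that the caller can check the result explicitly
    instead of continuing with an empty node database.
    """
    rc, nodes = SA_AIS_ERR_BAD_HANDLE, []
    for _ in range(max_attempts):
        rc, nodes = read_once()
        if rc != SA_AIS_ERR_BAD_HANDLE:
            break
        reinitialize()
    return rc, nodes


# Simulated IMM: the first read fails with BAD_HANDLE, the second succeeds.
results = iter([(SA_AIS_ERR_BAD_HANDLE, []),
                (SA_AIS_OK, ["SC-1", "SC-2", "PL-3"])])
reinit_calls = []
rc, nodes = read_node_config(lambda: next(results),
                             lambda: reinit_calls.append(1))
```

The point of the sketch is the explicit check of the return code: the standby should not proceed to cold sync with a node list that was never read.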





[tickets] [opensaf:tickets] #2354 osaf: How to detect if payload is being in "SC Absence" mode.

2017-03-07 Thread Praveen




---

** [tickets:#2354] osaf: How to detect if payload is being in "SC Absence" 
mode.**

**Status:** unassigned
**Milestone:** 5.0.2
**Created:** Wed Mar 08, 2017 07:28 AM UTC by Praveen
**Last Updated:** Wed Mar 08, 2017 07:28 AM UTC
**Owner:** nobody


This discussion ticket is being raised based on a user list query dated March 
1st, 2017.
The query says:
 "We have enabled the new feature "SC Absence" of OpenSAF 5.x in our product, 
it works good so far.
 
 Now we need to make some actions when PLD go in/out "SC Absence" mode, we have 
to find a way in PLD to detect if it is being in "SC Absent" mode or not.
 So, does anyone knows how to make it by a utility/tool and C code(i.e. OpenSAF 
API) as well?
 "
 I think we do not have any API which can be used to query OpenSAF for the SC 
absence state.
MDS up and down events of directors can be used to decide the SC absence state, 
as some agents and node directors already do. But this will add a lot of code 
in the application.

Please update this ticket for a known or proposed solution. 
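
As a starting point for discussion, the MDS-event approach mentioned above could look roughly like this. It is a Python model only: the class name and the way events are delivered are hypothetical, and subscribing to real MDS events is not shown.

```python
class ScAbsenceTracker:
    """Track director up/down events and derive an 'SC absent' flag."""

    def __init__(self):
        self.directors_up = set()

    def on_event(self, director, up):
        # 'director' identifies the controller hosting the director, e.g. "SC-1".
        if up:
            self.directors_up.add(director)
        else:
            self.directors_up.discard(director)

    @property
    def sc_absent(self):
        # Absent when no director is currently reachable.
        return not self.directors_up


tracker = ScAbsenceTracker()
tracker.on_event("SC-1", up=True)
tracker.on_event("SC-2", up=True)
tracker.on_event("SC-1", up=False)
one_left = tracker.sc_absent        # SC-2 is still up
tracker.on_event("SC-2", up=False)
none_left = tracker.sc_absent       # both controllers are gone
tracker.on_event("SC-1", up=True)
recovered = tracker.sc_absent       # a controller is back
```

Every application would need to carry this kind of state machine itself, which is the extra code the ticket is concerned about.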




[tickets] [opensaf:tickets] #2268 amf: assignment from higher ranked SU is removed in N-Way Active model.

2017-03-07 Thread Praveen

A similar issue exists in the N-Way model also when SiPrefStandbyAssignment is 
reduced. Also, AMFD does not check the HA state of the susi, tries to delete 
the active susi, and crashes:

Mar  8 11:38:25 SC-1 osafimmnd[4765]: NO Ccb 4 COMMITTED (immcfg_SC-1_5673)
Mar  8 11:38:25 SC-1 amf_demo[5464]: >CSI Remove=>
Mar  8 11:38:25 SC-1 amf_demo[5464]: 
Comp>:'safComp=NWay,safSu=SU1,safSg=NWay,safApp=NWay'
Mar  8 11:38:25 SC-1 amf_demo[5464]: 
CSI-->:'safCsi=NWay,safSi=NWay,safApp=NWay'
Mar  8 11:38:25 SC-1 amf_demo[5464]: CSI FLAG-->: SA_AMF_CSI_TARGET_ONE
Mar  8 11:38:25 SC-1 amf_demo[5464]: <===
Mar  8 11:38:25 SC-1 osafamfnd[4817]: NO Removed 'safSi=NWay,safApp=NWay' from 
'safSu=SU1,safSg=NWay,safApp=NWay'
Mar  8 11:38:25 SC-1 amf_demo[5464]: saAmfResponse after lopp- 1
Mar  8 11:38:25 SC-1 amf_demo[5494]: >CSI Remove=>
Mar  8 11:38:25 SC-1 amf_demo[5494]: 
Comp>:'safComp=NWay,safSu=SU2,safSg=NWay,safApp=NWay'
Mar  8 11:38:25 SC-1 amf_demo[5494]: 
CSI-->:'safCsi=NWay,safSi=NWay,safApp=NWay'
Mar  8 11:38:25 SC-1 amf_demo[5494]: CSI FLAG-->: SA_AMF_CSI_TARGET_ONE
Mar  8 11:38:25 SC-1 amf_demo[5494]: <===
Mar  8 11:38:25 SC-1 osafamfnd[4817]: NO Removed 'safSi=NWay,safApp=NWay' from 
'safSu=SU2,safSg=NWay,safApp=NWay'
Mar  8 11:38:25 SC-1 amf_demo[5494]: saAmfResponse after lopp- 1
Mar  8 11:38:25 SC-1 osafamfd[4803]: src/amf/amfd/su.cc:2072: 
dec_curr_stdby_si: Assertion 'saAmfSUNumCurrStandbySIs > 0' failed.
Mar  8 11:38:25 SC-1 osafamfnd[4817]: ER AMFD has unexpectedly crashed. 
Rebooting node
Mar  8 11:38:25 SC-1 osafamfnd[4817]: Rebooting OpenSAF NodeId = 131343 EE Name 
= , Reason: AMFD has unexpectedly crashed. Rebooting node, OwnNodeId = 131343, 
SupervisionTime = 60
Mar  8 11:38:25 SC-1 osafimmnd[4765]: NO Implementer locally disconnected. 
Marking it as doomed 31 <23, 2010f> (safAmfService)
Mar  8 11:38:25 SC-1 osafimmnd[4765]: NO Implementer disconnected 31 <23, 
2010f> (safAmfService)
Mar  8 11:38:25 SC-1 opensaf_reboot: Rebooting local node; timeout=60



Attachments:

- 
[AppConfig-nway_3SUs_1SIs.xml](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/d2d02dfe/79b6/attachment/AppConfig-nway_3SUs_1SIs.xml)
 (11.9 kB; text/xml)


---

** [tickets:#2268] amf: assignment from higher ranked SU is removed in N-Way 
Active model.**

**Status:** assigned
**Milestone:** 5.0.2
**Created:** Wed Jan 18, 2017 05:41 AM UTC by Praveen
**Last Updated:** Wed Jan 18, 2017 05:43 AM UTC
**Owner:** Praveen
**Attachments:**

- 
[AppConfig-nwayactive_3SUs_1SIs.xml](https://sourceforge.net/p/opensaf/tickets/2268/attachment/AppConfig-nwayactive_3SUs_1SIs.xml)
 (13.7 kB; text/xml)


When saAmfSIPrefActiveAssignments is reduced, AMFD removes assignments from a 
higher ranked SU when SI-ranked SU is not configured.
Steps to reproduce:
1) Bring attached application up on one controller.
2) The only SI is assigned to three SUs. The three SUs have different SURanks. 
Pref active assignments for the SI is 3.
3) Reduce pref active assignments for the SI by running the following command:
   immcfg -a saAmfSIPrefActiveAssignments=2 safSi=NWay_Active,safApp=NWay_Active
4) Since pref active assignments is reduced by 1, AMFD sends quiesced and 
removal of assignment to SU2.
5) SU2 has rank 2. Assignments should be removed from SU3, which has rank 3.


Assignments before reducing pref active assignments:
safSISU=safSu=SU1\,safSg=NWay_Active\,safApp=NWay_Active,safSi=NWay_Active,safApp=NWay_Active
saAmfSISUHAState=ACTIVE(1)
saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
safSISU=safSu=SU2\,safSg=NWay_Active\,safApp=NWay_Active,safSi=NWay_Active,safApp=NWay_Active
saAmfSISUHAState=ACTIVE(1)
saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
safSISU=safSu=SU3\,safSg=NWay_Active\,safApp=NWay_Active,safSi=NWay_Active,safApp=NWay_Active
saAmfSISUHAState=ACTIVE(1)
saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)

Assignments after reducing pref active assignments:
safSISU=safSu=SU1\,safSg=NWay_Active\,safApp=NWay_Active,safSi=NWay_Active,safApp=NWay_Active
saAmfSISUHAState=ACTIVE(1)
saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
safSISU=safSu=SU3\,safSg=NWay_Active\,safApp=NWay_Active,safSi=NWay_Active,safApp=NWay_Active
saAmfSISUHAState=ACTIVE(1)
saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
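
The expected selection can be sketched as follows. This is an illustrative Python model, not AMFD code: the function name is hypothetical, and it assumes that a larger SU rank value means lower priority, which matches step 5 above. It also filters on the HA state first, which is the check reported as missing for the N-Way standby case in the comment on this ticket.

```python
def sus_to_unassign(assignments, ha_state, new_pref):
    """Pick the SUs whose assignment should be removed.

    assignments: list of (su_name, su_rank, ha_state) for one SI.
    ha_state: the HA state whose preferred count was reduced.
    new_pref: the new preferred number of assignments in that state.
    """
    matching = [a for a in assignments if a[2] == ha_state]
    excess = len(matching) - new_pref
    if excess <= 0:
        return []
    # Remove from the lowest-priority SUs first (largest rank value).
    lowest_first = sorted(matching, key=lambda a: a[1], reverse=True)
    return [name for name, _rank, _ha in lowest_first[:excess]]


removed = sus_to_unassign(
    [("SU1", 1, "ACTIVE"), ("SU2", 2, "ACTIVE"), ("SU3", 3, "ACTIVE")],
    "ACTIVE", 2)
# N-Way style case: only standby assignments are candidates for removal.
removed_standby = sus_to_unassign(
    [("SU1", 1, "ACTIVE"), ("SU2", 2, "STANDBY"), ("SU3", 3, "STANDBY")],
    "STANDBY", 1)
```

In both calls the result is SU3, and the active assignment on SU1 is never touched when the standby count is reduced.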






[tickets] [opensaf:tickets] #2334 clm: Fix all Cppcheck 1.77 issue

2017-03-07 Thread Praveen

- **Milestone**: next --> never



---

** [tickets:#2334] clm: Fix all Cppcheck 1.77 issue **

**Status:** wontfix
**Milestone:** never
**Created:** Fri Mar 03, 2017 04:09 AM UTC by A V Mahesh (AVM)
**Last Updated:** Tue Mar 07, 2017 06:04 AM UTC
**Owner:** A V Mahesh (AVM)


[staging/src/clm/clmd/clms_evt.c:602] -> [staging/src/clm/clmd/clms_evt.c:545]: 
(warning) Either the condition 'node!=NULL' is redundant or there is possible 
null pointer dereference: node.
[staging/src/clm/clmd/clms_evt.c:603] -> [staging/src/clm/clmd/clms_evt.c:545]: 
(warning) Either the condition 'node!=NULL' is redundant or there is possible 
null pointer dereference: node.
[staging/src/clm/clmd/clms_evt.c:618]: (warning) Possible null pointer 
dereference: ip
[staging/src/clm/clmd/clms_evt.c:101] -> [staging/src/clm/clmd/clms_evt.c:104]: 
(style) Variable 'clma_down_rec' is reassigned a value before the old one has 
been used.
[staging/src/clm/clmd/clms_evt.c:177] -> [staging/src/clm/clmd/clms_evt.c:182]: 
(style) Variable 'client' is reassigned a value before the old one has been 
used.
[staging/src/clm/clmd/clms_evt.c:188] -> [staging/src/clm/clmd/clms_evt.c:191]: 
(style) Variable 'rc' is reassigned a value before the old one has been used.
[staging/src/clm/clmd/clms_evt.c:504] -> [staging/src/clm/clmd/clms_evt.c:521]: 
(style) Variable 'node' is reassigned a value before the old one has been used.
[staging/src/clm/clmd/clms_evt.c:678] -> [staging/src/clm/clmd/clms_evt.c:683]: 
(style) Variable 'node_name' is reassigned a value before the old one has been 
used.
[staging/src/clm/clmd/clms_evt.c:679] -> [staging/src/clm/clmd/clms_evt.c:684]: 
(style) Variable 'op_node' is reassigned a value before the old one has been 
used.
[staging/src/clm/clmd/clms_evt.c:677] -> [staging/src/clm/clmd/clms_evt.c:687]: 
(style) Variable 'rc' is reassigned a value before the old one has been used.
[staging/src/clm/clmd/clms_evt.c:873] -> [staging/src/clm/clmd/clms_evt.c:877]: 
(style) Variable 'node' is reassigned a value before the old one has been used.
[staging/src/clm/clmd/clms_evt.c:1039] -> 
[staging/src/clm/clmd/clms_evt.c:1048]: (style) Variable 'op_node' is 
reassigned a value before the old one has been used.
[staging/src/clm/clmd/clms_evt.c:1134] -> 
[staging/src/clm/clmd/clms_evt.c:1142]: (style) Variable 'node' is reassigned a 
value before the old one has been used.
[staging/src/clm/clmd/clms_evt.c:1236] -> 
[staging/src/clm/clmd/clms_evt.c:1240]: (style) Variable 'node' is reassigned a 
value before the old one has been used.
[staging/src/clm/clmd/clms_evt.c:2028] -> 
[staging/src/clm/clmd/clms_evt.c:2036]: (style) Variable 'mds_rc' is reassigned 
a value before the old one has been used.
[staging/src/clm/clmd/clms_evt.c:2046] -> 
[staging/src/clm/clmd/clms_evt.c:2052]: (style) Variable 'node_down_rec' is 
reassigned a value before the old one has been used.
[staging/src/clm/clmd/clms_imm.c:178] -> [staging/src/clm/clmd/clms_imm.c:184]: 
(style) Variable 'node' is reassigned a value before the old one has been used.
[staging/src/clm/clmd/clms_imm.c:275] -> [staging/src/clm/clmd/clms_imm.c:295]: 
(style) Variable 'rc' is reassigned a value before the old one has been used.
[staging/src/clm/clmd/clms_imm.c:335] -> [staging/src/clm/clmd/clms_imm.c:352]: 
(style) Variable 'rc' is reassigned a value before the old one has been used.
[staging/src/clm/clmd/clms_imm.c:690] -> [staging/src/clm/clmd/clms_imm.c:693]: 
(style) Variable 'node' is reassigned a value before the old one has been used.
[staging/src/clm/clmd/clms_imm.c:993] -> [staging/src/clm/clmd/clms_imm.c:996]: 
(style) Variable 'trk' is reassigned a value before the old one has been used.
[staging/src/clm/clmd/clms_imm.c:1721] -> 
[staging/src/clm/clmd/clms_imm.c:1726]: (style) Variable 'node' is reassigned a 
value before the old one has been used.
[staging/src/clm/clmd/clms_imm.c:1998] -> 
[staging/src/clm/clmd/clms_imm.c:2004]: (style) Variable 'node' is reassigned a 
value before the old one has been used.
[staging/src/clm/clmd/clms_imm.c:1527]: (style) The scope of the variable 'i' 
can be reduced.
[staging/src/clm/clmd/clms_imm.c:1528]: (style) The scope of the variable 
'attrMod' can be reduced.
[staging/src/clm/clmd/clms_imm.c:1529]: (style) The scope of the variable 
'name' can be reduced.
[staging/src/clm/clmd/clms_imm.c:557]: (style) Variable 'attr_Mod' is assigned 
a value that is never used.
[staging/src/clm/clmd/clms_imm.c:649]: (style) Variable 'attr_Mod' is assigned 
a value that is never used.
[staging/src/clm/clmd/clms_imm.c:829]: (style) Variable 'attr_Mod' is assigned 
a value that is never used.
[staging/src/clm/clmd/clms_main.c:329]: (style) Suspicious condition 
(assignment + comparison); Clarify expression with parentheses.
[staging/src/clm/clmd/clms_main.c:87] -> [staging/src/clm/clmd/clms_main.c:91]: 
(style) Variable 'evt' is reassigned a value before the old one has been used.
[staging/src/clm/clmd/clms_main.c:147]: (warning) fscanf() without
