[tickets] [opensaf:tickets] #211 saEvtChannelUnlink API returns SA_AIS_ERR_LIBRARY in a corner case.

2017-04-03 Thread Srikanth R
Apart from ERR_LIBRARY, the saEvtChannelUnlink API also returns SA_AIS_ERR_NOT_EXIST.


---

** [tickets:#211] saEvtChannelUnlink API returns SA_AIS_ERR_LIBRARY in a corner 
case.**

**Status:** unassigned
**Milestone:** future
**Created:** Wed May 15, 2013 07:09 AM UTC by Mathi Naickan
**Last Updated:** Wed Jul 15, 2015 02:50 PM UTC
**Owner:** nobody
**Attachments:**

- [osafevtd](https://sourceforge.net/p/opensaf/tickets/211/attachment/osafevtd) 
(178.2 kB; application/octet-stream)


Setup: 
SLES 11 64bit VM setup.
Test Scenario:
1. Invoke saEvtInitialize.
2. Open Channel as CREATE and PUBLISHER 
3. Allocate, AttributeSet and Free the Event 
4. Close and Unlink the Channel 
5. Finalize Evt session.
Observed from /var/log/messages that runtime object delete fails for evt 
channel.

Oct 4 18:33:31 linux-b4xy osafevtd[4588]: saImmOiRtObjectDelete failed. Channel: safChnl=channel_37. rc = 12

From osafevtd logs:

Oct 4 18:33:31.846959 osafevtd [4588:eds_evt.c:1079] >> eds_proc_eda_api_msg
Oct 4 18:33:31.846974 osafevtd [4588:eds_evt.c:0444] >> eds_proc_chan_unlink_msg: agent dest: 20100310f003d
Oct 4 18:33:31.846989 osafevtd [4588:eds_ll.c:1669] >> eds_channel_unlink: channel name: safChnl=channel_37
Oct 4 18:33:31.847004 osafevtd [4588:eds_ll.c:1674] TR Use count: 0
Oct 4 18:33:31.847019 osafevtd [4588:eds_ll.c:0505] >> is_active_channel: chan_name: safChnl=channel_37
Oct 4 18:33:31.847034 osafevtd [4588:eds_ll.c:0513] << is_active_channel: true: channel is not marked as unlinked
Oct 4 18:33:31.847050 osafevtd [4588:eds_ll.c:1678] TR Setting the unlink flag for this channel
Oct 4 18:33:31.847064 osafevtd [4588:eds_ll.c:0389] >> eds_remove_cname_rec: chan_name: safChnl=channel_37
Oct 4 18:33:31.847080 osafevtd [4588:eds_ll.c:0410] << eds_remove_cname_rec
Oct 4 18:33:31.847095 osafevtd [4588:eds_ll.c:1684] TR Use count is zero, delete the and IMM object
Oct 4 18:33:31.848039 osafevtd [4588:eds_ll.c:1689] ER saImmOiRtObjectDelete failed. Channel: safChnl=channel_37. rc = 12
Oct 4 18:33:31.848058 osafevtd [4588:eds_ll.c:1690] << eds_channel_unlink
Oct 4 18:33:31.848073 osafevtd [4588:eds_evt.c:0449] TR Channel unlink failed for :20100310f003d

Changeset: 2852
Note: When this scenario is run in batch mode, this issue is observed.
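
The `rc = 12` in the log lines above is a numeric SaAisErrorT value. As a reading aid, here is a minimal sketch for decoding such values from traces. The table lists only the codes that appear in these tickets, with values as defined in the SA Forum `saAis.h` enumeration; the helper names `decode_rc` and `find_failures` are hypothetical and not part of OpenSAF.

```python
import re

# Subset of SaAisErrorT (SA Forum saAis.h); only codes seen in these tickets.
SA_AIS_ERRORS = {
    1: "SA_AIS_OK",
    2: "SA_AIS_ERR_LIBRARY",
    5: "SA_AIS_ERR_TIMEOUT",
    6: "SA_AIS_ERR_TRY_AGAIN",
    7: "SA_AIS_ERR_INVALID_PARAM",
    11: "SA_AIS_ERR_ACCESS",
    12: "SA_AIS_ERR_NOT_EXIST",
    29: "SA_AIS_ERR_REPAIR_PENDING",
    31: "SA_AIS_ERR_UNAVAILABLE",
}

# Matches "<function> failed. ... rc = N"; DOTALL lets the match span a
# log line that was wrapped by the mail client.
RC_PATTERN = re.compile(r"(\w+) failed\..*?rc = (\d+)", re.DOTALL)


def decode_rc(rc):
    """Return the symbolic name for a numeric return code, if known."""
    return SA_AIS_ERRORS.get(rc, f"unknown ({rc})")


def find_failures(log_text):
    """Yield (function_name, error_name) for each failure found in log_text."""
    for match in RC_PATTERN.finditer(log_text):
        yield match.group(1), decode_rc(int(match.group(2)))
```

Applied to the syslog line in this ticket, the sketch reports saImmOiRtObjectDelete failing with SA_AIS_ERR_NOT_EXIST, which matches the return code mentioned in the comment at the top of this message.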


---

Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is 
subscribed to https://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change settings at 
https://sourceforge.net/p/opensaf/admin/tickets/options.  Or, if this is a 
mailing list, you can unsubscribe from the mailing list.

Opensaf-tickets mailing list
Opensaf-tickets@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets


[tickets] [opensaf:tickets] #2372 amf/clm: CLM lock of two more nodes returns REPAIR_PENDING for first node.

2017-03-15 Thread Srikanth R
From the start of the CLM implementation, the service has not supported admin 
operations on more than one node simultaneously. There was a discussion (or 
ticket) in the earlier Trac ticket system stating that CLM doesn't support 
operations on two entities simultaneously. 


Below is a simple scenario to reproduce it.

-> Bring up a CLM agent and subscribe to the track callback. Do not respond to 
the START callback.

-> Now perform the CLM lock operation on the two payloads from two different 
terminals.

-> In the CLM application, respond to the callbacks only after invoking both 
admin operations.

-> Both admin operations return SA_AIS_ERR_REPAIR_PENDING. Judging from the 
syslog output below, it seems that CLM doesn't store the invocation id for the 
initial admin operation.

Mar 15 11:54:20 SLES-1 osafamfd[3276]: NO Pending Response sent for CLM track 
callback::OK '7'


---

** [tickets:#2372] amf/clm: CLM lock of two more nodes returns REPAIR_PENDING 
for first node.**

**Status:** accepted
**Milestone:** 5.0.2
**Created:** Tue Mar 14, 2017 09:29 AM UTC by Praveen
**Last Updated:** Tue Mar 14, 2017 09:29 AM UTC
**Owner:** Praveen
**Attachments:**

- 
[osafamfd](https://sourceforge.net/p/opensaf/tickets/2372/attachment/osafamfd) 
(3.4 MB; application/octet-stream)
- 
[osafclmd](https://sourceforge.net/p/opensaf/tickets/2372/attachment/osafclmd) 
(860.9 kB; application/octet-stream)


Steps to reproduce:
1) Bring up a 4-node cluster.
2) Deploy the AMF demo on PL-3 and PL-4.
3) LOCK amfd nodes PL-3 and PL-4.
4) Arrange for termination of amf_demo on PL-3 to take more time than on 
PL-4.
5) From one terminal, issue a CLM lock of PL-3 and, immediately afterwards, a 
CLM lock of PL-4.

CLM and AMF traces are attached.  
Analysis:
When AMFD gets the CLM track callback for PL-3, it starts terminating amf_demo 
on PL-3. While that termination is still in progress, AMF gets another track 
callback with rootCauseEntity set to PL-4. However, this callback also contains 
information about PL-3. AMFD starts terminating amf_demo on PL-4, but at the 
same time it responds for PL-3 using the invocation id of the PL-4 callback. 
CLM assumes that the PL-4 change_started step has completed and sends the 
completion callback for PL-4. In this callback, AMF clears the internal flags 
that monitor the graceful removal of nodes. Since AMF never responded to the 
PL-3 callback, the callback timer expires in CLMD and it sends the completed 
callback to AMF. AMF treats this as a node failover and tries to fail over 
PL-3.

Note: In all these stages, CLM sends the track callback with information about 
all the nodes. The track flags AMF registers with are:
 
SA_TRACK_CURRENT|SA_TRACK_CHANGES_ONLY|SA_TRACK_VALIDATE_STEP|SA_TRACK_START_STEP.
  I am still evaluating whether the issue is in CLM or AMF. Since AMF registers 
for **|SA_TRACK_CHANGES_ONLY|**, should CLM give information about all the nodes 
in all subsequent callbacks?
 Also, AMF should respond to a callback once it has completed termination of 
the comps.
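
The invocation-id mix-up described in the analysis can be illustrated with a toy model. This is not AMF code: `PendingTrackResponses` and its methods are hypothetical names, and the invocation ids used in the usage lines are made up. The sketch assumes each START-step callback carries its own invocation id and a root-cause node, and shows one way to answer only the invocation whose node has actually finished terminating its components.

```python
class PendingTrackResponses:
    """Toy bookkeeping: one pending entry per CLM track callback invocation."""

    def __init__(self):
        self._by_node = {}  # root-cause node name -> invocation id

    def on_start_callback(self, invocation, root_cause_node):
        # File the invocation under its root-cause node only, even if the
        # callback buffer also lists other nodes that are still changing.
        self._by_node[root_cause_node] = invocation

    def on_termination_done(self, node):
        # Return the invocation id to answer for this node, or None when
        # no callback is pending for it.
        return self._by_node.pop(node, None)

    def pending_nodes(self):
        return sorted(self._by_node)


# Usage with made-up invocation ids: PL-4 finishes first, PL-3 stays pending.
tracker = PendingTrackResponses()
tracker.on_start_callback(7, "PL-3")
tracker.on_start_callback(8, "PL-4")
answered = tracker.on_termination_done("PL-4")
```

Keying the pending entries by root-cause node means the response for PL-4 can never consume the PL-3 invocation, which is the failure mode the analysis describes.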




[tickets] [opensaf:tickets] #2351 Opensaf failed to start on CLM locked node

2017-03-14 Thread Srikanth R
- **summary**: IMM: 2PBE: Opensaf failed to start on standby controller --> 
Opensaf failed to start on CLM locked node
- Description has changed:

Diff:



--- old
+++ new
@@ -5,7 +5,7 @@
 2PBE enable with no load
 
 Summary:
-OpenSAF failed to start on standby controller when 2PBE is enabled
+OpenSAF failed to start on standby controller when 2PBE is enabled and standby 
is in CLM locked state
 
 Step performed:
 1. Enabled 2PBE in immd.conf for both controllers



- **status**: invalid --> unassigned
- **Component**: imm --> base
- **Version**:  --> 5.2.FC
- **Comment**:

Re-opening the ticket.

Either it should be documented that opensafd fails to start on a CLM-locked 
node, or opensafd should start on a CLM-locked node with all services returning 
SA_AIS_ERR_UNAVAILABLE to the applications.





---

** [tickets:#2351] Opensaf failed to start on CLM locked node**

**Status:** unassigned
**Milestone:** 5.2.RC1
**Created:** Tue Mar 07, 2017 08:41 AM UTC by Chani Srivastava
**Last Updated:** Tue Mar 07, 2017 10:52 AM UTC
**Owner:** nobody
**Attachments:**

- 
[Logs2PBE.zip](https://sourceforge.net/p/opensaf/tickets/2351/attachment/Logs2PBE.zip)
 (1.2 MB; application/zip)


Environment details

OS : Suse 64bit
Changeset : 8603( 5.2.MO-1)
2PBE enable with no load

Summary:
OpenSAF failed to start on standby controller when 2PBE is enabled and standby 
is in CLM locked state

Step performed:
1. Enabled 2PBE in immd.conf for both controllers
2. Started opensaf on all nodes sequentially 






[tickets] [opensaf:tickets] #2284 IMM: Improper return code without any error string while deleting large number of objects

2017-03-09 Thread Srikanth R
To my understanding, this ticket was raised to correct the invalid return code 
(ERR_LIBRARY). As per the ticket description, the expected behavior is:

"Proper return code with error string should be returned"

Why is a new ticket necessary?


---

** [tickets:#2284] IMM: Improper return code without any error string while 
deleting large number of objects**

**Status:** invalid
**Milestone:** 5.2.RC1
**Created:** Wed Feb 01, 2017 07:13 AM UTC by Chani Srivastava
**Last Updated:** Thu Mar 09, 2017 01:15 PM UTC
**Owner:** nobody


Steps to reproduce:

1. Bring up opensaf on a cluster
2. Create around 10k objects
3. Try deleting these objects in one immcfg operation

Output:
Error Returned - error - saImmOmAdminOwnerSet FAILED: SA_AIS_ERR_LIBRARY (2)

No error string stating the cause of failure is returned.

Syslog - immcfg: ER TOO MANY Object Names line:733

Expected behavior - Proper return code with error string should be returned 




[tickets] [opensaf:tickets] #2339 CLM : Cluster reset doesn't succed as "reboot now" command fails on SLES

2017-03-02 Thread Srikanth R
Either "shutdown -r now" or a plain "reboot" command should suffice for a 
graceful reboot.


---

** [tickets:#2339] CLM : Cluster reset doesn't succed as "reboot now" command 
fails on SLES**

**Status:** unassigned
**Milestone:** 5.2.RC1
**Created:** Fri Mar 03, 2017 06:05 AM UTC by Srikanth R
**Last Updated:** Fri Mar 03, 2017 06:05 AM UTC
**Owner:** nobody


Changeset : 8634 5.2.FC
SLES TIPC setup with one controller.


 Once the controller is brought up with opensaf 5.2.FC, the following cluster 
reset command is issued.
 
 immadm -o 4  safCluster=myClmCluster
 
 The command failed with the following log.
 
Mar 11 16:09:52 SUSE-S1-C1 osafclmd[6772]: Command: /usr/lib64/opensaf/opensaf_reboot 0 not_used 1 failed, rc = 256


On SLES, the "reboot now" command fails. Instead, "shutdown -r now" should be 
invoked for a graceful shutdown. 
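
The `rc = 256` in the log is consistent with a raw wait status of the kind `system()` returns, where the exit code sits in the high byte. Assuming that is what osafclmd logs here (an assumption, the ticket does not confirm it), the value would correspond to `opensaf_reboot` exiting with code 1. A small sketch using the conventional Linux status layout follows; `decode_wait_status` is a hypothetical helper, not an OpenSAF function.

```python
def decode_wait_status(status):
    """Split a raw wait status into (exit_code, signal).

    Uses the conventional Linux layout; stopped/continued states are
    ignored in this sketch.
    """
    signal = status & 0x7F  # low 7 bits: terminating signal, 0 for a normal exit
    if signal == 0:
        return (status >> 8) & 0xFF, None  # exit code is in bits 8-15
    return None, signal
```

Under that assumption the script itself ran and returned a failure, rather than being killed by a signal.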




[tickets] [opensaf:tickets] #2327 Opensaf failed to start on active controller ( random)

2017-02-28 Thread Srikanth R
- **status**: unassigned --> invalid
- **Comment**:

Closing the ticket as invalid: an older library file was left behind during 
re-installation. After a proper installation, there is no issue with opensafd 
startup.



---

** [tickets:#2327] Opensaf failed to start on active controller  ( random)**

**Status:** invalid
**Milestone:** 5.2.RC1
**Created:** Wed Mar 01, 2017 06:22 AM UTC by Srikanth R
**Last Updated:** Wed Mar 01, 2017 06:22 AM UTC
**Owner:** nobody
**Attachments:**

- 
[opensafStartup.tgz](https://sourceforge.net/p/opensaf/tickets/2327/attachment/opensafStartup.tgz)
 (1.4 MB; application/x-compressed-tar)


Changeset: 8634 5.2.FC
SLES single node TIPC setup.


Issue : opensafd failed to startup on active controller for the first time.

Below is the output from syslog

Mar  6 01:27:19 SUSE-S1-C1 opensafd[11180]: NO Monitoring of CLMD started
Mar  6 01:27:19 SUSE-S1-C1 osafclmna[11211]: NO safNode=SC-1,safCluster=myClmCluster Joined cluster, nodeid=2010f
Mar  6 01:27:19 SUSE-S1-C1 osafamfd[11301]: Started
Mar  6 01:27:29 SUSE-S1-C1 osafamfd[11301]: WA saClmInitialize_4 returned 5
Mar  6 01:27:29 SUSE-S1-C1 osafamfd[11301]: ER saImmOiInitialize failed 5
Mar  6 01:27:29 SUSE-S1-C1 osafamfd[11301]: ER avd_imm_init FAILED
Mar  6 01:27:29 SUSE-S1-C1 osafamfd[11301]: ER initialize_for_assignment FAILED 2
Mar  6 01:27:29 SUSE-S1-C1 osafamfd[11301]: ER initialize failed, exiting
Mar  6 01:27:29 SUSE-S1-C1 opensafd[11180]: ER Failed   DESC:AMFD
Mar  6 01:27:29 SUSE-S1-C1 opensafd[11180]: ER Going for recovery

Below is the output from clmd.
Mar  6  1:27:29.273608 osafclmd [11291:src/clm/clmd/clms_mds.c:1194] << clms_mds_svc_event
Mar  6  1:27:29.273644 osafclmd [11291:src/mbc/mbcsv_mds.c:0420] << mbcsv_mds_evt: Msg is not from same vdest, discarding
Mar  6  1:27:29.269263 osafclmd [11291:src/imm/agent/imma_oi_api.cc:2783] << rt_object_update_common
Mar  6  1:27:29.273697 osafclmd [11291:src/clm/clmd/clms_imm.c:0842] IN saImmOiRtObjectUpdate failed for cluster object with rc = 5. Trying again
Mar  6  1:27:29.273709 osafclmd [11291:src/clm/clmd/clms_imm.c:0871] << clms_cluster_update_rattr


Traces of clmd, amfd, amfnd, immd and immnd, along with mds.log and syslog, are 
attached.

This issue is random. It was observed two times out of three when opensafd was 
started on a lone active controller.




[tickets] [opensaf:tickets] #252 amf: saAmfPmStart_3() returns SA_AIS_ERR_ACCESS when called with invalid recovery.

2017-01-12 Thread Srikanth R
Yes. The ticket can be closed as invalid, since the API can return ERR_ACCESS.


---

** [tickets:#252] amf: saAmfPmStart_3() returns SA_AIS_ERR_ACCESS when called 
with invalid recovery.**

**Status:** review
**Milestone:** 5.2.FC
**Created:** Thu May 16, 2013 06:49 AM UTC by Praveen
**Last Updated:** Fri Nov 04, 2016 09:52 AM UTC
**Owner:** Nagendra Kumar


Migrated from http://devel.opensaf.org/ticket/2813.
Changeset:3728
 When saAmfPmStart_3() is called with an invalid value of 
SaAmfRecommendedRecoveryT (say 19), it returns SA_AIS_ERR_ACCESS instead of 
SA_AIS_ERR_INVALID_PARAM. SA_AIS_ERR_ACCESS should be returned when AMF rejects 
the recommended recovery from a functionality perspective; it should not be 
returned as the result of a validation check.
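
The distinction the ticket description draws, range validation first and functional rejection second, can be sketched as follows. This models the behaviour the reporter expected, not what OpenSAF implements. The enum values are believed to match SaAmfRecommendedRecoveryT in the SA Forum AMF header, while `check_recovery` and the `rejected_by_policy` set are hypothetical.

```python
from enum import IntEnum

SA_AIS_OK = 1
SA_AIS_ERR_INVALID_PARAM = 7
SA_AIS_ERR_ACCESS = 11


class SaAmfRecommendedRecovery(IntEnum):
    # Values believed to match SaAmfRecommendedRecoveryT (treat as an assumption).
    NO_RECOMMENDATION = 1
    COMPONENT_RESTART = 2
    COMPONENT_FAILOVER = 3
    NODE_SWITCHOVER = 4
    NODE_FAILOVER = 5
    NODE_FAILFAST = 6
    CLUSTER_RESET = 7
    APPLICATION_RESTART = 8
    CONTAINER_RESTART = 9


def check_recovery(value, rejected_by_policy=frozenset()):
    """Return the error code the ticket description expects for `value`."""
    try:
        recovery = SaAmfRecommendedRecovery(value)
    except ValueError:
        # Outside the enum range: a pure validation failure.
        return SA_AIS_ERR_INVALID_PARAM
    if recovery in rejected_by_policy:
        # A valid value that is refused for functional reasons.
        return SA_AIS_ERR_ACCESS
    return SA_AIS_OK
```

With this ordering, the value 19 from the ticket maps to SA_AIS_ERR_INVALID_PARAM, and SA_AIS_ERR_ACCESS is reserved for in-range values that are refused.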




[tickets] [opensaf:tickets] Re: #2094 Standby controller goes for reboot on stopping openSaf with STONITH enabled cluster

2016-11-08 Thread Srikanth R
For scenario-2:
-> Management software other than opensafd (e.g. SWN) issues a reboot on the 
standby controller. From the opensaf perspective, the standby controller might 
be a healthy member of the cluster. But from the SWN perspective, the node needs 
to be repaired, so reboot is invoked.

-> When the reboot command is invoked by SWN, all services in the configured 
runlevel are stopped in order.

-> Once the opensafd stop script is invoked on the standby controller, the 
active controller detects that the standby controller is in healthy state and 
remote fencing is done.

-> As part of remote fencing, the node is hard rebooted, which gives the other 
services in the runlevel no chance to be stopped gracefully.

-> If the SWN has a database service (e.g. drbd) that is to be stopped after 
opensafd stop, the database service stop script is not invoked because remote 
fencing is done. This may leave the other management software (e.g. SWN) in a 
bad state.

 Suggestion:

1) Either opensaf documents that the admin needs to perform a CLM admin lock of 
the standby controller before repairing it, OR
2) FM should distinguish between an opensafd stop and hung opensaf processes. As 
part of opensafd stop, the peer fmd on the standby controller can inform fmd on 
the active controller that opensafd on the standby is going down gracefully. 
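
Suggestion 2 can be illustrated with a toy decision model. It is only a sketch of the idea, not fmd code: `PeerSupervisor`, its methods, and the notification itself are hypothetical, and the sketch ignores message loss and timing.

```python
class PeerSupervisor:
    """Toy model: fence the peer only when its node-down was not announced."""

    def __init__(self):
        self._graceful_stop = set()  # node ids that announced an orderly stop

    def on_graceful_stop_notice(self, node_id):
        # Sent by the peer fmd as part of 'opensafd stop' (hypothetical message).
        self._graceful_stop.add(node_id)

    def on_node_down(self, node_id):
        if node_id in self._graceful_stop:
            self._graceful_stop.discard(node_id)
            return "no-fence"  # announced stop: let the runlevel scripts finish
        return "fence"  # silent loss: treat as hung or failed, fence remotely
```

A node-down that follows a notice is left alone, while an unannounced node-down still triggers remote fencing, so hung processes are handled as before.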



---

** [tickets:#2094] Standby controller goes for reboot on stopping openSaf with 
STONITH enabled cluster**

**Status:** unassigned
**Milestone:** 5.2.FC
**Created:** Wed Oct 05, 2016 07:28 AM UTC by Chani Srivastava
**Last Updated:** Tue Nov 08, 2016 11:49 AM UTC
**Owner:** nobody



[tickets] [opensaf:tickets] #2094 Standby controller goes for reboot on stopping openSaf with STONITH enabled cluster

2016-11-07 Thread Srikanth R
There are two scenarios in which "opensafd stop" is invoked on an opensaf 
controller.

SCENARIO-1) The /etc/init.d/opensafd script is invoked manually from the command 
prompt while the system is up and running.
SCENARIO-2) Software on a controller (other than opensafd) invokes "reboot", 
for which opensafd stop is invoked in run level 3 or higher.

 With the patch submitted for #2160:
 
 a) The node goes for reboot in scenario-1 if the administrator doesn't invoke 
the CLM admin operation. This is fine.
 
b) For scenario-2, the run level services are not all stopped gracefully, as 
the node is rebooted abruptly after opensafd stop because the admin did not 
invoke the CLM admin operation. So, with the #2160 fix, opensafd as HA software 
will not support graceful reboot of the standby controller?


---

** [tickets:#2094] Standby controller goes for reboot on stopping openSaf with 
STONITH enabled cluster**

**Status:** unassigned
**Milestone:** 5.2.FC
**Created:** Wed Oct 05, 2016 07:28 AM UTC by Chani Srivastava
**Last Updated:** Wed Nov 02, 2016 11:40 AM UTC
**Owner:** nobody


OS : Ubuntu 64bit
Changeset : 7997 ( 5.1.FC)
Setup : 2-node cluster (both controllers) Remote fencing enabled

Steps:
1. Bring up OpenSaf on two nodes 
2. Enable STONITH
3. Stop opensaf on Standby

Active controller triggers reboot of standby

SC-1 Syslog

Oct  5 13:01:23 SC-1 osafimmd[5535]: NO MDS event from svc_id 25 (change:4, dest:565215202263055)
Oct  5 13:01:23 SC-1 osafimmnd[5545]: NO Global discard node received for nodeId:2020f pid:3579
Oct  5 13:01:23 SC-1 osafimmnd[5545]: NO Implementer disconnected 14 <0, 2020f(down)> (@safAmfService2020f)
Oct  5 13:01:24 SC-1 osafamfd[5592]: **NO Node 'SC-2' left the cluster**
Oct  5 13:01:24 SC-1 osaffmd[5526]: NO Node Down event for node id 2020f:
Oct  5 13:01:24 SC-1 osaffmd[5526]: NO Current role: ACTIVE
Oct  5 13:01:24 SC-1 osaffmd[5526]: **Rebooting OpenSAF NodeId = 131599 EE Name = SC-2, Reason: Received Node Down for peer controller, OwnNodeId = 131343, SupervisionTime = 60
Oct  5 13:01:25 SC-1 external/libvirt[5893]: [5906]: notice: Domain SC-2 was stopped**
Oct  5 13:01:27 SC-1 kernel: [ 5355.132093] tipc: Resetting link <1.1.1:eth0-1.1.2:eth0>, peer not responding
Oct  5 13:01:27 SC-1 kernel: [ 5355.132123] tipc: Lost link <1.1.1:eth0-1.1.2:eth0> on network plane A
Oct  5 13:01:27 SC-1 kernel: [ 5355.132126] tipc: Lost contact with <1.1.2>
Oct  5 13:01:27 SC-1 external/libvirt[5893]: [5915]: notice: Domain SC-2 was started
Oct  5 13:01:42 SC-1 kernel: [ 5370.557180] tipc: Established link <1.1.1:eth0-1.1.2:eth0> on network plane A
Oct  5 13:01:42 SC-1 osafimmd[5535]: NO MDS event from svc_id 25 (change:3, dest:565217457979407)
Oct  5 13:01:42 SC-1 osafimmd[5535]: NO New IMMND process is on STANDBY Controller at 2020f
Oct  5 13:01:42 SC-1 osafimmd[5535]: WA IMMND on controller (not currently coord) requests sync
Oct  5 13:01:42 SC-1 osafimmd[5535]: NO Node 2020f request sync sync-pid:1176 epoch:0
Oct  5 13:01:43 SC-1 osafimmnd[5545]: NO Announce sync, epoch:4
Oct  5 13:01:43 SC-1 osafimmd[5535]: NO Successfully announced sync. New ruling epoch:4
Oct  5 13:01:43 SC-1 osafimmnd[5545]: NO SERVER STATE: IMM_SERVER_READY --> IMM_SERVER_SYNC_SERVER
Oct  5 13:01:43 SC-1 osafimmnd[5545]: NO NODE STATE-> IMM_NODE_R_AVAILABLE
Oct  5 13:01:43 SC-1 osafimmloadd: NO Sync starting
Oct  5 13:01:43 SC-1 osafimmloadd: IN Synced 346 objects in total
Oct  5 13:01:43 SC-1 osafimmnd[5545]: NO NODE STATE-> IMM_NODE_FULLY_AVAILABLE 18430
Oct  5 13:01:43 SC-1 osafimmnd[5545]: NO Epoch set to 4 in ImmModel
Oct  5 13:01:43 SC-1 osafimmd[5535]: NO ACT: New Epoch for IMMND process at node 2010f old epoch: 3  new epoch:4
Oct  5 13:01:43 SC-1 osafimmd[5535]: NO ACT: New Epoch for IMMND process at node 2020f old epoch: 0  new epoch:4
Oct  5 13:01:43 SC-1 osafimmloadd: NO Sync ending normally
Oct  5 13:01:43 SC-1 osafimmnd[5545]: NO SERVER STATE: IMM_SERVER_SYNC_SERVER --> IMM_SERVER_READY
Oct  5 13:01:43 SC-1 osafamfd[5592]: NO Received node_up from 2020f: msg_id 1
Oct  5 13:01:43 SC-1 osafamfd[5592]: NO Node 'SC-2' joined the cluster
Oct  5 13:01:43 SC-1 osafimmnd[5545]: NO Implementer connected: 16 (MsgQueueService131599) <467, 2010f>
Oct  5 13:01:43 SC-1 osafimmnd[5545]: NO Implementer locally disconnected. Marking it as doomed 16 <467, 2010f> (MsgQueueService131599)
Oct  5 13:01:43 SC-1 osafimmnd[5545]: NO Implementer disconnected 16 <467, 2010f> (MsgQueueService131599)
Oct  5 13:01:44 SC-1 osafrded[5518]: NO Peer up on node 0x2020f
Oct  5 13:01:44 SC-1 osaffmd[5526]: NO clm init OK
Oct  5 13:01:44 SC-1 osafimmd[5535]: NO MDS event from svc_id 24 (change:5, dest:13)
Oct  5 13:01:44 SC-1 osaffmd[5526]: NO Peer clm node name: SC-2
Oct  5 13:01:44 SC-1 osafrded[5518]: NO Got peer info request from node 0x2020f with role STANDBY
Oct  5 13:01:44 SC-1 osafrded[5518]: NO Got peer info response from node 0x2020f with role STANDBY





[tickets] [opensaf:tickets] #2151 osaf: system in not in correct state during Act controller comming up

2016-11-01 Thread Srikanth R
There are three issues in the ticket raised.

1) As per the comments on ticket #2094, "/etc/init.d/opensafd stop" is not a 
proper way to bring down opensaf. To bring down a faulty node, it is suggested 
to perform a CLM lock on the node and then invoke the reboot command manually. 

2) I cannot think of any real use case for a concurrent 'opensafd stop' on one 
controller and 'opensafd start' on another controller.

In a fault scenario, reboot -f is called, so none of the runlevel services are 
called during the node recovery process. So, the scenario of simultaneous 
'opensafd stop' on SC-1 and 'opensafd start' on SC-2 is not possible in a 
production environment.
   
3) Deploying such a large number of components on a controller is not 
recommended, as the failure or fault of user components can impact middleware 
(opensaf) functionality on the entire cluster.


---

** [tickets:#2151] osaf: system in not in correct state during Act controller 
comming up**

**Status:** unassigned
**Milestone:** 5.2.FC
**Created:** Mon Oct 31, 2016 10:54 AM UTC by Nagendra Kumar
**Last Updated:** Tue Nov 01, 2016 06:59 AM UTC
**Owner:** nobody


Steps to reproduce:
1. Start two controllers (SC-1 active, SC-2 standby) and two payloads. Configure 
50 components on SC-2 and unlock them. Add a 1 second delay to each component's 
stop script.
2. Stop SC-1 and, after that, stop SC-2.
3. While SC-2 is going down, start SC-1.

Observed behaviour:
Because stopping all the components takes time during 'opensafd stop' on SC-2, 
Amfnd has not exited yet, but all middleware component assignments have been 
stopped. Only Amfnd and Amfd are alive, with a few more components left to stop.
Meanwhile SC-1 has come up as far as Amfd, and since two Amfds are now active, 
the SC-2 Amfd exits with "Duplicate ACTIVE detected, exiting".
Until then, the services, including Amfd, are in a bad state because they cannot 
tell whether this is a headless state or a failover. That is accurate, since 
the system is halfway between headless and failover.


Expected behaviour
In my view:
FMS should stop and not proceed if the peer is going down, i.e. FMS on SC-1 
should figure out that the peer system is going down. It should allow SC-1 to 
proceed only when all services are down, i.e. when it gets node down (maybe 
cb->immd_down && cb->immnd_down && cb->amfnd_down && cb->amfd_down && 
cb->fm_down).
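
The condition proposed above can be sketched as a single predicate. This is only an illustration in Python, not FMS code: the `PeerServiceState` class and the `peer_fully_down` function are hypothetical, and the flag names simply mirror the `cb->..._down` fields suggested in the ticket.

```python
from dataclasses import dataclass


@dataclass
class PeerServiceState:
    """Hypothetical view of what FMS on SC-1 knows about the peer's services."""
    immd_down: bool = False
    immnd_down: bool = False
    amfnd_down: bool = False
    amfd_down: bool = False
    fm_down: bool = False


def peer_fully_down(cb: PeerServiceState) -> bool:
    """Return True only when every tracked peer service has gone down."""
    return (cb.immd_down and cb.immnd_down and cb.amfnd_down
            and cb.amfd_down and cb.fm_down)


# Peer is part way through 'opensafd stop': Amfnd and Amfd are still alive.
stopping = PeerServiceState(immd_down=True, immnd_down=True, fm_down=True)
print(peer_fully_down(stopping))  # False: SC-1 should keep waiting
```

With such a check, SC-1 would treat the peer as gone only after the last of the five flags is set, which is the "node down" condition described in the ticket.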





---

Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is 
subscribed to https://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change settings at 
https://sourceforge.net/p/opensaf/admin/tickets/options.  Or, if this is a 
mailing list, you can unsubscribe from the mailing list.
Opensaf-tickets mailing list
Opensaf-tickets@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets


[tickets] [opensaf:tickets] #2052 immtools: SC/PL field in nodes.cfg is not used

2016-11-01 Thread Srikanth R
Zoran,

  Node reboot recovery should be followed when the system cannot recover from 
the observed fault. For a fault like amfd crashing, a node reboot is 
appropriate. But in the current scenario, the same configuration exists after 
the reboot, and the node reboots again because opensafd is enabled in the 
runlevel by default.

   If the system has the same environment after the reboot, then rebooting does 
not help the user or the system recover from a misconfiguration or even a fault.

  My expectation is that the node should not reboot, and opensafd should 
either keep running in a suspended state or be stopped. This issue mainly 
affects newcomers. Rebooting a node because of a misconfiguration found while 
starting OpenSAF does not look good.


---

** [tickets:#2052] immtools: SC/PL field in nodes.cfg is not used**

**Status:** unassigned
**Milestone:** 5.0.2
**Created:** Tue Sep 20, 2016 09:41 AM UTC by Ritu Raj
**Last Updated:** Tue Nov 01, 2016 07:26 AM UTC
**Owner:** nobody


# Environment details
OS : Suse 64bit
Changeset : 7997 ( 5.1.FC)

# Summary
Controller is able to join with an invalid node_name

# Steps followed & Observed behaviour
1. Mistakenly configured the controller's node_name as PL-3; the remaining 
configuration files are properly installed and updated, apart from 
/etc/opensaf/node_name.
2. Bring up OpenSAF. OpenSAF is still able to come up with the misconfigured 
node_name.

Opensaf status:
fos1:/opt/goahead/tetware/opensaffire/suites/avsv/api/suites # 
/etc/init.d/opensafd status
safSISU=safSu=PL-3\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed1,safApp=OpenSAF
saAmfSISUHAState=ACTIVE(1)

# Expected
OpenSAF should come up only with SC-1 / SC-2, as the immxml was generated with:
 ./immxml-clustersize -s 2 -p 2
 ./immxml-configure






[tickets] [opensaf:tickets] #2052 immtools: SC/PL field in nodes.cfg is not used

2016-11-01 Thread Srikanth R
I think the discussion was sidetracked by the use of the PL string in nodes.cfg.

On the first node in the OpenSAF cluster, the following information is filled 
in the OpenSAF cfg files.


cat /usr/share/opensaf/immxml/nodes.cfg 
SC node-1 node-1
SC node-2 node-2
PL node-3 node-3
PL node-4 node-4
PL node-5 node-5
PL node-6 node-6

cat /etc/opensaf/slot_id
1

cat /etc/opensaf/node_name
node-3
cat /etc/opensaf/node_type
controller


-> Opensafd starts successfully, but with the following output
safSISU=safSu=node-3\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed1,safApp=OpenSAF
saAmfSISUHAState=ACTIVE(1)


-> After about 5 minutes, the node rebooted with the following output.

Nov  1 12:31:22 CONTROLLER-1 osaffmd[3945]: Rebooting OpenSAF NodeId = 0 EE 
Name = No EE Mapped, Reason: Activation timer supervision expired: no ACTIVE 
assignment received within the time limit, OwnNodeId = 131343, SupervisionTime 
= 60
Nov  1 12:31:22 CONTROLLER-1 opensaf_reboot: Rebooting local node; timeout=60


Observed behavior :

 If the user mistakenly populates node_name with a payload's node_name and 
starts the opensafd script, the user is not informed about the 
misconfiguration. The node reboots continuously because opensafd is enabled in 
the runlevel by default during RPM installation.

Expected behavior :

  One of fms / imm / amf should detect that the node_name used during bring-up 
is intended for a payload, not for a controller. More importantly, the node 
should not reboot.
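
A check of this kind could look roughly like the sketch below. It is not existing OpenSAF code: the function names are hypothetical, and it assumes that each nodes.cfg line has the layout `<type> <node name> <host name>` and that node_type holds either `controller` or `payload`.

```python
def parse_nodes_cfg(text):
    """Map node name -> node type ('SC' or 'PL') from nodes.cfg-style text.

    Assumed layout per line: <type> <node name> <host name>.
    """
    types = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] in ("SC", "PL"):
            types[fields[1]] = fields[0]
    return types


def check_node_identity(nodes_cfg_text, node_name, node_type):
    """Return an error string on mismatch, or None if the settings agree."""
    expected = {"controller": "SC", "payload": "PL"}.get(node_type)
    if expected is None:
        return "unknown node_type %r" % node_type
    configured = parse_nodes_cfg(nodes_cfg_text).get(node_name)
    if configured is None:
        return "node_name %r is not listed in nodes.cfg" % node_name
    if configured != expected:
        return ("node_name %r is configured as %s but node_type is %r"
                % (node_name, configured, node_type))
    return None


NODES_CFG = """\
SC node-1 node-1
SC node-2 node-2
PL node-3 node-3
PL node-4 node-4
"""

# The misconfiguration from this report: a payload name on a controller.
print(check_node_identity(NODES_CFG, "node-3", "controller"))
```

Run before the services start, such a check would let opensafd refuse to start with a clear message instead of entering the reboot loop.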
   


---

** [tickets:#2052] immtools: SC/PL field in nodes.cfg is not used**

**Status:** unassigned
**Milestone:** 5.0.2
**Created:** Tue Sep 20, 2016 09:41 AM UTC by Ritu Raj
**Last Updated:** Tue Sep 20, 2016 05:49 PM UTC
**Owner:** nobody


# Environment details
OS : Suse 64bit
Changeset : 7997 ( 5.1.FC)

# Summary
Controller is able to join with an invalid node_name

# Steps followed & Observed behaviour
1. Mistakenly configured the controller's node_name as PL-3; the remaining 
configuration files are properly installed and updated, apart from 
/etc/opensaf/node_name.
2. Bring up OpenSAF. OpenSAF is still able to come up with the misconfigured 
node_name.

Opensaf status:
fos1:/opt/goahead/tetware/opensaffire/suites/avsv/api/suites # 
/etc/init.d/opensafd status
safSISU=safSu=PL-3\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed1,safApp=OpenSAF
saAmfSISUHAState=ACTIVE(1)

# Expected
OpenSAF should come up only with SC-1 / SC-2, as the immxml was generated with:
 ./immxml-clustersize -s 2 -p 2
 ./immxml-configure






[tickets] [opensaf:tickets] #1765 ckpt : saCkptCheckpointOpen api call failed and returing SA_AIS_ERR_LIBRARY after couple of failover

2016-10-12 Thread Srikanth R
Apart from the ERR_LIBRARY return value, CKPT open also fails randomly with 
ERR_NO_RESOURCES after failover.


---

** [tickets:#1765] ckpt : saCkptCheckpointOpen api call failed and returing 
SA_AIS_ERR_LIBRARY after couple of failover**

**Status:** accepted
**Milestone:** 5.0.2
**Created:** Fri Apr 15, 2016 06:26 AM UTC by Ritu Raj
**Last Updated:** Tue Sep 20, 2016 06:04 PM UTC
**Owner:** Pham Hoang Nhat
**Attachments:**

- 
[ckpt_trace.tar.bz2](https://sourceforge.net/p/opensaf/tickets/1765/attachment/ckpt_trace.tar.bz2)
 (3.2 MB; application/x-bzip)


setup:
Changeset- 7436
Version - opensaf 5.0 FC
4 nodes configured with a single PBE and a load of 30K objects

* Issue observed:
The saCkptCheckpointOpen API call failed, returning SA_AIS_ERR_LIBRARY, after 
a couple of failovers.

* Steps to reproduce:
> Ran a couple of failovers and observed that saCkptCheckpointOpen failed.
> Below is a snippet of the agent trace:

Apr 15  8:08:50.275115 cpa [28883:cpa_mds.c:0776] << cpa_mds_msg_sync_send: 
retval = 1
Apr 15  8:08:50.275128 cpa [28883:cpa_api.c:1043] T4 Cpa CkptOpen failed with 
return value:2,ckptHandle:63
Apr 15  8:08:50.275141 cpa [28883:cpa_api.c:1146] << **saCkptCheckpointOpen: 
API return code = 2**

> Traces of both controllers and the agent trace of the payload are attached.





[tickets] [opensaf:tickets] #2110 AMF : amfd aborted on both controllers after opensafd stopped on payload

2016-10-10 Thread Srikanth R



---

** [tickets:#2110] AMF : amfd aborted on both controllers after opensafd 
stopped on payload**

**Status:** unassigned
**Milestone:** 5.2.FC
**Created:** Tue Oct 11, 2016 05:35 AM UTC by Srikanth R
**Last Updated:** Tue Oct 11, 2016 05:35 AM UTC
**Owner:** nobody


Changeset : 5.1GA 8190
Setup : 4-node setup with PBE enabled (1 lakh, i.e. 100,000, objects) and the 
headless feature enabled.

Steps performed :
-> Brought up OpenSAF on the 4-node setup.
-> Ran the IMM test application on Oct 8th and also performed middleware 
failovers.
-> Left the setup idle for two days.
-> On Oct 10 14:07:38, stopped OpenSAF on PL-4, after which amfd on both 
controllers aborted.


Oct 10 14:07:38 SLES-SLOT1 osafimmnd[2748]: NO Global discard node received for 
nodeId:2040f pid:3261
Oct 10 14:07:38 SLES-SLOT1 osafamfd[2788]: NO Node 'PL-4' left the cluster
Oct 10 14:07:38 SLES-SLOT1 osafamfd[2788]: su.cc:2006: dec_curr_act_si: 
Assertion 'saAmfSUNumCurrActiveSIs > 0' failed.
Oct 10 14:07:38 SLES-SLOT1 osafamfnd[2798]: WA AMF director unexpectedly crashed



 Below is the back trace :
 
 2  0x7f7426025197 in __osafassert_fail (__file=0x51b4ed "su.cc", 
__line=2006,
   __func=0x51ce30 <AVD_SU::dec_curr_act_si()::__FUNCTION__> "dec_curr_act_si", 
__assertion=0x51c884 "saAmfSUNumCurrActiveSIs > 0") at sysf_def.c:281
3  0x004de88c in AVD_SU::dec_curr_act_si (this=0x7bde40) at su.cc:2006
4  0x004c504e in avd_susi_delete (cb=0x75dba0 <_control_block>, 
susi=0x7eb940, ckpt=false) at siass.cc:554
5  0x0049a326 in SG_NORED::node_fail (this=0x7bc210, cb=0x75dba0 
<_control_block>, su=0x7bde40) at sg_nored_fsm.cc:781
6  0x004bd4d7 in avd_node_down_mw_susi_failover (cb=0x75dba0 
<_control_block>, avnd=0x7b04d0) at sgproc.cc:1983
7  0x00461a77 in avd_node_failover (node=0x7b04d0) at ndproc.cc:1142
8  0x00459d63 in avd_mds_avnd_down_evh (cb=0x75dba0 <_control_block>, 
evt=0x7f741c002270) at ndfsm.cc:684
9  0x00453f60 in process_event (cb_now=0x75dba0 <_control_block>, 
evt=0x7f741c002270) at main.cc:775
10 0x00453c83 in main_loop () at main.cc:696
11 0x004541ff in main (argc=2, argv=0x7fffedc7f828) at main.cc:848


 Below is the amfd trace :
 
 Oct 10 14:07:38.712919 osafamfd [2788:imm.cc:1751] << avd_saImmOiRtObjectDelete
Oct 10 14:07:38.712922 osafamfd [2788:csi.cc:1292] << avd_compcsi_delete
Oct 10 14:07:38.712925 osafamfd [2788:mbcsv_api.c:0773] >> 
mbcsv_process_snd_ckpt_request: Sending checkpoint data to all STANDBY peers, 
as per the send-type specified
Oct 10 14:07:38.712928 osafamfd [2788:mbcsv_api.c:0803] TR svc_id:10, 
pwe_hdl:65537
Oct 10 14:07:38.712931 osafamfd [2788:mbcsv_util.c:0343] >> 
mbcsv_send_ckpt_data_to_all_peers
Oct 10 14:07:38.712934 osafamfd [2788:mbcsv_util.c:0387] TR dispatching FSM for 
NCSMBCSV_SEND_ASYNC_UPDATE
Oct 10 14:07:38.712936 osafamfd [2788:mbcsv_act.c:0101] TR ASYNC update to be 
sent. role: 1, svc_id: 10, pwe_hdl: 65537
Oct 10 14:07:38.712939 osafamfd [2788:mbcsv_util.c:0399] TR calling encode 
callback
Oct 10 14:07:38.712942 osafamfd [2788:chkop.cc:0228] TR Async update
Oct 10 14:07:38.712945 osafamfd [2788:ckpt_enc.cc:0681] >> enc_siass: io_action 
'2'
Oct 10 14:07:38.712998 osafamfd [2788:ckpt_enc.cc:0704] << enc_siass
Oct 10 14:07:38.713001 osafamfd [2788:mbcsv_util.c:0438] TR send the encoded 
message to any other peer with same s/w version
Oct 10 14:07:38.713004 osafamfd [2788:mbcsv_util.c:0441] TR dispatching FSM for 
NCSMBCSV_SEND_ASYNC_UPDATE
Oct 10 14:07:38.713006 osafamfd [2788:mbcsv_act.c:0101] TR ASYNC update to be 
sent. role: 1, svc_id: 10, pwe_hdl: 65537
Oct 10 14:07:38.713009 osafamfd [2788:mbcsv_mds.c:0185] >> mbcsv_mds_send_msg: 
sending to vdest:1
Oct 10 14:07:38.713012 osafamfd [2788:mbcsv_mds.c:0201] TR send type 
MDS_SENDTYPE_RED
Oct 10 14:07:38.713023 osafamfd [2788:mbcsv_mds.c:0244] << mbcsv_mds_send_msg: 
success
Oct 10 14:07:38.713027 osafamfd [2788:mbcsv_util.c:0492] << 
mbcsv_send_ckpt_data_to_all_peers
Oct 10 14:07:38.713030 osafamfd [2788:mbcsv_api.c:0868] << 
mbcsv_process_snd_ckpt_request: retval: 1
Oct 10 14:07:38.713033 osafamfd [2788:siass.cc:0496] >> avd_susi_delete: 
safSu=PL-4,safSg=NoRed,safApp=OpenSAF safSi=NoRed4,safApp=OpenSAF
Oct 10 14:09:23.708873 osafamfd [2802:main.cc:0500] >> initialize








[tickets] [opensaf:tickets] #2106 amf: Admin Operations on middleware SUs / SIs should not be supported

2016-10-09 Thread Srikanth R



---

** [tickets:#2106] amf: Admin Operations on middleware SUs  / SIs should not be 
supported**

**Status:** unassigned
**Milestone:** 5.2.FC
**Created:** Sun Oct 09, 2016 11:18 AM UTC by Srikanth R
**Last Updated:** Sun Oct 09, 2016 11:18 AM UTC
**Owner:** nobody


Changeset : 8190 5.1.GA

-> Bring up a single controller, SC-1.
-> Now perform lock and unlock operations on a middleware SU, i.e. 
safSu=SC-2,safSg=NoRed,safApp=OpenSAF, which is hosted on SC-2.
-> The admin lock operation succeeds, but the admin unlock operation times out 
with the assignment to one of the middleware SIs.

 The following is the opensafd status after the unlock operation.
 
safSISU=safSu=SC-1\,safSg=2N\,safApp=OpenSAF,safSi=SC-2N,safApp=OpenSAF
saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=SC-1\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed1,safApp=OpenSAF
saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=SC-2\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed2,safApp=OpenSAF
saAmfSISUHAState=ACTIVE(1)

  Admin operations on middleware objects should not be supported.




[tickets] [opensaf:tickets] #2105 AMF : SG is unstable, if app responds during node link loss detection time period

2016-10-09 Thread Srikanth R



---

** [tickets:#2105] AMF : SG is unstable, if app responds during node link loss 
detection time period**

**Status:** unassigned
**Milestone:** 5.2.FC
**Created:** Sun Oct 09, 2016 07:12 AM UTC by Srikanth R
**Last Updated:** Sun Oct 09, 2016 07:12 AM UTC
**Owner:** nobody


Setup :
Changeset : 8190 
5 node SLES  setup with 2 controllers and 3 payloads ( TIPC -- headless enabled)
2n application deployed on 2 payloads.

Issue : 

 -> Perform an admin operation on an AMF entity.
 -> Do not respond to the callback and invoke the headless scenario.
 -> On a VM with a TIPC setup, it takes 3 seconds to detect that the node is 
down.
 -> If the application responds to a callback of the admin operation during 
this time period, when the last controller is down, the message does not reach 
any controller. Amfnd on the payload sends the "Assigned" message but does not 
store it.
 
  In this scenario, the SG moves to the unstable state. Below is a snippet from 
syslog, where the application responded at 15:48:28 and the payloads detected 
at 15:48:31 that the last controller was down.
  
 Oct  7 15:48:28 SYSTEST-PLD-1 osafamfnd[9976]: NO Assigned 
'safSi=TestApp_SI1,safApp=TestApp_TwoN' ACTIVE to 
'safSu=TestApp_SU1,safSg=TestApp_SG1,safApp=TestApp_TwoN'
Oct  7 15:48:31 SYSTEST-PLD-1 osafamfnd[9976]: WA AMF director unexpectedly 
crashed
Oct  7 15:48:31 SYSTEST-PLD-1 osafamfnd[9976]: NO Checking 
'safSu=PL-3,safSg=NoRed,safApp=OpenSAF' for pending messages
Oct  7 15:48:31 SYSTEST-PLD-1 osafamfnd[9976]: NO Checking 
'safSu=TestApp_SU1,safSg=TestApp_SG1,safApp=TestApp_TwoN' for pending messages
Oct  7 15:48:31 SYSTEST-PLD-1 osafimmnd[9957]: WA SC Absence IS allowed:900 
IMMD service is DOWN
Oct  7 15:48:31 SYSTEST-PLD-1 osafimmnd[9957]: NO IMMD SERVICE IS DOWN, HYDRA 
IS CONFIGURED => UNREGISTERING IMMND form MDS


-> Below is the scenario where the payload detected at 18:31:34 that there is 
no controller, so amfnd calls avnd_di_susi_resp_send after the controllers 
rejoin the cluster. The application responded at 18:31:41.

Oct  7 18:31:34 SYSTEST-PLD-1 osafimmnd[12448]: WA SC Absence IS allowed:900 
IMMD service is DOWN
Oct  7 18:31:34 SYSTEST-PLD-1 osafimmnd[12448]: NO IMMD SERVICE IS DOWN, HYDRA 
IS CONFIGURED => UNREGISTERING IMMND form MDS
Oct  7 18:31:41 SYSTEST-PLD-1 osafamfnd[12467]: NO Assigned 
'safSi=TestApp_SI4,safApp=TestApp_TwoN' ACTIVE to 
'safSu=TestApp_SU1,safSg=TestApp_SG1,safApp=TestApp_TwoN'
Oct  7 18:31:41 SYSTEST-PLD-1 osafamfnd[12467]: NO avnd_di_susi_resp_send() 
deferred as AMF director is offline
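
The difference between the two scenarios comes down to where the application's response falls relative to the moment the payload detects the loss of the last controller. The sketch below illustrates that timing rule only. The function and label names are hypothetical, and the 3 second detection delay is the value reported above for this TIPC setup, not a constant taken from the code.

```python
from datetime import datetime, timedelta


def classify_response(response_at, down_detected_at,
                      detection_delay=timedelta(seconds=3)):
    """Classify an application response relative to controller-loss detection.

    'in_detection_window': sent while the loss was not yet detected, so the
                            message may never reach a controller.
    'deferred':             sent after detection; amfnd defers the response.
    'before_loss':          sent before the controllers could have gone down.
    """
    window_start = down_detected_at - detection_delay
    if response_at >= down_detected_at:
        return "deferred"
    if response_at >= window_start:
        return "in_detection_window"
    return "before_loss"


def ts(text):
    """Parse an HH:MM:SS syslog time (the date is omitted for brevity)."""
    return datetime.strptime(text, "%H:%M:%S")


print(classify_response(ts("15:48:28"), ts("15:48:31")))  # in_detection_window
print(classify_response(ts("18:31:41"), ts("18:31:34")))  # deferred
```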




[tickets] [opensaf:tickets] #2100 Standby should not be rebooted, for SC absence configuration mismatch

2016-10-07 Thread Srikanth R



---

** [tickets:#2100]  Standby should not be rebooted, for  SC absence 
configuration mismatch**

**Status:** unassigned
**Milestone:** 5.2.FC
**Created:** Fri Oct 07, 2016 07:11 AM UTC by Srikanth R
**Last Updated:** Fri Oct 07, 2016 07:11 AM UTC
**Owner:** nobody


Changeset : 8190 5.1.GA

-> Initially brought up OpenSAF on SC-1 with the "SC ABSENCE" feature enabled 
in immd.conf.

-> On SC-2, the "SC ABSENCE" feature is not enabled in immd.conf. When opensafd 
was started on SC-2, the node rebooted.

Oct  7 17:58:27 SLES-SLOT2 osafimmd[3615]: ER SC absence allowed in not the 
same as on active IMMD. Active: 900, Standby: 0. Exiting.
Oct  7 17:58:27 SLES-SLOT2 osafamfnd[3676]: NO 
'safComp=IMMD,safSu=SC-2,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : 
Recovery is 'nodeFailfast'
Oct  7 17:58:27 SLES-SLOT2 osafamfnd[3676]: ER 
safComp=IMMD,safSu=SC-2,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery 
is:nodeFailfast
Oct  7 17:58:27 SLES-SLOT2 osafamfnd[3676]: Rebooting OpenSAF NodeId = 131599 
EE Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 
131599, SupervisionTime = 60

   Here the user had configured the two controllers inconsistently, which 
caused the standby to reboot. Opensafd is enabled in the runlevel as part of 
installation, so the standby reboots continuously until opensafd is stopped on 
SC-1.
   
  Suggested behavior :
   
   Opensafd should refuse to start on the standby instead of rebooting 
immediately. 
   
   Also, cluster-level attributes like IMMSV_SC_ABSENCE_ALLOWED can be moved to 
imm.xml. Node-level attributes, such as trace enabling, can be retained in the 
configuration files.
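
As an illustration of the suggested behavior, the sketch below compares the standby's setting with the value used by the active controller and reports a problem instead of letting the node fail fast. It is a hypothetical helper, not OpenSAF code. It assumes the setting appears in immd.conf as a shell-style line such as `export IMMSV_SC_ABSENCE_ALLOWED=900`, that a missing or commented-out line means 0 (as in the "Standby: 0" log above), and that the active value is obtained by some means not shown here.

```python
import re

# Matches an uncommented assignment, with or without a leading 'export'.
_SETTING = re.compile(r"^\s*(?:export\s+)?IMMSV_SC_ABSENCE_ALLOWED\s*=\s*(\d+)")


def sc_absence_allowed(conf_text):
    """Return the configured value, or 0 if the setting is absent or commented out."""
    for line in conf_text.splitlines():
        match = _SETTING.match(line)
        if match:
            return int(match.group(1))
    return 0


def startup_problem(active_value, standby_conf_text):
    """Return a message if the standby must not start, else None."""
    standby_value = sc_absence_allowed(standby_conf_text)
    if standby_value != active_value:
        return ("SC absence allowed differs: Active: %d, Standby: %d; "
                "not starting opensafd" % (active_value, standby_value))
    return None


# Standby with the feature left commented out, active configured with 900.
print(startup_problem(900, "#export IMMSV_SC_ABSENCE_ALLOWED=900\n"))
```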




[tickets] [opensaf:tickets] #2096 AMF : SG in unstable state for fault in component during admin unlock (headless)

2016-10-05 Thread Srikanth R



---

** [tickets:#2096] AMF : SG in unstable state for fault in component during 
admin unlock (headless)**

**Status:** unassigned
**Milestone:** 5.2.FC
**Created:** Wed Oct 05, 2016 08:08 AM UTC by Srikanth R
**Last Updated:** Wed Oct 05, 2016 08:08 AM UTC
**Owner:** nobody
**Attachments:**

- 
[2096.tgz](https://sourceforge.net/p/opensaf/tickets/2096/attachment/2096.tgz) 
(4.6 MB; application/x-compressed-tar)


Environment :
-
Changeset:  7997 5.1.FC
Setup : 5-node setup with 2 controllers, the headless feature enabled and PBE 
disabled.
Application : 2N application with 2 SUs and 4 SIs, without SI-SI dependencies.

Steps performed :
--

The SG moved to the unstable state after a component fault that occurred while 
an admin unlock operation was performed on the SG and the headless state was 
invoked. Below are the steps performed.

-> The application is brought up initially and the SIs are fully assigned.

-> Lock, lock-in, unlock-in and unlock operations are then performed on the SG 
with a sufficient time gap.

-> During the unlock operation on the SG, component 2 of SU1 did not respond to 
the active assignment, and the headless scenario was invoked.

  3148 12:34:05 10/05/2016 NO safApp=safAmfService "Admin op "UNLOCK" 
initiated for 'safSg=TestApp_SG1,safApp=TestApp_TwoN', invocation: 
1683627180042"
  3149 12:34:05 10/05/2016 NO safApp=safAmfService 
"safSg=TestApp_SG1,safApp=TestApp_TwoN AdmState LOCKED => UNLOCKED"

-> After the headless state was reached, component 2 faulted with a CSI set 
callback timeout.

Oct  5 12:34:33 SYSTEST-PLD-1 osafamfnd[2626]: NO 
'safComp=COMP2,safSu=TestApp_SU1,safSg=TestApp_SG1,safApp=TestApp_TwoN' faulted 
due to 'csiSetcallbackTimeout' : Recovery is 'componentRestart'


-> After the controllers rejoined the cluster, SU2 did not get any assignments.

--> Further operations on the SG were rejected because the SG was not in the 
STABLE state.
  3202 12:40:59 10/05/2016 NO safApp=safAmfService "Admin op "LOCK" 
initiated for 'safSg=TestApp_SG1,safApp=TestApp_TwoN', invocation: 
1696512081921"
  3203 12:40:59 10/05/2016 NO safApp=safAmfService "Admin op invocation: 
1696512081921, err: 'SG not in STABLE state 
(safSg=TestApp_SG1,safApp=TestApp_TwoN)'"
  3204 12:40:59 10/05/2016 NO safApp=safAmfService "Admin op done for 
invocation: 1696512081921, result 6"


Logs :

 The traces of SC-1 (the active controller before and after headless) and PL-3 
(where SU1 is hosted) are attached.




[tickets] [opensaf:tickets] #2088 CLM : saClmClusterNodeGetAsync returns OK on a non member node

2016-10-03 Thread Srikanth R



---

** [tickets:#2088] CLM : saClmClusterNodeGetAsync returns OK on a non member 
node**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Mon Oct 03, 2016 07:34 AM UTC by Srikanth R
**Last Updated:** Mon Oct 03, 2016 07:34 AM UTC
**Owner:** nobody


Changeset : 7997 5.1.FC

The saClmClusterNodeGetAsync API returns SA_AIS_OK on a non-member node. The 
expected behavior is that saClmClusterNodeGetAsync returns ERR_UNAVAILABLE, 
like saClmClusterNodeGet_4, which currently returns ERR_UNAVAILABLE (rc = 31) 
on a non-member node.


 Below is the snippet from CLM agent trace.
 
 Oct  3 12:40:34.532320 clma [7881:clma_api.c:1235] >> saClmClusterNodeGet_4
Oct  3 12:40:34.532330 clma [7881:clma_api.c:1278] >> clmaclusternodeget
Oct  3 12:40:34.532338 clma [7881:clma_util.c:0036] >> clma_validate_version
Oct  3 12:40:34.532345 clma [7881:clma_util.c:0042] << clma_validate_version
Oct  3 12:40:34.532363 clma [7881:clma_mds.c:1227] >> clma_mds_msg_sync_send
Oct  3 12:40:34.532383 clma [7881:clma_mds.c:0317] >> clma_mds_enc
Oct  3 12:40:34.532392 clma [7881:clma_mds.c:0352] T2 msgtype: 0
Oct  3 12:40:34.532399 clma [7881:clma_mds.c:0366] T2 api_info.type: 4
Oct  3 12:40:34.532406 clma [7881:clma_mds.c:0192] >> clma_enc_node_get_msg
Oct  3 12:40:34.532412 clma [7881:clma_mds.c:0207] << clma_enc_node_get_msg
Oct  3 12:40:34.532418 clma [7881:clma_mds.c:0407] << clma_mds_enc
Oct  3 12:40:34.533347 clma [7881:clma_mds.c:0697] >> clma_mds_dec
Oct  3 12:40:34.533377 clma [7881:clma_mds.c:0729] T2 CLMSV_CLMA_API_RESP_MSG 
rc = 31
Oct  3 12:40:34.533388 clma [7881:clma_mds.c:0809] << clma_mds_dec
Oct  3 12:40:34.533448 clma [7881:clma_mds.c:1253] << clma_mds_msg_sync_send
Oct  3 12:40:34.533474 clma [7881:clma_util.c:0656] >> clma_msg_destroy
Oct  3 12:40:34.533486 clma [7881:clma_util.c:0694] << clma_msg_destroy
Oct  3 12:40:34.533496 clma [7881:clma_api.c:1395] << clmaclusternodeget
Oct  3 12:40:34.533502 clma [7881:clma_api.c:1245] << saClmClusterNodeGet_4
Oct  3 12:40:34.533657 clma [7881:clma_api.c:1422] >> saClmClusterNodeGetAsync
Oct  3 12:40:34.533668 clma [7881:clma_util.c:0036] >> clma_validate_version
Oct  3 12:40:34.533674 clma [7881:clma_util.c:0042] << clma_validate_version
Oct  3 12:40:34.533681 clma [7881:clma_mds.c:1274] >> clma_mds_msg_async_send
Oct  3 12:40:34.533692 clma [7881:clma_mds.c:0317] >> clma_mds_enc
Oct  3 12:40:34.533700 clma [7881:clma_mds.c:0352] T2 msgtype: 0
Oct  3 12:40:34.533707 clma [7881:clma_mds.c:0366] T2 api_info.type: 5
Oct  3 12:40:34.533713 clma [7881:clma_mds.c:0229] >> 
clma_enc_node_get_async_msg
Oct  3 12:40:34.533720 clma [7881:clma_mds.c:0245] << 
clma_enc_node_get_async_msg
Oct  3 12:40:34.533726 clma [7881:clma_mds.c:0407] << clma_mds_enc
Oct  3 12:40:34.533744 clma [7881:clma_mds.c:1296] << clma_mds_msg_async_send
Oct  3 12:40:34.533753 clma [7881:clma_api.c:1497] << saClmClusterNodeGetAsync




[tickets] [opensaf:tickets] #2086 LCK : Lock waiter callbacks are not invoked after glnd restart

2016-09-30 Thread Srikanth R



---

** [tickets:#2086] LCK : Lock waiter callbacks are not invoked after glnd 
restart**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Fri Sep 30, 2016 06:10 AM UTC by Srikanth R
**Last Updated:** Fri Sep 30, 2016 06:10 AM UTC
**Owner:** nobody


Changeset : 7997 5.1.FC

 Lock waiter callbacks are not invoked after glnd restart. Below are the steps 
performed by the application.
 
 -> Initialize with LCK and store as handle 1.
 -> Initialize with LCK and store as handle 2 in another thread.
 -> Open a lock using saLckResourceOpen with handle 1.
 -> Open the same lock using saLckResourceOpen with handle 2.
 -> With handle 2, request the lock in PR mode using the saLckResourceOpen api. 
Call this api 5 times.
 -> With handle 1, request the lock in EX mode.
 
 As handle 2 has requested the lock 5 times, the thread for handle 1 should 
get 5 lock waiter callbacks. Sometimes, the lock waiter callback is not 
invoked for the thread using handle 1.




[tickets] [opensaf:tickets] #2085 CKPT : IMM attributes for ckpt table are increased by 1, when ckpt open returns TIME_OUT

2016-09-29 Thread Srikanth R



---

** [tickets:#2085] CKPT : IMM attributes for ckpt table are increased by 1, 
when ckpt open returns TIME_OUT**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Fri Sep 30, 2016 05:13 AM UTC by Srikanth R
**Last Updated:** Fri Sep 30, 2016 05:13 AM UTC
**Owner:** nobody


Changeset : 7997 5.1.FC

The IMM attributes of the ckpt table object are increased by 1 when ckpt open 
returns TIME_OUT. Below is the sequence of steps in which the application uses 
CKPT.

-> Initialize with ckpt with callbacks. API returned SA_AIS_OK.
-> Invoke selection object. API returned SA_AIS_OK.
-> Create a checkpoint using the async option. API returned SA_AIS_OK.
-> Kill the ckpnd process.
-> Check for the callbacks and check the IMM attributes of the CKPT object.
The callback is invoked with a return value of ERR_TIMEOUT. The spec mandates 
that the api should be called again to check whether the checkpoint creation 
was successful or not. If the further call returns ERR_EXIST, the previous call 
was successful; if the further call returns SA_AIS_OK, the previous call was 
unsuccessful.

-> As the callback returned SA_AIS_ERR_TIMEOUT, invoked the async checkpoint 
creation api again. This time, both the api and the callback returned SA_AIS_OK.

Now if you check the attributes of the CKPT table object, the attributes 
saCkptCheckpointNumOpeners, saCkptCheckpointNumReaders and 
saCkptCheckpointNumWriters have a value of 2 instead of the expected value of 
1.
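
The rule quoted in this report for interpreting the retried open can be 
written down as a small sketch. It only encodes the rule as the reporter 
states it; the function name and the use of plain strings for return codes are 
hypothetical and not part of any OpenSAF API.

```python
def previous_open_succeeded(retry_rc):
    """Interpret the return code of the retried checkpoint open.

    Follows the rule as stated in the ticket text (an assumption taken
    from the report, not checked against the specification here):
      - SA_AIS_ERR_EXIST on the retry -> the timed-out call had succeeded
      - SA_AIS_OK on the retry        -> the timed-out call had failed
    Any other code leaves the outcome undetermined (None).
    """
    if retry_rc == "SA_AIS_ERR_EXIST":
        return True
    if retry_rc == "SA_AIS_OK":
        return False
    return None


# In the reported run the retry returned SA_AIS_OK, so by this rule the
# first (timed-out) open is treated as not having taken effect, which is
# why the reporter expects a single opener rather than two.
print(previous_open_succeeded("SA_AIS_OK"))
```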


---



[tickets] [opensaf:tickets] #2082 CKPT : Track cbk not invoked for section creation after cpnd restart

2016-09-29 Thread Srikanth R



---

** [tickets:#2082] CKPT : Track cbk not invoked for section creation after cpnd 
restart**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Thu Sep 29, 2016 11:06 AM UTC by Srikanth R
**Last Updated:** Thu Sep 29, 2016 11:06 AM UTC
**Owner:** nobody


Changeset: 7997 5.1.FC

The track callback is not invoked after cpnd restart. Below are the apis called 
from the applications, which are spawned on two nodes, i.e. payloads.


On first node :

 -> Initialize with cpsv.
 -> Create a ckpt with ACTIVE REPLICA flag.

On second node :

 -> Initialize with cpsv.

On first node :

 -> Open the checkpoint in writing mode.
 -> Open the checkpoint in reading mode.
 -> Kill cpnd process.
 -> Register for track callback.

On second node :

 -> Open the ckpt in read mode.
 -> Kill cpnd process.
 -> Register for track callback.


After ensuring that both agents have registered for the track callback, create 
a section from the application on the first node. For section creation, the 
callback should be invoked for the applications on both nodes.

Currently the callback is not invoked for the application on the second node. 
Without cpnd restart, the callback is invoked for both applications.


---



[tickets] [opensaf:tickets] #2075 LongDnAllowed attribute should be defined in the imm.xml

2016-09-27 Thread Srikanth R



---

** [tickets:#2075] LongDnAllowed attribute should be defined in the imm.xml**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Tue Sep 27, 2016 12:32 PM UTC by Srikanth R
**Last Updated:** Tue Sep 27, 2016 12:32 PM UTC
**Owner:** nobody


Observed behaviour
--

With 5.1, all the active services are integrated with the Long DN feature.

To enable the Long DN feature, the user needs to modify the attribute 
"longDnsAllowed" of the object opensafImm=opensafImm,safApp=safImmService.

The steps to enable the Long DN feature are described in the PR doc. But the 
object is not defined in imm.xml. The Long DN feature can be enabled if the 
controllers use imm.db or a dumped imm.xml in which the attribute is already 
set, but not when they use the generated imm.xml.


Suggested behaviour
--

The user should be given an option to enable the Long DN feature at initial 
startup, either by including the attribute in the initially generated imm.xml 
or through an environment variable in one of the OpenSAF configuration files.


---



[tickets] [opensaf:tickets] #2074 amfd asserted on rebooted controllers continuoulsy after split brain scenario (headless)

2016-09-27 Thread Srikanth R



---

** [tickets:#2074] amfd asserted on rebooted controllers continuoulsy after 
split brain scenario (headless)**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Tue Sep 27, 2016 12:14 PM UTC by Srikanth R
**Last Updated:** Tue Sep 27, 2016 12:14 PM UTC
**Owner:** nobody


Setup : 
SLES 11 Physical machine
Changeset :7997 5.1 FC
2 controllers and 2 payloads with headless feature enabled.
2N application with 3 SUs. (AmfDemo).

Issue :

amfd asserts on the controllers continuously on every reboot after an initial 
split-brain scenario is observed.


Steps performed :

-> Initially brought up four nodes and all the nodes joined the cluster.

-> Brought up the 2N application, with SUs hosted on SC-1 ,SC-2 and PL-3 
successfully.

-> Performed some operations on the AMF objects and the cluster is left in idle 
state later.

-> After a gap of 2 weeks, an MDS down event was generated on both controllers 
because of momentary unplugging of cable(s), which resulted in a split-brain 
scenario.


Sep 24 21:36:40 SLES-SLOT1 osafimmd[2729]: NO MDS event from svc_id 25 
(change:3, dest:565214187380752)
Sep 24 21:36:40 SLES-SLOT1 kernel: [1297950.833811] TIPC: Established link 
<1.1.1:em1-1.1.2:em1> on network plane A
Sep 24 21:36:40 SLES-SLOT1 osafrded[2710]: Rebooting OpenSAF NodeId = 0 EE Name 
= No EE Mapped, Reason: Split-brain detected, OwnNodeId = 131343, 
SupervisionTime = 60


Sep 26 00:00:01 SLES-SLOT2 osafrded[2715]: NO Got peer info request from node 
0x2010f with role ACTIVE
Sep 26 00:00:01 SLES-SLOT2 osafrded[2715]: Rebooting OpenSAF NodeId = 0 EE Name 
= No EE Mapped, Reason: Split-brain detected, OwnNodeId = 131599, 
SupervisionTime = 60


-> As headless feature is enabled, payloads did not go for reboot.

-> Once controllers joined the payloads, amfd asserted on the rebooted 
controller and controllers went for reboot.
Sep 24 21:39:27 SLES-SLOT1 osafamfd[2772]: NO Received node_up from 2010f: 
msg_id 1
Sep 24 21:39:27 SLES-SLOT1 osafamfd[2772]: siass.cc:953: avd_susi_recreate: 
Assertion 'su' failed.
Sep 24 21:39:27 SLES-SLOT1 osafamfnd[2782]: WA AMF director unexpectedly crashed
Sep 24 21:39:27 SLES-SLOT1 osafamfnd[2782]: WA AMF director unexpectedly crashed
Sep 24 21:39:27 SLES-SLOT1 osafamfnd[2782]: Rebooting OpenSAF NodeId = 131343 
EE Name = , Reason: local AVD down(Adest) or both AVD down(Vdest) received, 
OwnNodeId = 131343, SupervisionTime = 60
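
The syslog lines above print node IDs in two forms: decimal (OwnNodeId = 
131343, OwnNodeId = 131599) and hex (node 0x2010f, node_up from 2010f). A 
minimal sketch for converting between the two when correlating lines; the 
helper names are hypothetical.

```python
def node_id_hex(node_id):
    """Return the lowercase hex form used in lines like 'node_up from 2010f'."""
    return format(node_id, "x")


def node_id_dec(hex_text):
    """Return the decimal form used in 'OwnNodeId = ...' lines.

    Accepts the value with or without a leading '0x'.
    """
    return int(hex_text, 16)


print(node_id_hex(131343))     # 2010f
print(node_id_hex(131599))     # 2020f
print(node_id_dec("0x2010f"))  # 131343
```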


Below is the backtrace :

#0  0x7f1d28510b55 in raise () from /lib64/libc.so.6
No symbol table info available.
#1  0x7f1d28512131 in abort () from /lib64/libc.so.6
No symbol table info available.
#2  0x7f1d2a397197 in __osafassert_fail (__file=0x517c15 "siass.cc", 
__line=953, __func=0x518250 
<avd_susi_recreate(avsv_n2d_nd_sisu_state_msg_info_tag*)::__FUNCTION__> 
"avd_susi_recreate", 
__assertion=0x517d01 "su") at sysf_def.c:281
No locals.
#3  0x004c56a5 in avd_susi_recreate (info=0x7f1d20008ec8) at 
siass.cc:953
su = 0x0
__FUNCTION__ = "avd_susi_recreate"
susi = 0x0
node = 0x7bfdf0
susi_state = 0x0
su_state = 0x7f1d200055a0
__PRETTY_FUNCTION__ = "SaAisErrorT 
avd_susi_recreate(AVSV_N2D_ND_SISU_STATE_MSG_INFO*)"
#4  0x00459943 in avd_process_state_info_queue (cb=0x75cba0 
<_control_block>) at ndfsm.cc:78
n2d_msg = 0x7f1d20008ec0
i = 0
queue_size = 4
queue_evt = 0x7a9b60
act_amfnd_node_up_count = 1
found_state_info = true
__FUNCTION__ = "avd_process_state_info_queue"
#5  0x0045a50f in avd_node_up_evh (cb=0x75cba0 <_control_block>, 
evt=0x7f1d20008880) at ndfsm.cc:363
avnd = 0x7bf380
n2d_msg = 0x7f1d20004b30
rc = 1
sync_nd_size = 4
act_nd = true
__FUNCTION__ = "avd_node_up_evh"
#6  0x00453d78 in process_event (cb_now=0x75cba0 <_control_block>, 
evt=0x7f1d20008880) at main.cc:768
__FUNCTION__ = "process_event"
#7  0x00453a9b in main_loop () at main.cc:689
pollretval = 1
cb = 0x75cba0 <_control_block>
evt = 0x7f1d20008880
mbx_fd = {raise_obj = 11, rmv_obj = 12}
error = SA_AIS_OK
polltmo = -1
term_fd = 17
__FUNCTION__ = "main_loop"
#8  0x00454017 in main (argc=2, argv=0x7fff50cd9958) at main.cc:841


Suggested recovery :

During a split-brain scenario, payloads should be ordered to reboot even when 
the headless feature is enabled.


---


[tickets] [opensaf:tickets] #2070 LCK : IMM attrib update issues for LCK application objects.

2016-09-27 Thread Srikanth R



---

** [tickets:#2070] LCK : IMM attrib update issues for LCK application objects.**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Tue Sep 27, 2016 07:12 AM UTC by Srikanth R
**Last Updated:** Tue Sep 27, 2016 07:12 AM UTC
**Owner:** nobody


The following are the two scenarios in which IMM attributes for a LCK object 
are not properly updated.

SCENARIO - 1 :

 -> Invoke saLckInitialize.
 -> Invoke saLckResourceOpen with SA_LCK_RESOURCE_CREATE flag for the resource 
"resource1_101".
 -> Invoke saLckResourceOpenAsync with SA_LCK_RESOURCE_CREATE flag for the 
same resource.
 -> Invoke saLckResourceLock with SA_LCK_LOCK_ORPHAN flag in PR mode.
 -> Invoke saLckResourceLock with SA_LCK_LOCK_ORPHAN flag in PR mode.
 -> Invoke saLckFinalize.

Now that the agent has invoked Finalize, the stripped count should be zero for 
the LCK object "safLock=resource1_101". But the saLckResourceStrippedCount 
value is 2.

SCENARIO - 2 :

On node 1 :

 -> Invoke saLckInitialize.
 -> Invoke saLckResourceOpen with SA_LCK_RESOURCE_CREATE flag.

On node 2 :

 -> Invoke saLckInitialize.
 -> Invoke saLckResourceOpen for the same resource as on node 1.
 -> Invoke saLckResourceLock in PR mode.
 -> Invoke saLckResourceLock with SA_LCK_LOCK_ORPHAN flag.
 -> Invoke saLckFinalize.

Now on node 1, where the application is still running, the 
saLckResourceNumOpeners attribute retrieved for the resource is expected to be 
1, but 0 is being populated. The saLckResourceIsOrphaned attribute is expected 
to be 1, but it is set to 0.



---



[tickets] [opensaf:tickets] #1801 lck: saLckResourceOpen returns SA_AIS_ERR_TIMEOUT / SA_AIS_ERR_LIBRARY after failovers / switchovers.

2016-09-26 Thread Srikanth R
- **summary**: lck: saLckResourceOpen with flag SA_LCK_RESOURCE_CREATE 
returning SA_AIS_ERR_TIMEOUT after 5 failovers. --> lck: saLckResourceOpen  
returns SA_AIS_ERR_TIMEOUT / SA_AIS_ERR_LIBRARY after failovers / switchovers.
- **Comment**:

After a couple of switchovers / failovers, saLckResourceOpen may fail randomly 
with the following return values.

-> SA_AIS_ERR_TIMEOUT
-> SA_AIS_ERR_LIBRARY
-> random return values that are out of bounds



---

** [tickets:#1801] lck: saLckResourceOpen  returns SA_AIS_ERR_TIMEOUT / 
SA_AIS_ERR_LIBRARY after failovers / switchovers.**

**Status:** unassigned
**Milestone:** 5.0.2
**Created:** Mon May 02, 2016 09:52 AM UTC by Madhurika Koppula
**Last Updated:** Tue Sep 20, 2016 06:04 PM UTC
**Owner:** nobody
**Attachments:**

- 
[glsv.tgz](https://sourceforge.net/p/opensaf/tickets/1801/attachment/glsv.tgz) 
(3.0 MB; application/octet-stream)


Setup:
Changeset- 7436
OS: Oracle Linux Server release 6.4 (x86_64)
4 nodes configured with single PBE

Some failover tests are being run.
The safLock=resource1_101 object is not getting deleted. As a result, 
saLckResourceOpen with flag SA_LCK_RESOURCE_CREATE continuously returns 
SA_AIS_ERR_TIMEOUT.

The same API call is retried 15 times, with a sleep of 10 seconds between 
attempts.

Snippet from the run:

100|7| SUCCESS : saLckInitialize with valid parameters
100|7| Return Value: SA_AIS_OK
100|7| LckHandle   : 6599312
100|7|
100|7|
100|7| SUCCESS : saLckInitialize with valid parameters
100|7| Return Value: SA_AIS_OK
100|7| LckHandle   : 6599392
100|7|
100|7|
100|7| Resource Name   : safLock=resource1_101
100|7| open flags  : SA_LCK_RESOURCE_CREATE
100|7| FAILED  : saLckResourceOpen with valid parameters
100|7| Return Value: SA_AIS_ERR_TIMEOUT

100|7| Resource Name   : safLock=resource1_101
100|7| open flags  : SA_LCK_RESOURCE_CREATE
100|7|
100|7| Resource Name   : safLock=resource1_101
100|7| open flags  : SA_LCK_RESOURCE_CREATE

100|7| Resource Name   : safLock=resource1_101
100|7| open flags  : SA_LCK_RESOURCE_CREATE
100|7|
100|7| Resource Name   : safLock=resource1_101
100|7| open flags  : SA_LCK_RESOURCE_CREATE
100|7|
100|7| Resource Name   : safLock=resource1_101
100|7| open flags  : SA_LCK_RESOURCE_CREATE

100|7|
100|7| Resource Name   : safLock=resource1_101
100|7| open flags  : SA_LCK_RESOURCE_CREATE Timeout count exceeded: 15
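
The retry policy described above (15 attempts, 10 seconds apart, ending in 
"Timeout count exceeded: 15") can be sketched as follows. This is not the test 
suite's code: the function name, the string return codes and the injectable 
`open_call` and `sleep` parameters are assumptions made for illustration.

```python
import time


def open_with_retry(open_call, attempts=15, delay_s=10, sleep=time.sleep):
    """Call open_call() until it stops returning SA_AIS_ERR_TIMEOUT.

    open_call is a hypothetical stand-in for the saLckResourceOpen
    invocation; it must return the result code as a string.
    Returns (last_return_code, number_of_attempts_made).
    """
    rc = None
    for attempt in range(1, attempts + 1):
        rc = open_call()
        if rc != "SA_AIS_ERR_TIMEOUT":
            return rc, attempt
        if attempt < attempts:
            sleep(delay_s)  # wait before the next attempt
    return rc, attempts


# Simulate the reported behaviour: every attempt times out.
rc, made = open_with_retry(lambda: "SA_AIS_ERR_TIMEOUT", sleep=lambda s: None)
print(rc, made)  # SA_AIS_ERR_TIMEOUT 15
```

Passing `sleep` as a parameter keeps the sketch testable without actually 
waiting 10 seconds per attempt.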

Timestamp of the Active controller at this instant:

May  2 14:22:56 OEL_M-SLOT-2 root: killing osafimmd from run_failover.sh
May  2 14:22:56 OEL_M-SLOT-2 osafamfnd[1755]: NO 
'safComp=IMMD,safSu=SC-2,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : 
Recovery is 'nodeFailfast'
May  2 14:22:56 OEL_M-SLOT-2 osafamfnd[1755]: ER 
safComp=IMMD,safSu=SC-2,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery 
is:nodeFailfast
May  2 14:22:56 OEL_M-SLOT-2 osafamfnd[1755]: Rebooting OpenSAF NodeId = 131599 
EE Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 
131599, SupervisionTime = 60
May  2 14:22:56 OEL_M-SLOT-2 opensaf_reboot: Rebooting local node; timeout=60

Timestamp of the Standby controller which is becoming active after failover:

May  2 14:23:00 OEL_M-SLOT-1 opensaf_reboot: Rebooting remote node in the 
absence of PLM is outside the scope of OpenSAF
May  2 14:23:00 OEL_M-SLOT-1 osaffmd[1677]: NO Controller Failover: Setting 
role to ACTIVE
May  2 14:23:00 OEL_M-SLOT-1 osafrded[1667]: NO RDE role set to ACTIVE
May  2 14:23:00 OEL_M-SLOT-1 osafrded[1667]: NO Running 
'/usr/lib64/opensaf/opensaf_sc_active' with 0 argument(s)
May  2 14:23:00 OEL_M-SLOT-1 osafimmd[1688]: NO ACTIVE request
May  2 14:23:00 OEL_M-SLOT-1 osaflogd[1711]: NO ACTIVE request
May  2 14:23:00 OEL_M-SLOT-1 osafntfd[1722]: NO ACTIVE request
May  2 14:23:00 OEL_M-SLOT-1 osafclmd[1733]: NO ACTIVE request
May  2 14:23:00 OEL_M-SLOT-1 osafamfd[1744]: NO FAILOVER StandBy --> Active

/var/log/messages and osaflckd traces of both controllers  are attached.





---



[tickets] [opensaf:tickets] #2067 EVT : saEvtEventPublish returns BAD_HANDLE, during middleware si-swap operation

2016-09-26 Thread Srikanth R
- **summary**: EVT : Api returns BAD_HANDLE, during middleware si-swap 
operation --> EVT : saEvtEventPublish returns BAD_HANDLE, during middleware 
si-swap operation
- Description has changed:

Diff:



--- old
+++ new
@@ -2,9 +2,7 @@
 Setup : 2 controllers and 2 payloads with headless feature disabled.
 
 
- Evt api returns SA_AIS_ERR_BAD_HANDLE during middleware si-swap operation.
-
-In the below scenario, saEvtEventPublish returns BAD_HANDLE. The api is 
called, after invoking middleware switchover.
+  The  saEvtEventPublish api is called  with proper handle, after just 
invoking middleware switchover. The api returned SA_AIS_ERR_BAD_HANDLE.
 
  Sep 26 17:42:10.861357 imma [11005:eda_saf_api.c:0320] >> saEvtDispatch: 
event handle: ff84
 Sep 26 17:42:10.861494 imma [11005:eda_saf_api.c:0363] << saEvtDispatch






---

** [tickets:#2067] EVT : saEvtEventPublish returns BAD_HANDLE, during 
middleware si-swap operation**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Mon Sep 26, 2016 12:44 PM UTC by Srikanth R
**Last Updated:** Mon Sep 26, 2016 12:44 PM UTC
**Owner:** nobody


Changeset : 7997 5.1.FC
Setup : 2 controllers and 2 payloads with headless feature disabled.


The saEvtEventPublish api is called with a proper handle just after invoking a 
middleware switchover. The api returned SA_AIS_ERR_BAD_HANDLE.

 Sep 26 17:42:10.861357 imma [11005:eda_saf_api.c:0320] >> saEvtDispatch: event 
handle: ff84
Sep 26 17:42:10.861494 imma [11005:eda_saf_api.c:0363] << saEvtDispatch
 Sep 26 17:42:11.340907 imma [11005:eda_mds.c:0943] T1 Event Server is DOWN on 
node_id: 0
Sep 26 17:42:11.345418 imma [11005:eda_saf_api.c:2097] >> saEvtEventPublish: 
Allocated event handle: ffc00029
Sep 26 17:42:11.345457 imma [11005:eda_saf_api.c:2127] T2 Unable to retrieve 
allocated event handle: ffc00029
Sep 26 17:42:11.345471 imma [11005:eda_saf_api.c:2128] << saEvtEventPublish
Sep 26 17:42:11.347997 imma [11005:eda_saf_api.c:2364] >> saEvtEventSubscribe: 
channel handle: ffd00021
Sep 26 17:42:11.348081 imma [11005:eda_saf_api.c:2447] T2 event server is not 
yet up
Sep 26 17:42:11.348121 imma [11005:eda_saf_api.c:2448] << saEvtEventSubscribe
Sep 26 17:42:11.861956 imma [11005:eda_saf_api.c:0320] >> saEvtDispatch: event 
handle: ff84
Sep 26 17:42:11.862080 imma [11005:eda_saf_api.c:0363] << saEvtDispatch
Sep 26 17:42:12.299715 imma [11005:ntfa_mds.c:0388] T2 NTFA Rcvd MDS subscribe 
evt from svc 28
Sep 26 17:42:12.299736 imma [11005:ntfa_mds.c:0398] TR NTFS down
Sep 26 17:42:12.299773 imma [11005:ntfa_util.c:1499] >> ntfa_update_ntfsv_state
Sep 26 17:42:12.299782 imma [11005:ntfa_util.c:1501] T1 Current state: 4, 
Changed state: 2
Sep 26 17:42:12.299790 imma [11005:ntfa_util.c:1542] TR Active NTF server 
temporarily unavailable
Sep 26 17:42:12.299796 imma [11005:ntfa_util.c:1554] << ntfa_update_ntfsv_state
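
To see how closely the publish call follows the server-down notification in 
the trace above, the two timestamps can be subtracted. A minimal sketch; the 
helper name is hypothetical, and the two trace lines are copied from above 
with their wrapped halves joined.

```python
from datetime import datetime


def trace_time(line):
    """Parse the HH:MM:SS.ffffff field (third token) of a trace line."""
    return datetime.strptime(line.split()[2], "%H:%M:%S.%f")


down = ("Sep 26 17:42:11.340907 imma [11005:eda_mds.c:0943] "
        "T1 Event Server is DOWN on node_id: 0")
publish = ("Sep 26 17:42:11.345418 imma [11005:eda_saf_api.c:2097] "
           ">> saEvtEventPublish: Allocated event handle: ffc00029")

gap = trace_time(publish) - trace_time(down)
print(gap.microseconds)  # 4511
```

The publish call was entered about 4.5 ms after the agent logged that the 
event server was down. Only the time of day is parsed, so the sketch assumes 
both lines belong to the same day.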


This issue is observed randomly and was not observed in the earlier release.
   


---



[tickets] [opensaf:tickets] #2067 EVT : Api returns BAD_HANDLE, during middleware si-swap operation

2016-09-26 Thread Srikanth R



---

** [tickets:#2067] EVT : Api returns BAD_HANDLE, during middleware si-swap 
operation**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Mon Sep 26, 2016 12:44 PM UTC by Srikanth R
**Last Updated:** Mon Sep 26, 2016 12:44 PM UTC
**Owner:** nobody


Changeset : 7997 5.1.FC
Setup : 2 controllers and 2 payloads with headless feature disabled.


 Evt api returns SA_AIS_ERR_BAD_HANDLE during middleware si-swap operation.

In the below scenario, saEvtEventPublish returns BAD_HANDLE. The api is called, 
after invoking middleware switchover.

 Sep 26 17:42:10.861357 imma [11005:eda_saf_api.c:0320] >> saEvtDispatch: event 
handle: ff84
Sep 26 17:42:10.861494 imma [11005:eda_saf_api.c:0363] << saEvtDispatch
 Sep 26 17:42:11.340907 imma [11005:eda_mds.c:0943] T1 Event Server is DOWN on 
node_id: 0
Sep 26 17:42:11.345418 imma [11005:eda_saf_api.c:2097] >> saEvtEventPublish: 
Allocated event handle: ffc00029
Sep 26 17:42:11.345457 imma [11005:eda_saf_api.c:2127] T2 Unable to retrieve 
allocated event handle: ffc00029
Sep 26 17:42:11.345471 imma [11005:eda_saf_api.c:2128] << saEvtEventPublish
Sep 26 17:42:11.347997 imma [11005:eda_saf_api.c:2364] >> saEvtEventSubscribe: 
channel handle: ffd00021
Sep 26 17:42:11.348081 imma [11005:eda_saf_api.c:2447] T2 event server is not 
yet up
Sep 26 17:42:11.348121 imma [11005:eda_saf_api.c:2448] << saEvtEventSubscribe
Sep 26 17:42:11.861956 imma [11005:eda_saf_api.c:0320] >> saEvtDispatch: event 
handle: ff84
Sep 26 17:42:11.862080 imma [11005:eda_saf_api.c:0363] << saEvtDispatch
Sep 26 17:42:12.299715 imma [11005:ntfa_mds.c:0388] T2 NTFA Rcvd MDS subscribe 
evt from svc 28
Sep 26 17:42:12.299736 imma [11005:ntfa_mds.c:0398] TR NTFS down
Sep 26 17:42:12.299773 imma [11005:ntfa_util.c:1499] >> ntfa_update_ntfsv_state
Sep 26 17:42:12.299782 imma [11005:ntfa_util.c:1501] T1 Current state: 4, 
Changed state: 2
Sep 26 17:42:12.299790 imma [11005:ntfa_util.c:1542] TR Active NTF server 
temporarily unavailable
Sep 26 17:42:12.299796 imma [11005:ntfa_util.c:1554] << ntfa_update_ntfsv_state


This issue is observed randomly and was not observed in the earlier release.
   


---



[tickets] [opensaf:tickets] #2042 EVT : Application segfaulted in MDS callback processing

2016-09-16 Thread Srikanth R
- **summary**: EVT : Application segfaulted during  --> EVT : Application 
segfaulted in MDS callback processing



---

** [tickets:#2042] EVT : Application segfaulted in MDS callback processing**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Fri Sep 16, 2016 12:22 PM UTC by Srikanth R
**Last Updated:** Fri Sep 16, 2016 12:22 PM UTC
**Owner:** nobody
**Attachments:**

- [eda_bt](https://sourceforge.net/p/opensaf/tickets/2042/attachment/eda_bt) 
(59.5 kB; application/octet-stream)


Setup : 7997 5.1.FC 

Issue :
 Application segfaulted on payload in MDS  callback processing  by EVT thread.
 Below is the backtrace.
 
 0  0x7ff2f5282d64 in ncs_decode_32bit (stream=0x7ff2f6b95c98) at 
hj_dec.c:197
1  0x7ff2f5f181e4 in eda_mds_dec (info=0x7ff2f6b95dd0) at eda_mds.c:1285
2  0x7ff2f5f185fa in eda_mds_callback (info=0x7ff2f6b95dd0) at 
eda_mds.c:1440
3  0x7ff2f52b887b in mds_mcm_do_decode_full_or_flat (svccb=0x639c40, 
cbinfo=0x7ff2f6b95dd0, recv_msg=0x7aace8, orig_msg=0x0) at mds_c_sndrcv.c:4915
4  0x7ff2f52b7841 in mds_mcm_process_recv_snd_msg_common (svccb=0x639c40, 
recv=0x7aace8) at mds_c_sndrcv.c:4255
5  0x7ff2f52b7f24 in mcm_recv_normal_snd (svccb=0x639c40, recv=0x7aace8) at 
mds_c_sndrcv.c:4389
6  0x7ff2f52b7305 in mds_mcm_ll_data_rcv (recv=0x7aace8) at 
mds_c_sndrcv.c:4067
7  0x7ff2f52a54ac in mdtm_process_recv_message_common (flag=0 '\000', 
buffer=0x61424a "\252", len=167, transport_adest=72075191086465088, 
seq_num_check=30108, buff_dump=0x7ff2f6b961bc) at mds_dt_common.c:505
8  0x7ff2f52a626f in mdtm_process_recv_data (buffer=0x614242 "", len=175, 
transport_adest=72075191086465088, buff_dump=0x7ff2f6b961bc) at 
mds_dt_common.c:949
9  0x7ff2f52c952f in mdtm_process_recv_events () at mds_dt_tipc.c:793
10 0x7ff2f586c7b6 in start_thread () from /lib64/libpthread.so.0
11 0x7ff2f55c89cd in clone () from /lib64/libc.so.6

The entire backtrace is attached. This issue is also observed in earlier 
releases.



---





[tickets] [opensaf:tickets] #1486 smf : SMFD asserted in csi active callback during switchovers ( ncs_sel_obj_create: socketpair failed )

2016-09-16 Thread Srikanth R
- **summary**: SMFD faulted in active callback during switchovers --> smf : 
SMFD asserted in  csi active callback during switchovers ( ncs_sel_obj_create: 
socketpair failed )
- **Component**: unknown --> smf



---

** [tickets:#1486] smf : SMFD asserted in  csi active callback during 
switchovers ( ncs_sel_obj_create: socketpair failed )**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Wed Sep 16, 2015 10:04 AM UTC by Ritu Raj
**Last Updated:** Wed May 04, 2016 07:27 PM UTC
**Owner:** nobody


Setup
4.6GA with changeset 6490
4 nodes (OEL6.4 with TIPC version 1.7.7) with no PBE configured

Issues Observed:
> Cluster went for reboot during switchover as SMFD faulted due to 
'csiSetcallbackFailed'

Steps Performed:

 * Continuous switchovers are invoked on the setup.
 * After more than 1000 switchovers, the standby controller (SC-2) was 
rebooted while being promoted to the ACTIVE state, as SMFD failed in the active 
callback.

Sep 16 06:25:00 SLOT-2 osafsmfd[1926]: ER amf_active_state_handler oi activate 
FAIL
Sep 16 06:25:00 SLOT-2 osafamfnd[1802]: NO 
'safComp=SMF,safSu=SC-2,safSg=2N,safApp=OpenSAF' faulted due to 
'csiSetcallbackFailed' : Recovery is 'nodeFailfast'
Sep 16 06:25:00 SLOT-2 osafamfnd[1802]: ER 
safComp=SMF,safSu=SC-2,safSg=2N,safApp=OpenSAF Faulted due 
to:csiSetcallbackFailed Recovery is:nodeFailfast
Sep 16 06:25:00 SLOT-2 osafamfnd[1802]: Rebooting OpenSAF NodeId = 131599 EE 
Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 
131599, SupervisionTime = 60


 * After SC-2 went for reboot, SC-1 tried to become active again, during which 
smfd also faulted on the newly promoted active controller.

Sep 16 06:25:00 SLOT-1 root: Invoking switchover from invoke_switchover.sh
Sep 16 06:25:00 SLOT-1 osafamfd[3830]: NO safSi=SC-2N,safApp=OpenSAF Swap 
initiated
Sep 16 06:25:00 SLOT-1 osafamfnd[3845]: NO Assigning 
'safSi=SC-2N,safApp=OpenSAF' QUIESCED to 'safSu=SC-1,safSg=2N,safApp=OpenSAF'
Sep 16 06:25:00 SLOT-1 osafsmfd[3871]: ncs_sel_obj_create: socketpair failed - 
Too many open files

Sep 16 06:25:05 SLOT-1 kernel: TIPC: Resetting link <1.1.1:eth0-1.1.2:eth1>, 
peer not responding
Sep 16 06:25:05 SLOT-1 kernel: TIPC: Lost link <1.1.1:eth0-1.1.2:eth1> on 
network plane A
Sep 16 06:25:05 SLOT-1 kernel: TIPC: Lost contact with <1.1.2>
Sep 16 06:25:05 SLOT-1 osaffmd[3716]: NO Node Down event for node id 2020f:

Sep 16 06:25:06 SLOT-1 osafimmnd[3746]: NO This IMMND re-elected coord 
redundantly, failover ?
Sep 16 06:25:06 SLOT-1 osafsmfd[3871]: ncs_sel_obj_create: socketpair failed - 
Too many open files
Sep 16 06:25:06 SLOT-1 osafsmfd[3871]: ER immutil_saImmOiInitialize_2 fail, rc 
= 2
...
Sep 16 06:25:06 SLOT-1 osafamfnd[3845]: ER 
safComp=SMF,safSu=SC-1,safSg=2N,safApp=OpenSAF Faulted due 
to:csiSetcallbackFailed Recovery is:nodeFailfast
Sep 16 06:25:06 SLOT-1 osafamfnd[3845]: Rebooting OpenSAF NodeId = 131343 EE 
Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 
131343, SupervisionTime = 60
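
The "socketpair failed - Too many open files" message corresponds to the 
EMFILE error, which is returned once a process reaches its open file 
descriptor limit. The sketch below reproduces that failure mode in isolation 
by creating socket pairs without closing them under a lowered limit. It is 
only an illustration of descriptor exhaustion; it does not show where smfd 
holds descriptors, which the report does not identify. The function name and 
the limit of 64 are hypothetical.

```python
import errno
import resource
import socket


def leak_until_emfile(soft_limit=64):
    """Create socket pairs without closing them until the fd limit is hit.

    Temporarily lowers the soft RLIMIT_NOFILE of the current process and
    restores it afterwards. Returns (pairs_created, errno_of_failure).
    """
    old_soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if old_soft == resource.RLIM_INFINITY:
        new_soft = soft_limit
    else:
        new_soft = min(soft_limit, old_soft)
    resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    leaked = []
    try:
        while True:
            leaked.append(socket.socketpair())  # never closed in the loop
    except OSError as exc:
        return len(leaked), exc.errno
    finally:
        # Clean up so the demonstration itself does not leak.
        for a, b in leaked:
            a.close()
            b.close()
        resource.setrlimit(resource.RLIMIT_NOFILE, (old_soft, hard))


pairs, err = leak_until_emfile()
print(pairs, errno.errorcode[err])
```

On a live system, comparing the number of entries under /proc/<pid>/fd for 
osafsmfd before and after a batch of switchovers would be one way to check 
whether descriptors accumulate.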



---



[tickets] [opensaf:tickets] #1765 ckpt : saCkptCheckpointOpen api call failed and returing SA_AIS_ERR_LIBRARY after couple of failover

2016-09-15 Thread Srikanth R
- **summary**: saCkptCheckpointOpen api call failed and returing 
SA_AIS_ERR_LIBRARY after couple of failover --> ckpt : saCkptCheckpointOpen api 
call failed and returing SA_AIS_ERR_LIBRARY after couple of failover
- **Comment**:

Application output with syslog running as background process.

Sep 15 18:46:10 SYSTEST-PLD-1 kernel: [ 1204.300498] TIPC: Established link 
<1.1.3:eth3-1.1.2:eth3> on network plane A
Sep 15 18:46:11 SYSTEST-PLD-1 osafimmnd[4936]: NO NODE STATE-> 
IMM_NODE_R_AVAILABLE
Sep 15 18:46:11 SYSTEST-PLD-1 osafimmnd[4936]: NO NODE STATE-> 
IMM_NODE_FULLY_AVAILABLE 19001
Sep 15 18:46:11 SYSTEST-PLD-1 osafimmnd[4936]: NO Epoch set to 4 in ImmModel
Sep 15 18:46:12 SYSTEST-PLD-1 osafimmnd[4936]: NO Implementer connected: 14 
(MsgQueueService131599) <0, 2020f>
Sep 15 18:46:12 SYSTEST-PLD-1 osafimmnd[4936]: NO Implementer (applier) 
connected: 15 (@safAmfService2020f) <0, 2020f>
Sep 15 18:46:12 SYSTEST-PLD-1 osafimmnd[4936]: NO Implementer (applier) 
connected: 16 (@OpenSafImmReplicatorB) <0, 2020f>
SYSTEST-PLD-1:/home//cpsv_fo #
***
Demonstrating Checkpoint Service Usage with a collocated Checkpoint
***
Initialising With Checkpoint Service
Sep 15 18:46:13 SYSTEST-PLD-1 a.out: logtrace: trace enabled to file 
/home//cpsv_fo/ckpt.trace, mask=0x
PASSED
Opening Collocated Checkpoint = safCkpt=DemoCkpt,safApp=safCkptService
PASSED
Opening Collocated Checkpoint = safCkpt=DemoCkpt,safApp=safCkptService with 
create flags
PASSED
Press  key to continue... Invoke failover
Sep 15 18:46:51 SYSTEST-PLD-1 osafimmnd[4936]: NO Implementer disconnected 8 
<0, 2010f> (safEvtService)

Sep 15 18:46:56 SYSTEST-PLD-1 kernel: [ 1250.704238] TIPC: Resetting link 
<1.1.3:eth3-1.1.1:eth0>, peer not responding
Sep 15 18:46:56 SYSTEST-PLD-1 kernel: [ 1250.704251] TIPC: Lost link 
<1.1.3:eth3-1.1.1:eth0> on network plane A
Sep 15 18:46:56 SYSTEST-PLD-1 kernel: [ 1250.704259] TIPC: Lost contact with 
<1.1.1>
...
...
Sep 15 18:46:57 SYSTEST-PLD-1 osafimmnd[4936]: NO Implementer connected: 23 
(safEvtService) <0, 2020f>
Sep 15 18:46:57 SYSTEST-PLD-1 osafimmnd[4936]: NO Implementer (applier) 
connected: 24 (@safLogService_appl) <0, 2020f>
Sep 15 18:46:57 SYSTEST-PLD-1 osafimmnd[4936]: NO Implementer connected: 25 
(safSmfService) <0, 2020f>
Sep 15 18:46:57 SYSTEST-PLD-1 osafimmnd[4936]: NO Implementer (applier) 
connected: 26 (@OpenSafImmReplicatorA) <0, 2020f>
**Unlink My Checkpoint    Failed :5**
Ckpt Finalize being called  PASSED
SYSTEST-PLD-1:/home//cpsv_fo # Sep 15 18:47:17 SYSTEST-PLD-1 osafimmnd[4936]: 
NO Implementer connected: 27 (MsgQueueService131343) <0, 2020f>
Sep 15 18:47:17 SYSTEST-PLD-1 osafimmnd[4936]: NO Implementer disconnected 27 
<0, 2020f> (MsgQueueService131343)
Sep 15 18:47:18 SYSTEST-PLD-1 kernel: [ 1272.242604] TIPC: Established link 
<1.1.3:eth3-1.1.1:eth0> on network plane A
Sep 15 18:47:19 SYSTEST-PLD-1 osafimmnd[4936]: NO NODE STATE-> 
IMM_NODE_R_AVAILABLE
Sep 15 18:47:19 SYSTEST-PLD-1 osafimmnd[4936]: NO NODE STATE-> 
IMM_NODE_FULLY_AVAILABLE 19001
Sep 15 18:47:19 SYSTEST-PLD-1 osafimmnd[4936]: NO Epoch set to 5 in ImmModel
Sep 15 18:47:20 SYSTEST-PLD-1 osafimmnd[4936]: NO Implementer connected: 28 
(MsgQueueService131343) <0, 2010f>
Sep 15 18:47:21 SYSTEST-PLD-1 osafimmnd[4936]: NO Implementer (applier) 
connected: 29 (@safAmfService2010f) <0, 2010f>
Sep 15 18:47:21 SYSTEST-PLD-1 osafimmnd[4936]: NO Implementer (applier) 
connected: 30 (@OpenSafImmReplicatorB) <0, 2010f>

SYSTEST-PLD-1:/home//cpsv_fo # ./a.out
***
Demonstrating Checkpoint Service Usage with a collocated Checkpoint
***
Initialising With Checkpoint Service
Sep 15 18:48:24 SYSTEST-PLD-1 a.out: logtrace: trace enabled to file 
/home//cpsv_fo/ckpt.trace, mask=0x
PASSED
Opening Collocated Checkpoint = safCkpt=DemoCkpt,safApp=safCkptService
**Ckpt open Failed (2).** Hence exiting




---

** [tickets:#1765] ckpt : saCkptCheckpointOpen api call failed and returing 
SA_AIS_ERR_LIBRARY after couple of failover**

**Status:** accepted
**Milestone:** 4.7.2
**Created:** Fri Apr 15, 2016 06:26 AM UTC by Ritu Raj
**Last Updated:** Thu Sep 15, 2016 01:27 PM UTC
**Owner:** Pham Hoang Nhat
**Attachments:**

- 
[ckpt_trace.tar.bz2](https://sourceforge.net/p/opensaf/tickets/1765/attachment/ckpt_trace.tar.bz2)
 (3.2 MB; application/x-bzip)


setup:
Changeset- 7436
Version - opensaf 5.0 FC
4 nodes configured with single PBE and a load of 30K objects

* Issue observed :
The saCkptCheckpointOpen API call fails and returns SA_AIS_ERR_LIBRARY after a 
couple of failovers.

* Steps to reproduce:
> Ran a couple of failovers and observed that saCkptCheckpointOpen failed.
> Below is a snippet of the agent trace:

Apr 15  8:08:50.275115 cpa [28883:cpa_mds.c:0776] << 

[tickets] [opensaf:tickets] #1765 saCkptCheckpointOpen api call failed and returing SA_AIS_ERR_LIBRARY after couple of failover

2016-09-15 Thread Srikanth R
Hi Pham,

   We applied the patch on 5.0 GA and the issue is still observed.

Below are the steps and the APIs used in the application to reproduce the issue.

Application :

-> Invoke saCkptInitialize
-> Invoke saCkptCheckpointOpen with create flag and 
SA_CKPT_WR_ACTIVE_REPLICA_WEAK.
-> Invoke saCkptCheckpointOpen with WRITE flag
-> Wait for user to press enter ( to invoke failover )
-> Invoke saCkptCheckpointUnlink
-> Invoke saCkptFinalize

Steps to reproduce the issue :

-> Initially start a single controller and payload.

-> Start the other controller, which shall join as standby.

-> While the standby controller is joining, invoke the application on the 
payload. The timing is chosen so that the CKPT APIs are invoked while CKPT cold 
sync is in progress.

->  After a sleep of 20 seconds, induce a failover in the middle of the run and 
later unblock the application, after which the unlink and finalize APIs are invoked.

 The unlink API returns SA_AIS_ERR_TIMEOUT and the IMM objects are not deleted from the DB:
 
 immfind | grep -i Demo
safCkpt=DemoCkpt,safApp=safCkptService
safReplica=safNode=PL-3\,safCluster=myClmCluster,safCkpt=DemoCkpt,safApp=safCkptService
safReplica=safNode=SC-1\,safCluster=myClmCluster,safCkpt=DemoCkpt,safApp=safCkptService
safReplica=safNode=SC-2\,safCluster=myClmCluster,safCkpt=DemoCkpt,safApp=safCkptService

 -> If the application is invoked again, checkpoint open returns 
SA_AIS_ERR_LIBRARY.
 
 
 -> At this stage, if the application is invoked twice, ckptd segfaults; 
ticket #2011 has been raised for that.

  This issue (#1765) seems similar to #247, which was closed as 
non-reproducible. Sometimes checkpoint open also returns SA_AIS_ERR_NO_RESOURCES, 
as mentioned in #247. 
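
The leftover-object check done above with `immfind | grep -i Demo` can be scripted. The following is a minimal sketch, assuming the `immfind` output has already been captured as text; the helper name `stale_ckpt_objects` is hypothetical and not part of OpenSAF.

```python
from typing import List


def stale_ckpt_objects(immfind_output: str, ckpt_dn: str) -> List[str]:
    """Return the DNs in immfind output that are the checkpoint or its children.

    ckpt_dn is the full checkpoint DN, for example
    "safCkpt=DemoCkpt,safApp=safCkptService".
    """
    dns = [line.strip() for line in immfind_output.splitlines() if line.strip()]
    # A child object (such as a replica) has the checkpoint DN as its suffix.
    return [dn for dn in dns if dn == ckpt_dn or dn.endswith("," + ckpt_dn)]
```

If the list is non-empty after saCkptCheckpointUnlink and saCkptFinalize have run, the runtime objects were left behind, which matches the state reported here before the next open fails.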
  
  
  -- Srikanth


Attachments:

- 
[1765.tgz](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/8ea9d424/d730/attachment/1765.tgz)
 (111.5 kB; application/x-compressed-tar)


---

** [tickets:#1765] saCkptCheckpointOpen api call failed and returing 
SA_AIS_ERR_LIBRARY after couple of failover**

**Status:** accepted
**Milestone:** 4.7.2
**Created:** Fri Apr 15, 2016 06:26 AM UTC by Ritu Raj
**Last Updated:** Wed May 04, 2016 06:56 PM UTC
**Owner:** Pham Hoang Nhat
**Attachments:**

- 
[ckpt_trace.tar.bz2](https://sourceforge.net/p/opensaf/tickets/1765/attachment/ckpt_trace.tar.bz2)
 (3.2 MB; application/x-bzip)


setup:
Changeset- 7436
Version - opensaf 5.0 FC
4 nodes configured with single PBE and a load of 30K objects

* Issue observed :
The saCkptCheckpointOpen API call fails and returns SA_AIS_ERR_LIBRARY after a 
couple of failovers.

* Steps to reproduce:
> Ran a couple of failovers and observed that saCkptCheckpointOpen failed.
> Below is a snippet of the agent trace:

Apr 15  8:08:50.275115 cpa [28883:cpa_mds.c:0776] << cpa_mds_msg_sync_send: 
retval = 1
Apr 15  8:08:50.275128 cpa [28883:cpa_api.c:1043] T4 Cpa CkptOpen failed with 
return value:2,ckptHandle:63
Apr 15  8:08:50.275141 cpa [28883:cpa_api.c:1146] << **saCkptCheckpointOpen: 
API return code = 2**

> Traces of both controllers and the agent trace of the payload are attached.
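
The numeric codes in these traces are `SaAisErrorT` values from the SA Forum AIS header `saAis.h`. As a reading aid, here is a small sketch that maps the values quoted in this thread to their names and pulls the code out of a trace line. The table is deliberately a subset, and the helper names `ais_error_name` and `decode_return_code` are hypothetical.

```python
import re
from typing import Optional

# Subset of SaAisErrorT (saAis.h) covering only the codes seen in this thread.
AIS_ERRORS = {
    1: "SA_AIS_OK",
    2: "SA_AIS_ERR_LIBRARY",
    5: "SA_AIS_ERR_TIMEOUT",
    6: "SA_AIS_ERR_TRY_AGAIN",
    12: "SA_AIS_ERR_NOT_EXIST",
    13: "SA_AIS_ERR_NAME_TOO_LONG",
    18: "SA_AIS_ERR_NO_RESOURCES",
}


def ais_error_name(code: int) -> str:
    """Translate a numeric SaAisErrorT value into its symbolic name."""
    return AIS_ERRORS.get(code, f"unknown SaAisErrorT value {code}")


def decode_return_code(trace_line: str) -> Optional[str]:
    """Find 'return code = N' or 'return value:N' in a trace line and name N."""
    match = re.search(r"return (?:code|value)\s*[=:]\s*(\d+)", trace_line)
    return ais_error_name(int(match.group(1))) if match else None
```

Applied to the trace above, both the `cpa_api.c:1043` and `cpa_api.c:1146` lines decode to SA_AIS_ERR_LIBRARY, and the `Failed :5` from the unlink call corresponds to SA_AIS_ERR_TIMEOUT.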



---



[tickets] [opensaf:tickets] #2036 build : make rpm fails, if installation directories are specified

2016-09-15 Thread Srikanth R



---

** [tickets:#2036] build : make rpm fails, if installation directories are 
specified**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Thu Sep 15, 2016 06:03 AM UTC by Srikanth R
**Last Updated:** Thu Sep 15, 2016 06:03 AM UTC
**Owner:** nobody


Environment : 
Setup : SLES 64bit gcc 6.1

Steps performed :

Ran the following commands after downloading OpenSAF from hg.
-> ./bootstrap.sh
-> ./configure CFLAGS="-g " CXXFLAGS="-g " --enable-tipc --enable-imm-pbe 
--enable-ntf-imcn   --sysconfdir=/opt/etc  --localstatedir=/opt/var 
--libdir=/opt/usr/lib
-> make rpm

 The last step fails with the following error.
 
 
 Checking for unpackaged file(s): /usr/lib/rpm/check-files 
/home/pinv/Srikanth/SAF_7997/rpms/tmp/opensaf-5.1.FC-1-root-root
error: Installed (but unpackaged) file(s) found:
   /opt/etc/opensaf/amfd.conf
   /opt/etc/opensaf/amfnd.conf
   /opt/etc/opensaf/amfwdog.conf
   /opt/etc/opensaf/chassis_id
   /opt/etc/opensaf/ckptd.conf
   /opt/etc/opensaf/ckptnd.conf
   /opt/etc/opensaf/clmd.conf
   /opt/etc/opensaf/clmna.conf
   /opt/etc/opensaf/dtmd.conf


RPM build errors:
File not found: 
/home/pinv/Srikanth/SAF_7997/rpms/tmp/opensaf-5.1.FC-1-root-root/usr/lib64/opensaf
File not found: 
/home/pinv/Srikanth/SAF_7997/rpms/tmp/opensaf-5.1.FC-1-root-root/etc/opensaf
File not found: 
/home/pinv/Srikanth/SAF_7997/rpms/tmp/opensaf-5.1.FC-1-root-root/var/lib/opensaf
File not found: 
/home/pinv/Srikanth/SAF_7997/rpms/tmp/opensaf-5.1.FC-1-root-root/var/log/opensaf
File not found: 
/home/pinv/Srikanth/SAF_7997/rpms/tmp/opensaf-5.1.FC-1-root-root/var/run/opensaf
File not found: 
/home/pinv/Srikanth/SAF_7997/rpms/tmp/opensaf-5.1.FC-1-root-root/etc/opensaf/chassis_id
File not found: 
/home/pinv/Srikanth/SAF_7997/rpms/tmp/opensaf-5.1.FC-1-root-root/etc/opensaf/slot_id
.
File not found by glob: 
/home/pinv/Srikanth/SAF_7997/rpms/tmp/opensaf-5.1.FC-1-root-root/usr/lib64/libSa*.a
Installed (but unpackaged) file(s) found:
   /opt/etc/opensaf/amfd.conf
   /opt/etc/opensaf/amfnd.conf
   /opt/etc/opensaf/amfwdog.conf
   /opt/etc/opensaf/chassis_id
   /opt/etc/opensaf/ckptd.conf
   /opt/etc/opensaf/ckptnd.conf
   /opt/etc/opensaf/clmd.conf
   /opt/etc/opensaf/clmna.conf
   /opt/etc/opensaf/dtmd.conf
  ...
 /opt/usr/lib/pkgconfig/opensaf-smf.pc
   /opt/usr/lib/pkgconfig/opensaf.pc
make: *** [rpm] Error 1
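
The two lists in the output above suggest that the same files are looked up under the default directories (`/etc/opensaf`, `/usr/lib64`) while being installed under the configured ones (`/opt/etc`, `/opt/usr/lib`). A minimal sketch for pairing them up from captured rpmbuild output follows; the helper name `relocated_pairs` is hypothetical, and the suffix match is only a heuristic.

```python
from typing import Dict, Iterable


def relocated_pairs(buildroot: str,
                    missing: Iterable[str],
                    unpackaged: Iterable[str]) -> Dict[str, str]:
    """Map each path rpm expected to the path where the file was installed.

    missing    -- paths from the "File not found:" lines (buildroot included)
    unpackaged -- paths from "Installed (but unpackaged) file(s) found:"
    """
    unpackaged = list(unpackaged)
    pairs = {}
    for path in missing:
        # Strip the buildroot so both lists are relative to the target system.
        expected = path[len(buildroot):] if path.startswith(buildroot) else path
        for installed in unpackaged:
            # Same trailing path under a different prefix: likely the same file.
            if installed != expected and installed.endswith(expected):
                pairs[expected] = installed
    return pairs
```

For the log above this pairs `/etc/opensaf/chassis_id` with `/opt/etc/opensaf/chassis_id`, which is consistent with the packaging step not picking up the `--sysconfdir` value passed to configure.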






---



[tickets] [opensaf:tickets] #2022 AMF : amfd asserted for NG lock operation ( quiesced timeout - Nway model))

2016-09-14 Thread Srikanth R
Attaching the logs. 


Attachments:

- 
[2022.tgz](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/2d0d5691/727c/attachment/2022.tgz)
 (1.1 MB; application/x-compressed-tar)


---

** [tickets:#2022] AMF : amfd asserted for NG lock operation ( quiesced timeout 
- Nway model))**

**Status:** assigned
**Milestone:** 4.7.2
**Created:** Sat Sep 10, 2016 09:58 AM UTC by Srikanth R
**Last Updated:** Mon Sep 12, 2016 07:21 AM UTC
**Owner:** Praveen
**Attachments:**

- 
[createAppTestApp.sh](https://sourceforge.net/p/opensaf/tickets/2022/attachment/createAppTestApp.sh)
 (15.8 kB; text/x-shellscript)


Environment details
--
OS : Suse 64bit 
Changeset : 7997  ( 5.1.FC)
Setup : 5 nodes ( 2 controllers and 3 payloads with headless feature enabled & 
no PBE )
AMF Application : NPM model with SUs mapped on SC-2,PL-3,PL-4


Summary :
--
amfd on both controllers asserts if a component of the Nway application fails 
in the CSI SET QUIESCED callback during a lock operation on the node group.


Steps followed & Observed behaviour
--

-> Hosted the Nway application on PL-3, PL-4 and SC-2 and brought it up. The 
configuration is attached to the ticket.
-> Created a node group containing all three nodes.
-> Ensured that one of the components does not respond to the quiesced callback.
-> Performed the lock operation on the node group.
-> amfd on both controllers asserted with the following backtrace.


0  0x7f66fbc6fb55 in raise () from /lib64/libc.so.6
1  0x7f66fbc71131 in abort () from /lib64/libc.so.6
2  0x7f66fda6816a in __osafassert_fail (__file=0x51214d "su.cc", 
__line=2022, __func=0x513aa0 "dec_curr_stdby_si", __assertion=0x51355f 
"saAmfSUNumCurrStandbySIs > 0") at sysf_def.c:281

3  0x004d68cd in AVD_SU::dec_curr_stdby_si (this=0x7ccf40) at su.cc:2022
4  0x004be804 in avd_susi_update_assignment_counters (susi=0x78c670, 
action=AVSV_SUSI_ACT_DEL, current_ha_state=0, new_ha_state=0) at siass.cc:783
5  0x004be59b in avd_susi_del_send (susi=0x78c670) at siass.cc:714
6  0x004af12e in avd_sg_nway_node_fail_stable (cb=0x751b80, 
su=0x800470, susi=0x0) at sg_nway_fsm.cc:3022
7  0x004b025d in avd_sg_nway_node_fail_sg_realign (cb=0x751b80, 
su=0x800470) at sg_nway_fsm.cc:3493
8  0x004a8042 in SG_NWAY::node_fail (this=0x797c50, cb=0x751b80, 
su=0x800470) at sg_nway_fsm.cc:497
9  0x004b209e in sg_su_failover_func (su=0x800470) at sgproc.cc:525
10 0x004b2d16 in avd_su_oper_state_evh (cb=0x751b80, 
evt=0x7f66f4002940) at sgproc.cc:838
11 0x00450ba9 in process_event (cb_now=0x751b80, evt=0x7f66f4002940) at 
main.cc:768
12 0x004508cd in main_loop () at main.cc:689
13 0x00450e43 in main (argc=2, argv=0x7fff0f81ab18) at main.cc:841







---



[tickets] [opensaf:tickets] #2023 AMF : Long DN RT objects creation failed with ERR_TOO_LONG (13)

2016-09-14 Thread Srikanth R
If IMM enforces a maximum length of 2048 for long DN objects, then AMF should 
compute the size of the RT object DNs up front and reject the creation of 
application objects whose RT objects would exceed that limit.
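
The check suggested above could look roughly like this. It is a sketch under stated assumptions: the 2048 limit is the figure quoted in this comment, the RDN attribute `safCSIComp` is taken from the osafamfd trace in the ticket, the comma escaping follows the `safSISU=safSu=SU1\,...` DNs seen elsewhere in this archive, and the function names are hypothetical rather than AMF code.

```python
MAX_DN_LENGTH = 2048  # assumption: the limit quoted in the comment above


def association_dn(rdn_attr: str, child_dn: str, parent_dn: str) -> str:
    """Build an association object DN, escaping commas in the embedded DN."""
    escaped = child_dn.replace(",", "\\,")
    return f"{rdn_attr}={escaped},{parent_dn}"


def fits_in_imm(dn: str, limit: int = MAX_DN_LENGTH) -> bool:
    """Return True if the DN, measured in bytes, stays within the limit."""
    return len(dn.encode("utf-8")) <= limit
```

Because the runtime DN embeds the component DN inside the CSI DN, it can exceed the limit even when each configured DN on its own was accepted, which is the case a configuration-time check would need to catch.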


---

** [tickets:#2023] AMF : Long DN RT objects creation failed with ERR_TOO_LONG 
(13)**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Sat Sep 10, 2016 10:57 AM UTC by Srikanth R
**Last Updated:** Tue Sep 13, 2016 01:01 AM UTC
**Owner:** nobody
**Attachments:**

- 
[2023.tgz](https://sourceforge.net/p/opensaf/tickets/2023/attachment/2023.tgz) 
(159.7 kB; application/x-compressed-tar)


Environment details
--
OS : Suse 64bit 
Changeset : 7997  ( 5.1.FC)
Setup : 5 nodes ( 2 controllers and 3 payloads with headless feature disabled & 
no PBE  & longDn feature enabled )
AMF Application : 2N model with SUs mapped on PL-3,PL-4


Summary :
--
 Creation of long DN RT objects fails with ERR_TOO_LONG during the unlock 
operation of the SU.


Steps followed & Observed behaviour
--

-> Initially enabled the longDn feature.

-> Later imported the attached AMF configuration successfully.

-> Then performed the unlock-in and unlock operations on the SU, after which the 
following error is observed in syslog.

Sep 10 16:11:43 CONTROLLER-2 osafamfnd[4279]: NO Assigned 
'safSi=AmfDemoabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz,safApp=AmfDemoabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz'
 ACTIVE to 'safSu=SU1,safSg=AmfDemoabcdefghijklmnopqrstuvwxyzabcdefghijklmnopq
 
rstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz,safApp=AmfDemoabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz'
Sep 10 16:11:43 CONTROLLER-2 osafamfd[4265]: ER exec: create FAILED 13
Sep 10 16:11:46 CONTROLLER-2 osafamfd[4265]:** ER exec: create FAILED 13**


Below is the corresponding trace in osafamfd :


Sep 10 16:11:46.647681 osafamfd [4265:imm.cc:0396] >> execute
Sep 10 16:11:46.647730 osafamfd [4265:imm.cc:0142] >> exec: Create 
safCsi=AmfDemoabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz_CSIA,safSi=AmfDemoabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz,safApp=AmfDemoabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxy
 zabcdefghijklmnopqrstuvT
Sep 10 16:11:46.647783 osafamfd [4265:imma_oi_api.c:2786] >> 
rt_object_create_common
Sep 10 16:11:46.647879 osafamfd [4265:imma_oi_api.c:2892] TR attr:safCSIComp
Sep 10 16:11:46.647908 osafamfd [4265:imma_oi_api.c:2892] TR 
attr:saAmfCSICompHAState
Sep 10 16:11:46.647927 osafamfd [4265:imma_oi_api.c:2892] TR 
attr:saAmfCSICompHAReadinessState
Sep 10 16:11:46.649108 osafamfd [4265:imma_oi_api.c:3063] << 
rt_object_create_common
Sep 10 16:11:46.649157 osafamfd [4265:imm.cc:0163] ER exec: create FAILED 13




---


[tickets] [opensaf:tickets] #2023 AMF : Long DN RT objects creation failed with ERR_TOO_LONG (13)

2016-09-12 Thread Srikanth R
Attaching the configuration and  IMMD  traces also.


Attachments:

- 
[2023_longDn.tgz](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/b0d730dd/21a6/attachment/2023_longDn.tgz)
 (821.2 kB; application/x-compressed)


---

** [tickets:#2023] AMF : Long DN RT objects creation failed with ERR_TOO_LONG 
(13)**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Sat Sep 10, 2016 10:57 AM UTC by Srikanth R
**Last Updated:** Mon Sep 12, 2016 01:59 AM UTC
**Owner:** nobody
**Attachments:**

- 
[2023.tgz](https://sourceforge.net/p/opensaf/tickets/2023/attachment/2023.tgz) 
(159.7 kB; application/x-compressed-tar)






---


[tickets] [opensaf:tickets] #2023 AMF : Long DN RT objects creation failed with ERR_TOO_LONG (13)

2016-09-10 Thread Srikanth R



---

** [tickets:#2023] AMF : Long DN RT objects creation failed with ERR_TOO_LONG 
(13)**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Sat Sep 10, 2016 10:57 AM UTC by Srikanth R
**Last Updated:** Sat Sep 10, 2016 10:57 AM UTC
**Owner:** nobody
**Attachments:**

- 
[2023.tgz](https://sourceforge.net/p/opensaf/tickets/2023/attachment/2023.tgz) 
(159.7 kB; application/x-compressed-tar)






---


[tickets] [opensaf:tickets] #316 SI Assignments are not removed for a SU in Nway redundancy model

2016-09-10 Thread Srikanth R
The issue of an SU stuck in quiesced state is observed during a lock operation 
on a node group.

-> Brought up an Nway application with 3 SUs hosted on SC-2, PL-3 and PL-4.
-> Locked a node group with only PL-3 as a member.
-> The assignments of the SU hosted on PL-3 are not removed and the SU is stuck 
in quiesced state. 

Configuration is attached.
 


Attachments:

- 
[createAppTestApp.sh](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/0143c687/6d14/attachment/createAppTestApp.sh)
 (15.8 kB; text/x-shellscript)


---

** [tickets:#316] SI Assignments are not removed for a SU in Nway redundancy 
model**

**Status:** accepted
**Milestone:** 4.7.2
**Created:** Fri May 24, 2013 08:39 AM UTC by Nagendra Kumar
**Last Updated:** Tue Aug 09, 2016 09:48 AM UTC
**Owner:** Praveen
**Attachments:**

- [logs.tar](https://sourceforge.net/p/opensaf/tickets/316/attachment/logs.tar) 
(2.5 MB; application/x-gzip-compressed)
- [osafamfd](https://sourceforge.net/p/opensaf/tickets/316/attachment/osafamfd) 
(228.2 kB; application/octet-stream)
- 
[osafamfnd](https://sourceforge.net/p/opensaf/tickets/316/attachment/osafamfnd) 
(122.8 kB; application/octet-stream)
- 
[pl_logs.tar](https://sourceforge.net/p/opensaf/tickets/316/attachment/pl_logs.tar)
 (1.3 MB; application/x-gzip-compressed)


Migrated from http://devel.opensaf.org/ticket/2987

changeset : 3855
Model : NWay
configuration : 1App,1SG,5SU with 3comps each, 5SIs with 3csi each.
si-si deps configured as SI1<-SI2<-SI3<-SI4
SIrankedSus not configured. 
Node mapping : SU1 on SC-1, SU2 on SC-2, SU3 on PL-3, SU4,SU5 on PL-4.


While running the campaign, SMF performs lock and lock-in of the activation 
units, i.e. the SUs. The SIs for SU3 are not removed although SU3 is in locked 
state. The subsequent unlock-in and unlock of SU3 fail. 


/var/log/messages of active ctrl- SC-1 shows

Feb 3 22:45:14 linux-xc76 osafamfd[20055]: WA SIs still assigned to this SU
Feb 3 22:45:16 linux-xc76 osafamfd[20055]: WA SIs still assigned to this SU
Feb 3 22:45:18 linux-xc76 osafamfd[20055]: WA SIs still assigned to this SU
Feb 3 22:45:20 linux-xc76 osafamfd[20055]: WA SIs still assigned to this SU
Feb 3 22:45:23 linux-xc76 osafamfd[20055]: WA SIs still assigned to this SU
Feb 3 22:45:23 linux-xc76 osafsmfd[20081]: ER Fail to invoke admin operation, 
too many SA_AIS_ERR_TRY_AGAIN, giving up. 
dn=[safSu=SU3,safSg=SGONE,safApp=NWAYAPP], opId=[3]
Feb 3 22:45:23 linux-xc76 osafsmfd[20081]: ER Failed to call admin operation 3 
on safSu=SU3,safSg=SGONE,safApp=NWAYAPP
Feb 3 22:45:23 linux-xc76 osafsmfd[20081]: ER Failed to Terminate activation 
units in step=safSmfStep=0003
Feb 3 22:45:23 linux-xc76 osafsmfd[20081]: ER Step undoing failed
Feb 3 22:45:23 linux-xc76 osafsmfd[20081]: ER Step safSmfStep=0003 in procedure 
safSmfProc=amfClusterProc-1 failed, step result 5
Feb 3 22:45:23 linux-xc76 osafsmfd[20081]: NO CAMP: Procedure 
safSmfProc=amfClusterProc-1 returned FAILED
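
The log above shows SMF retrying the admin operation roughly every two seconds and giving up after repeated SA_AIS_ERR_TRY_AGAIN replies. A minimal sketch of that bounded-retry pattern follows; `invoke_with_retry`, `stuck` and the attempt limit are hypothetical names and values chosen for illustration, not SMF's actual code, and the delay between attempts is left out.

```python
SA_AIS_OK = 1
SA_AIS_ERR_TRY_AGAIN = 6  # value as printed by amf-adm elsewhere in this archive


def invoke_with_retry(admin_op, max_attempts=5):
    """Call admin_op() until it stops returning TRY_AGAIN or attempts run out.

    admin_op is a hypothetical callable returning an SaAisErrorT-style integer.
    Returns (last return code, number of attempts made).
    """
    rc = SA_AIS_ERR_TRY_AGAIN
    attempts = 0
    while attempts < max_attempts:
        attempts += 1
        rc = admin_op()
        if rc != SA_AIS_ERR_TRY_AGAIN:
            break
    return rc, attempts


def stuck():
    """Models an operation that can never proceed, e.g. SIs still assigned."""
    return SA_AIS_ERR_TRY_AGAIN


result = invoke_with_retry(stuck, max_attempts=5)
```

With an operation that never becomes possible, the caller ends up with TRY_AGAIN after the last attempt, which corresponds to the "giving up" line in the syslog.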


SU Assignments brief:
===
safSISU=safSu=SU1\,safSg=SGONE\,safApp=NWAYAPP,safSi=NWAYSI3,safApp=NWAYAPP
saAmfSISUHAState=ACTIVE(1)

safSISU=safSu=SU1\,safSg=SGONE\,safApp=NWAYAPP,safSi=NWAYSI2,safApp=NWAYAPP
saAmfSISUHAState=STANDBY(2)

safSISU=safSu=SU3\,safSg=SGONE\,safApp=NWAYAPP,safSi=NWAYSI5,safApp=NWAYAPP
saAmfSISUHAState=QUIESCED(3)

safSISU=safSu=SU4\,safSg=SGONE\,safApp=NWAYAPP,safSi=NWAYSI5,safApp=NWAYAPP
saAmfSISUHAState=ACTIVE(1)

safSISU=safSu=SU2\,safSg=SGONE\,safApp=NWAYAPP,safSi=NWAYSI1,safApp=NWAYAPP
saAmfSISUHAState=ACTIVE(1)


SU States:
==
safSu=SU3,safSg=SGONE,safApp=NWAYAPP
saAmfSUAdminState=LOCKED(2)
saAmfSUOperState=ENABLED(1)
saAmfSUPresenceState=INSTANTIATED(3)
saAmfSUReadinessState=OUT-OF-SERVICE(1)


Changed 4 months ago by bertil:
- owner changed from ingber to ravisekhar
- component changed from saf/smfsv to saf/avsv

I believe this is an AMF problem. SMF only uses the AMF admin operations (lock, 
unlock, etc.).






---

Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is 
subscribed to https://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change settings at 
https://sourceforge.net/p/opensaf/admin/tickets/options.  Or, if this is a 
mailing list, you can unsubscribe from the mailing list.

--
___
Opensaf-tickets mailing list
Opensaf-tickets@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets


[tickets] [opensaf:tickets] #2022 AMF : amfd asserted for NG lock operation ( quiesced timeout - Nway model))

2016-09-10 Thread Srikanth R



---

** [tickets:#2022] AMF : amfd asserted for NG lock operation ( quiesced timeout 
- Nway model))**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Sat Sep 10, 2016 09:58 AM UTC by Srikanth R
**Last Updated:** Sat Sep 10, 2016 09:58 AM UTC
**Owner:** nobody
**Attachments:**

- [createAppTestApp.sh](https://sourceforge.net/p/opensaf/tickets/2022/attachment/createAppTestApp.sh) (15.8 kB; text/x-shellscript)


Environment details
--
OS : Suse 64bit 
Changeset : 7997  ( 5.1.FC)
Setup : 5 nodes ( 2 controllers and 3 payloads with headless feature enabled & 
no PBE )
AMF Application : NPM model with SUs mapped on SC-2,PL-3,PL-4


Summary :
--
amfd asserted on both controllers when an Nway application failed in the CSI 
set QUIESCED callback during the lock operation of a node group.


Steps followed & Observed behaviour
--

-> Hosted the Nway application on PL-3, PL-4 and SC-2 and brought it up. The 
configuration is attached to the ticket.
-> Created a node group containing all three nodes.
-> Ensured that one of the components does not respond to the quiesced callback.
-> Performed the lock operation on the node group.
-> amfd on both controllers asserted with the following backtrace.


0  0x7f66fbc6fb55 in raise () from /lib64/libc.so.6
1  0x7f66fbc71131 in abort () from /lib64/libc.so.6
2  0x7f66fda6816a in __osafassert_fail (__file=0x51214d "su.cc", 
__line=2022, __func=0x513aa0 "dec_curr_stdby_si", __assertion=0x51355f 
"saAmfSUNumCurrStandbySIs > 0") at sysf_def.c:281

3  0x004d68cd in AVD_SU::dec_curr_stdby_si (this=0x7ccf40) at su.cc:2022
4  0x004be804 in avd_susi_update_assignment_counters (susi=0x78c670, 
action=AVSV_SUSI_ACT_DEL, current_ha_state=0, new_ha_state=0) at siass.cc:783
5  0x004be59b in avd_susi_del_send (susi=0x78c670) at siass.cc:714
6  0x004af12e in avd_sg_nway_node_fail_stable (cb=0x751b80, 
su=0x800470, susi=0x0) at sg_nway_fsm.cc:3022
7  0x004b025d in avd_sg_nway_node_fail_sg_realign (cb=0x751b80, 
su=0x800470) at sg_nway_fsm.cc:3493
8  0x004a8042 in SG_NWAY::node_fail (this=0x797c50, cb=0x751b80, 
su=0x800470) at sg_nway_fsm.cc:497
9  0x004b209e in sg_su_failover_func (su=0x800470) at sgproc.cc:525
10 0x004b2d16 in avd_su_oper_state_evh (cb=0x751b80, 
evt=0x7f66f4002940) at sgproc.cc:838
11 0x00450ba9 in process_event (cb_now=0x751b80, evt=0x7f66f4002940) at 
main.cc:768
12 0x004508cd in main_loop () at main.cc:689
13 0x00450e43 in main (argc=2, argv=0x7fff0f81ab18) at main.cc:841









[tickets] [opensaf:tickets] #2021 AMF : active compname is improperly populated in Standby callback (NPM)

2016-09-10 Thread Srikanth R



---

** [tickets:#2021] AMF :  active compname is improperly populated in Standby 
callback (NPM)**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Sat Sep 10, 2016 06:52 AM UTC by Srikanth R
**Last Updated:** Sat Sep 10, 2016 06:52 AM UTC
**Owner:** nobody


 For an application with the NPM model, the active compName in the standby 
descriptor holds a corrupted value in the standby callback.


Breakpoint 1, pycbk_SaAmfCSISetCallbackT (invocation=4287627278, 
compName=0x941a28, haState=SA_AMF_HA_STANDBY, csiDescriptor=...) at 
saAmf_wrap.c:2914
2914    saAmf_wrap.c: No such file or directory.
(gdb) p csiDescriptor 
$1 = {csiFlags = 1, csiName = {length = 48, value = 
"safCsi=CSI1,safSi=TestApp_SI4,safApp=TestApp_Npm", '\000' }, csiStateDescriptor = {activeDescriptor = {transitionDescriptor = 
1634926660, 
  activeCompName = {length = 0, value = 
"\000mp=CO\000\000\000\000\000\000\000\000u=TestApp_SU2,safSg=TestApp_SG1,safApp=TestApp_Npm",
 '\000' }}, standbyDescriptor = {activeCompName = {
length = 68, value = 
"**sa\000\000\000mp=CO\000\000\000\000\000\000\000\000u=TestApp_SU2,safSg=TestApp_SG1,safApp=TestApp_Npm**",
 '\000' }, standbyRank = 0}}, csiAttr = {attr = 0x7642a0, 
number = 1}}


 In the above callback (in gdb), the active component name in the standby 
descriptor should be 
safComp=COMP1,safSu=TestApp_SU3,safSg=TestApp_SG1,safApp=TestApp_Npm, but it 
is populated with an improper value:
 
sa\000\000\000mp=CO\000\000\000\000\000\000\000\000u=TestApp_SU2,safSg=TestApp_SG1,safApp=TestApp_Npmapo




[tickets] [opensaf:tickets] #2020 AMF : Additional features for csiAttributeChangeCallback

2016-09-09 Thread Srikanth R



---

** [tickets:#2020] AMF : Additional features for csiAttributeChangeCallback **

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Sat Sep 10, 2016 05:53 AM UTC by Srikanth R
**Last Updated:** Sat Sep 10, 2016 05:53 AM UTC
**Owner:** nobody


The following features can be considered additionally for the 
csiAttributeChangeCallback implementation.

-> Currently both the active and the standby components receive 
csiAttributeChangeCallback simultaneously. Instead, it should be handled like 
the csiSet callback: the component with the active assignment should receive 
the callback first, and the standby should receive it afterwards.

   A user application may have a standby that accesses an object which is 
associated with a CSI and is created by the active. If both components get the 
callback simultaneously, the standby may behave erroneously if it processes the 
callback before a busy active component does.

 -> Currently, csiAttributeChangeCallback is invoked only when values are added 
to an existing CSI attribute class. If a new CSI attribute class is created, the 
callback is not invoked.

   The callback should be invoked for every modification of CSI attribute 
objects. All of the operations create, modify and delete should be supported.
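
The ordering requested in the first item can be sketched as follows. This is only an illustration of "active first, then standby", not the AMF implementation; `Assignment`, `dispatch_attr_change` and the blocking `invoke` callable are hypothetical names, and the HA state values are the ones printed by amf-state elsewhere in this archive.

```python
from dataclasses import dataclass

# HA state values as printed in ticket output: ACTIVE(1), STANDBY(2).
SA_AMF_HA_ACTIVE = 1
SA_AMF_HA_STANDBY = 2


@dataclass
class Assignment:
    comp_name: str
    ha_state: int


def dispatch_attr_change(assignments, invoke):
    """Invoke the callback on active components first, then on standbys.

    invoke(comp_name) is a hypothetical blocking call that returns once the
    component has responded to the callback.
    Returns the component names in the order they were called.
    """
    order = []
    for state in (SA_AMF_HA_ACTIVE, SA_AMF_HA_STANDBY):
        for assignment in assignments:
            if assignment.ha_state == state:
                invoke(assignment.comp_name)
                order.append(assignment.comp_name)
    return order


calls = []
order = dispatch_attr_change(
    [Assignment("comp_standby", SA_AMF_HA_STANDBY),
     Assignment("comp_active", SA_AMF_HA_ACTIVE)],
    calls.append,
)
```

Because the standby is called only after the active has responded, any object the active creates while handling the callback exists before the standby looks for it.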







[tickets] [opensaf:tickets] #326 amf: proxied SU's presence state hangs at INSTANTIATING state.

2016-09-09 Thread Srikanth R
Even on a failure due to a csi attribute change callback timeout, the proxied 
SU got stuck in the INSTANTIATING state.


Sep  9 15:15:22 SLES-SLOT4 osafamfnd[25941]: NO 
'safComp=proxied,safSu=PROXIED_SU1_2N,safSg=PROXIED_SG_2N,safApp=PROXIED_2N' 
recovery action escalated from 'componentRestart' to 'suFailover'
Sep  9 15:15:22 SLES-SLOT4 osafamfnd[25941]: NO 
'safComp=proxied,safSu=PROXIED_SU1_2N,safSg=PROXIED_SG_2N,safApp=PROXIED_2N' 
faulted due to 'csiAttributeChangeCallbackTimeout' : Recovery is 'suFailover'
Sep  9 15:15:22 SLES-SLOT4 osafamfnd[25941]: NO Terminating components of 
'safSu=PROXIED_SU1_2N,safSg=PROXIED_SG_2N,safApp=PROXIED_2N'(abruptly & 
unordered)
Sep  9 15:15:22 SLES-SLOT4 osafamfnd[25941]: NO 
'safSu=PROXIED_SU1_2N,safSg=PROXIED_SG_2N,safApp=PROXIED_2N' Presence State 
INSTANTIATED => TERMINATING
Sep  9 15:15:22 SLES-SLOT4 osafamfnd[25941]: NO 
'safSu=PROXIED_SU1_2N,safSg=PROXIED_SG_2N,safApp=PROXIED_2N' Presence State 
TERMINATING => TERMINATING
Sep  9 15:15:27 SLES-SLOT4 osafamfnd[25941]: NO 
'safSu=PROXIED_SU1_2N,safSg=PROXIED_SG_2N,safApp=PROXIED_2N' Presence State 
TERMINATING => UNINSTANTIATED
Sep  9 15:15:27 SLES-SLOT4 osafamfnd[25941]: NO Terminated all components in 
'safSu=PROXIED_SU1_2N,safSg=PROXIED_SG_2N,safApp=PROXIED_2N'
Sep  9 15:15:27 SLES-SLOT4 osafamfnd[25941]: NO Informing director of sufailover
Sep  9 15:15:27 SLES-SLOT4 osafamfnd[25941]: NO Repair request for 
'safSu=PROXIED_SU1_2N,safSg=PROXIED_SG_2N,safApp=PROXIED_2N'
Sep  9 15:15:27 SLES-SLOT4 osafamfnd[25941]: NO 
'safSu=PROXIED_SU1_2N,safSg=PROXIED_SG_2N,safApp=PROXIED_2N' Presence State 
UNINSTANTIATED => UNINSTANTIATED
Sep  9 15:15:27 SLES-SLOT4 osafamfnd[25941]: NO 
'safSu=PROXIED_SU1_2N,safSg=PROXIED_SG_2N,safApp=PROXIED_2N' Presence State 
UNINSTANTIATED => INSTANTIATING



---

** [tickets:#326] amf: proxied SU's presence state hangs at INSTANTIATING 
state.**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Fri May 24, 2013 09:34 AM UTC by Praveen
**Last Updated:** Wed May 04, 2016 07:20 PM UTC
**Owner:** nobody


Migrated from http://devel.opensaf.org/ticket/2213.

Setup: 1 controller
Model observed: TwoN

Configuration of proxy: 1 App, 1 SG, 1 SU, 1 proxy component
Configuration of proxied: 1 App, 1 SG, 1 SU, 1 proxied component with 
saAmfCtCompCategory=12

The proxy code is written to respond to AMF with ERR_FAILED_OP inside the 
SaAmfProxiedComponentInstantiateCallback() API.

By default, the SUs of the proxy and the proxied are in the 
locked-instantiation state.

Scenario:

Bring up the proxy and proxied configuration. 
Do unlock-in and unlock of the proxy. The proxy should be up and running, and 
the proxied registration should be successful. 

Now do unlock-in of the proxied SU. The console output is shown below:
 amf-adm unlock-in safSu=SU_pxd,safSg=SG_pxd,safApp=pxd_App
 error - saImmOmAdminOperationInvoke_2 FAILED: SA_AIS_ERR_TIMEOUT (5)
 

Retrying again gives the below output. 
SLES11-SLOT-2:/home/surender/amf # amf-adm unlock-in 
safSu=SU_pxd,safSg=SG_pxd,safApp=pxd_App
 error - saImmOmAdminOperationInvoke_2 admin-op RETURNED: SA_AIS_ERR_TRY_AGAIN 
(6)
 SLES11-SLOT-2:/home/surender/amf # amf-adm unlock-in 
safSu=SU_pxd,safSg=SG_pxd,safApp=pxd_App
 error - saImmOmAdminOperationInvoke_2 admin-op RETURNED: SA_AIS_ERR_TRY_AGAIN 
(6)
 

/var/log/messages output for the above operations:
 Oct 11 15:13:15 SLES11-SLOT-2 osafamfnd[3852]: 
saAmfCtDefQuiescingCompleteTimeout for 
'safVersion=4.0.0,safCompType=Comp_nored' initialized with 
saAmfCtDefCallbackTimeout
 Oct 11 15:13:15 SLES11-SLOT-2 osafamfnd[3852]: 
'safSu=SU_mycomp,safSg=SG_mycomp,safApp=mycompApp' Presence State 
UNINSTANTIATED => INSTANTIATING
 Oct 11 15:13:16 SLES11-SLOT-2 osafamfnd[3852]: 
'safSu=SU_mycomp,safSg=SG_mycomp,safApp=mycompApp' Presence State INSTANTIATING 
=> INSTANTIATED
 Oct 11 15:13:16 SLES11-SLOT-2 osafamfnd[3852]: 
saAmfCtDefQuiescingCompleteTimeout for 
'safVersion=4.0.0,safCompType=Comp_pxd_basetype' initialized with 
saAmfCtDefCallbackTimeout
 Oct 11 15:13:41 SLES11-SLOT-2 osafamfnd[3852]: 
'safSu=SU_pxd,safSg=SG_pxd,safApp=pxd_App' Presence State UNINSTANTIATED => 
INSTANTIATING
 Oct 11 15:15:55 SLES11-SLOT-2 osafamfd[3711]: Admin operation is already going
 Oct 11 15:15:58 SLES11-SLOT-2 osafamfd[3711]: Admin operation is already going
 

SU states of proxy and proxied:
 safSu=SU_mycomp,safSg=SG_mycomp,safApp=mycompApp
 saAmfSUAdminState=UNLOCKED(1)
 saAmfSUOperState=ENABLED(1)
 saAmfSUPresenceState=INSTANTIATED(3)
 saAmfSUReadinessState=IN-SERVICE(2)
 

safSu=SU_pxd,safSg=SG_pxd,safApp=pxd_App
 saAmfSUAdminState=LOCKED(2)
 saAmfSUOperState=ENABLED(1)
 saAmfSUPresenceState=INSTANTIATING(2)
 saAmfSUReadinessState=OUT-OF-SERVICE(1)
 

Comp state of proxy and proxied:
 safComp=mycomp,safSu=SU_mycomp,safSg=SG_mycomp,safApp=mycompApp
 saAmfCompOperState=ENABLED(1)
 saAmfCompPresenceState=INSTANTIATED(3)
 

[tickets] [opensaf:tickets] #2009 AMF: App Si is moving to UNASSIGNED state after middleware failover

2016-09-08 Thread Srikanth R
-> In addition to the steps mentioned in the ticket, the following message is 
printed in syslog for the operations listed below.



Sep  8 12:06:29 CONTROLLER-1 osafamfd[]: ER exec: create FAILED 12
Sep  8 12:06:35 CONTROLLER-1 osafamfd[]: ER exec: create FAILED 12
Sep  8 12:06:45 CONTROLLER-1 osafamfd[]: ER exec: create FAILED 12
Sep  8 12:06:55 CONTROLLER-1 osafamfd[]: ER exec: create FAILED 12


 Below are the steps.
 
 -> Delete all the application objects.
 -> Perform the middleware switchover / failover.
 -> The new active controller tries to access the application SI object that 
was already deleted earlier.
 
 
 Sep  8 12:08:36.647738 osafamfd [:main.cc:0810] << process_event
Sep  8 12:08:36.647743 osafamfd [:imm.cc:0396] >> execute
Sep  8 12:08:36.647748 osafamfd [:imm.cc:0142] >> exec: Create 
safCsi=CSI1,safSi=TestApp_SI4,safApp=TestApp_TwoN
Sep  8 12:08:36.647754 osafamfd [:imma_oi_api.c:2786] >> 
rt_object_create_common
Sep  8 12:08:36.647761 osafamfd [:imma_oi_api.c:2892] TR attr:safCSIComp
Sep  8 12:08:36.647768 osafamfd [:imma_oi_api.c:2892] TR 
attr:saAmfCSICompHAState
Sep  8 12:08:36.647795 osafamfd [:imma_oi_api.c:2892] TR 
attr:saAmfCSICompHAReadinessState
Sep  8 12:08:36.650289 osafamfd [:imma_oi_api.c:3063] << 
rt_object_create_common
Sep  8 12:08:36.650330 osafamfd [:imm.cc:0163] ER exec: create FAILED 12



---

** [tickets:#2009] AMF: App Si is moving to UNASSIGNED state after middleware 
failover**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Thu Sep 08, 2016 06:07 AM UTC by Srikanth R
**Last Updated:** Thu Sep 08, 2016 06:09 AM UTC
**Owner:** nobody


Environment details
--
OS : Suse 64bit 
Changeset : 7997  ( 5.1.FC)
Setup : 5 nodes ( 2 controllers and 3 payloads with headless feature enabled & 
no PBE )
AMF Application : 2N model with SUs mapped on PL-3,PL-4  ( si-si deps enabled)


Summary :
--
Application SIs are moving to UNASSIGNED state after middleware failover.


Steps followed & Observed behaviour
--
 -> Initially brought up the AMF application (2N model) on two payloads.
 -> All the SIs were in the fully assigned state and the SUs were in the 
IN-SERVICE state.
 -> Performed a middleware failover.
 -> After the standby became the active controller, the SIs moved to the 
UNASSIGNED state, although 'amf-state siass' shows the proper output.
 -> The application received CSI remove callbacks after locking the SUs.


Expected behaviour
--
-> As no fault happened in the application, the SIs should not move to the 
UNASSIGNED state on a middleware failover.




[tickets] [opensaf:tickets] #2009 AMF: App Si is moving to UNASSIGNED state after middleware failover

2016-09-08 Thread Srikanth R
amfd traces on both the controllers


Attachments:

- 
[2009.tgz](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/98b72c10/7108/attachment/2009.tgz)
 (849.1 kB; application/x-compressed-tar)






[tickets] [opensaf:tickets] #1999 osafntfd on active controller crashed while logging to alarm stream

2016-09-06 Thread Srikanth R
- **summary**: LOG : ntfd  on active controller crashed while logging to alarm 
stream --> osafntfd on active controller crashed while logging to alarm stream
- **Component**: log --> ntf
- **Comment**:

After the integration of LOG with CLM (#1638), all LOG clients should 
reinitialize after a CLM unlock operation. NTF, as a LOG client, is possibly 
not reinitializing after the CLM unlock and therefore got the return value 31.
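
The reinitialize-and-retry behaviour suggested in this comment could look roughly like the sketch below. It assumes that the value 31 corresponds to SA_AIS_ERR_UNAVAILABLE; `FakeLogClient` and `write_with_reinit` are hypothetical stand-ins for illustration and do not reflect the actual NTF or LOG agent code.

```python
SA_AIS_OK = 1
SA_AIS_ERR_UNAVAILABLE = 31  # assumed meaning of the "31" in the syslog line


class FakeLogClient:
    """Stand-in for a LOG handle that goes stale after a CLM lock/unlock."""

    def __init__(self):
        self.stale = False
        self.generation = 0  # counts how many times initialize() ran

    def initialize(self):
        self.stale = False
        self.generation += 1

    def write(self, record):
        return SA_AIS_ERR_UNAVAILABLE if self.stale else SA_AIS_OK


def write_with_reinit(client, record):
    """Write a record; on UNAVAILABLE, reinitialize once and retry."""
    rc = client.write(record)
    if rc == SA_AIS_ERR_UNAVAILABLE:
        client.initialize()
        rc = client.write(record)
    return rc


client = FakeLogClient()
client.initialize()
client.stale = True  # simulate a CLM lock followed by an unlock
rc = write_with_reinit(client, "alarm 682")
```

The point of the sketch is that a stale handle is treated as recoverable rather than as a reason to abort the daemon.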



---

** [tickets:#1999] osafntfd on active controller crashed while logging to alarm 
stream**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Tue Sep 06, 2016 05:15 AM UTC by Srikanth R
**Last Updated:** Tue Sep 06, 2016 08:09 AM UTC
**Owner:** nobody


Environment details
--
OS : Suse 64bit 
Changeset : 7997  ( 5.1.FC)
Setup : 5 nodes ( 2 controllers and 3 payloads with headless feature disabled & 
no PBE )
AMF Application : 2N model with SUs mapped on PL-3,PL-4

Summary :
--
NTFD crashed on active controller, while logging notification to alarm stream.


Steps followed & Observed behaviour
--
 -> Initially performed a couple of switchovers and tests on the AMF application.
 -> Performed a CLM lock operation on the standby SC-1 and later unlocked it.
 -> Performed a switchover such that SC-1 became the active controller.
 -> Stopped opensafd on PL-4. NTFD on the active controller crashed.
 
Sep  6 10:18:25 CONTROLLER-1 osafamfd[2262]: NO Node 'PL-4' left the cluster
..
Sep  6 10:18:25 CONTROLLER-1 osafntfd[2242]: osaf_abort(31) called from 
0x414d1e with errno=11
Sep  6 10:18:25 CONTROLLER-1 osafamfnd[2272]: NO 
'safComp=NTF,safSu=SC-1,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : 
Recovery is 'nodeFailfast'

-> Below is the excerpt from the ntfd trace.

Sep  6 10:18:25.436394 osafntfd [2242:NtfAdmin.cc:0252] T2 New notification 
received, id: 682
Sep  6 10:18:25.436398 osafntfd [2242:NtfAdmin.cc:0187] >> processNotification
Sep  6 10:18:25.436404 osafntfd [2242:NtfNotification.cc:0045] T3 constructor 
0x685790, notId: 682
Sep  6 10:18:25.436409 osafntfd [2242:ntfsv_mem.c:0761] >> ntfsv_get_ntf_header
Sep  6 10:18:25.436412 osafntfd [2242:ntfsv_mem.c:0782] << ntfsv_get_ntf_header
Sep  6 10:18:25.436425 osafntfd [2242:NtfAdmin.cc:0200] T2 notification 682 
with type 16384 added, notificationMap size is 1
Sep  6 10:18:25.436431 osafntfd [2242:NtfLogger.cc:0130] >> log
Sep  6 10:18:25.436435 osafntfd [2242:NtfLogger.cc:0132] T2 notification Id=682 
received in logger with size 0
Sep  6 10:18:25.436439 osafntfd [2242:NtfLogger.cc:0135] T2 IS LOCAL, logging
Sep  6 10:18:25.436442 osafntfd [2242:NtfLogger.cc:0166] >> checkQueueAndLog
Sep  6 10:18:25.436447 osafntfd [2242:NtfLogger.cc:0196] >> logNotification
Sep  6 10:18:25.436452 osafntfd [2242:ntfsv_mem.c:0761] >> ntfsv_get_ntf_header
Sep  6 10:18:25.436455 osafntfd [2242:ntfsv_mem.c:0782] << ntfsv_get_ntf_header
Sep  6 10:18:25.436460 osafntfd [2242:NtfLogger.cc:0231] T2 Logging 
notification to alarm stream
Sep  6 10:18:25.436495 osafntfd [2242:lga_api.c:1151] >> saLogWriteLogAsync
Sep  6 10:18:25.436500 osafntfd [2242:lga_api.c:1015] >> handle_log_record
Sep  6 10:18:25.436507 osafntfd [2242:lga_api.c:1110] << handle_log_record
Sep  6 10:18:25.436518 osafntfd [2242:lga_api.c:1229] TR **saLogWriteLogAsync 
Node not CLM member or stale client**
Sep  6 10:18:25.436524 osafntfd [2242:lga_api.c:1320] << saLogWriteLogAsync
Sep  6 10:18:42.472616 osafntfd [2176:ntfs_main.c:0181] >> initialize






[tickets] [opensaf:tickets] #2002 CLM : Agent crashed for invalid check in buffer notification parameter

2016-09-06 Thread Srikanth R



---

** [tickets:#2002] CLM : Agent crashed for invalid check in buffer notification 
parameter**

**Status:** unassigned
**Milestone:** 5.1.RC1
**Created:** Tue Sep 06, 2016 08:15 AM UTC by Srikanth R
**Last Updated:** Tue Sep 06, 2016 08:15 AM UTC
**Owner:** nobody


Environment details
--
OS : Suse 64bit 
Changeset : 7997  ( 5.1.FC)
Setup : 5 nodes ( 2 controllers and 3 payloads with headless feature disabled & 
no PBE )
AMF Application : 2N model with SUs mapped on PL-3,PL-4



Steps followed & Observed behaviour
--

-> Call the saClmClusterTrack_4 API with the CURRENT flag and the buffer 
parameter populated. Here the buffer parameter is populated by allocating 
sufficient memory for numberOfItems, but the notification holds garbage values.

The agent crashed with the following backtrace when the notification holds 
garbage values.

#3  0x7f4ccb370c9f in osaf_extended_name_length (name=0x9d5e4e) at 
osaf_extended_name.c:139
#4  0x7f4cca9ff27c in clma_validate_flags_buf_4 (hdl_rec=0x97cbc0, 
flags=1 '\001', buf=0x97c190) at clma_api.c:183
#5  0x7f4ccaa00fe5 in clmaclustertrack (clmHandle=4290772993, flags=1 
'\001', buf=0x0, buf_4=0x97c190) at clma_api.c:1032
#6  0x7f4ccaa00d40 in saClmClusterTrack_4 (clmHandle=4290772993, flags=1 
'\001', buf=0x97c190) at clma_api.c:958


Expected behaviour
--
If the buffer parameter is NULL, CLM shall invoke a callback. If the buffer 
parameter is not NULL, CLM should check only the value of numberOfItems and 
evaluate whether the user has allocated sufficient memory.

With the #1906 changes, the contents of notification are also verified, but 
only the structure member numberOfItems should be verified.
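
The expected check can be sketched as below. This is a simplified model under stated assumptions: `NotificationBuffer`, `check_track_buffer` and the string results are hypothetical, and the real agent works on the C structure passed to saClmClusterTrack_4. What the sketch shows is that the decision depends only on numberOfItems and never reads the notification contents.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NotificationBuffer:
    number_of_items: int
    notification: Optional[list]  # caller-allocated; contents may be garbage


def check_track_buffer(buf, cluster_size):
    """Classify a CURRENT request without touching buf.notification.

    Returns "callback" when no buffer is supplied, "no_space" when the buffer
    holds fewer items than there are cluster nodes, and "ok" otherwise.
    """
    if buf is None:
        return "callback"
    if buf.number_of_items < cluster_size:
        return "no_space"
    return "ok"


# Uninitialised contents must not influence the result.
garbage = NotificationBuffer(number_of_items=5, notification=[object()] * 5)
outcome = check_track_buffer(garbage, cluster_size=5)
```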







[tickets] [opensaf:tickets] #1995 AMF : amfd crashed while dumping AMF state

2016-09-02 Thread Srikanth R



---

** [tickets:#1995] AMF : amfd crashed while dumping AMF state**

**Status:** unassigned
**Milestone:** 5.1.RC1
**Created:** Fri Sep 02, 2016 08:42 AM UTC by Srikanth R
**Last Updated:** Fri Sep 02, 2016 08:42 AM UTC
**Owner:** nobody


Changeset : 7997 5.1 FC

AMFD crashed while dumping the amf state, with the following command.

 immadm -a @safAmfService2020f -o 99 @safAmfService2020f
 
 
 Sep  2 12:51:26 CONTROLLER-2 osafamfd[2691]: NO unknown type: 
@safAmfService2020f
Sep  2 12:51:26 CONTROLLER-2 osafamfd[2691]: imm.cc:648: 
object_name_to_class_type: Assertion 'false' failed.
Sep  2 12:51:26 CONTROLLER-2 osafamfnd[2701]: WA AMF director unexpectedly 
crashed
Sep  2 12:51:26 CONTROLLER-2 osafamfnd[2701]: Rebooting OpenSAF NodeId = 131599 
EE Name = , Reason: local AVD down(Adest) or both AVD down(Vdest) received, 
OwnNodeId = 131599, SupervisionTime = 60
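
The syslog shows that an unrecognised object name reaches an assertion in object_name_to_class_type. One possible alternative, sketched here purely as an assumption, is to reject the admin operation with an error. `KNOWN_PREFIXES`, `class_type_of`, `handle_admin_op` and the chosen error code are hypothetical and are not taken from the amfd source.

```python
# Hypothetical prefix table; the daemon's real mapping is not shown in the ticket.
KNOWN_PREFIXES = {
    "safSu=": "SU",
    "safSi=": "SI",
    "safSg=": "SG",
    "safApp=": "APP",
}


def class_type_of(object_name):
    """Return a class tag for a DN, or None when the prefix is unknown."""
    for prefix, tag in KNOWN_PREFIXES.items():
        if object_name.startswith(prefix):
            return tag
    return None


def handle_admin_op(object_name):
    """Reject unknown targets with an error instead of asserting."""
    if class_type_of(object_name) is None:
        return "SA_AIS_ERR_INVALID_PARAM"  # assumed choice of error code
    return "SA_AIS_OK"


reply = handle_admin_op("@safAmfService2020f")
```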








[tickets] [opensaf:tickets] #1991 AMF: Existing PG tracking should not be stopped for CURRENT flag

2016-08-31 Thread Srikanth R



---

** [tickets:#1991] AMF: Existing PG tracking should not be stopped  for CURRENT 
flag**

**Status:** unassigned
**Milestone:** 5.1.RC1
**Created:** Wed Aug 31, 2016 09:44 AM UTC by Srikanth R
**Last Updated:** Wed Aug 31, 2016 09:44 AM UTC
**Owner:** nobody


5.1.FC : changeset - 6997

Issue : Existing PG tracking should not be stopped by a call with the CURRENT flag


Steps performed :

-> Call saAmfInitialize_4()
-> Call saAmfProtectionGroupTrack_4() with SA_TRACK_CURRENT flag.
-> Call saAmfProtectionGroupTrack_4() with SA_TRACK_CHANGES flag.
-> Call saAmfProtectionGroupTrack_4() with SA_TRACK_CURRENT flag.
-> Call saAmfProtectionGroupTrackStop()


Observed output :

TrackStop returns ERR_NOT_EXIST, indicating that tracking was not started 
earlier. 


Expected output:

   TrackStop() should return SA_AIS_OK, as it did in the earlier release.
 
 According to the B04.01 spec, section 7.11.1, page 318, tracking should not be 
stopped until TrackStop() is called explicitly:

Once saAmfProtectionGroupTrack_4() has been called with trackFlags
containing either SA_TRACK_CHANGES or SA_TRACK_CHANGES_ONLY, notification
callbacks can only be stopped by an invocation of
saAmfProtectionGroupTrackStop().
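
The expected behaviour for the steps above can be sketched with a toy model. This is only an illustration of the quoted spec text, not the AMF agent code: the class `PgTrackModel` and its methods are hypothetical names, and only the numeric flag and return-code values are taken from the AIS C headers.

```python
# Toy model of the expected PG tracking semantics (not OpenSAF code).
SA_AIS_OK = 1
SA_AIS_ERR_NOT_EXIST = 12

SA_TRACK_CURRENT = 0x01
SA_TRACK_CHANGES = 0x02
SA_TRACK_CHANGES_ONLY = 0x04


class PgTrackModel:
    """Hypothetical stand-in for one protection group tracked on one handle."""

    def __init__(self):
        self.tracking = False

    def track(self, flags):
        # CHANGES / CHANGES_ONLY start tracking. CURRENT alone is a one-shot
        # query and must leave an already started tracking untouched.
        if flags & (SA_TRACK_CHANGES | SA_TRACK_CHANGES_ONLY):
            self.tracking = True
        return SA_AIS_OK

    def track_stop(self):
        # Only an explicit stop ends tracking.
        if not self.tracking:
            return SA_AIS_ERR_NOT_EXIST
        self.tracking = False
        return SA_AIS_OK


def run_reported_sequence():
    """Replay the CURRENT, CHANGES, CURRENT, TrackStop sequence from the ticket."""
    pg = PgTrackModel()
    pg.track(SA_TRACK_CURRENT)
    pg.track(SA_TRACK_CHANGES)
    pg.track(SA_TRACK_CURRENT)
    return pg.track_stop()
```

In this model the reported sequence ends with SA_AIS_OK, while a TrackStop without any earlier CHANGES or CHANGES_ONLY call gives ERR_NOT_EXIST.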



---



[tickets] [opensaf:tickets] #1990 AMF : Extra notification is received for lock operation on unlocked SG.

2016-08-31 Thread Srikanth R



---

** [tickets:#1990] AMF :  Extra notification is received for lock operation on 
unlocked SG.**

**Status:** unassigned
**Milestone:** 5.1.RC1
**Created:** Wed Aug 31, 2016 06:40 AM UTC by Srikanth R
**Last Updated:** Wed Aug 31, 2016 06:40 AM UTC
**Owner:** nobody


Changeset : 5.1 FC (7997 changeset)

 Extra notification is received for lock operation on unlocked SG.
 
 amf-adm lock safSg=AmfDemo,safApp=AmfDemo
===  Aug 30 15:22:27 - State Change  ===
eventType = SA_NTF_OBJECT_STATE_CHANGE
notificationObject = "safSg=AmfDemo,safApp=AmfDemo"
notifyingObject = "safApp=safAmfService"
notificationClassId = SA_NTF_VENDOR_ID_SAF.SA_SVC_AMF.103 (0x67)
additionalText = "Admin state of safSg=AmfDemo,safApp=AmfDemo changed"
sourceIndicator = SA_NTF_MANAGEMENT_OPERATION
State ID = SA_AMF_ADMIN_STATE
Old State: SA_AMF_ADMIN_UNLOCKED
New State: SA_AMF_ADMIN_LOCKED

===  Aug 30 15:22:27 - State Change  ===
eventType = SA_NTF_OBJECT_STATE_CHANGE
notificationObject = "safSg=AmfDemo,safApp=AmfDemo"
notifyingObject = "safApp=safAmfService"
notificationClassId = SA_NTF_VENDOR_ID_SAF.SA_SVC_AMF.103 (0x67)
additionalText = "Admin state of safSg=AmfDemo,safApp=AmfDemo changed"
sourceIndicator = SA_NTF_MANAGEMENT_OPERATION
State ID = SA_AMF_ADMIN_STATE
Old State: SA_AMF_ADMIN_LOCKED
New State: SA_AMF_ADMIN_LOCKED



---



[tickets] [opensaf:tickets] #1926 pyosaf: utils/ntf fails to set additional text

2016-08-29 Thread Srikanth R
- **status**: review --> fixed
- **Milestone**: 5.0.1 --> 5.1.FC
- **Comment**:

changeset:   7968:7e5ae40512d1
tag: tip
user:Johan Mårtensson 
date:Mon Aug 29 17:02:21 2016 +0530
summary: pyosaf: Fix handling of additionalText field in notification 
headers [#1926]





---

** [tickets:#1926] pyosaf: utils/ntf fails to set additional text**

**Status:** fixed
**Milestone:** 5.1.FC
**Created:** Thu Jul 21, 2016 08:08 AM UTC by Johan Mårtensson
**Last Updated:** Thu Jul 21, 2016 08:37 AM UTC
**Owner:** Johan Mårtensson


Additional text is not set correctly in the notification header. 


---



[tickets] [opensaf:tickets] #1890 Doc : Headless feature documentation

2016-06-21 Thread Srikanth R



---

** [tickets:#1890] Doc : Headless feature documentation **

**Status:** unassigned
**Milestone:** 5.0.1
**Created:** Tue Jun 21, 2016 11:02 AM UTC by Srikanth R
**Last Updated:** Tue Jun 21, 2016 11:02 AM UTC
**Owner:** nobody


Version : Opensaf 5.0. GA


 1) Documentation about the headless feature should be updated in 
Opensaf_Overview_PR.odt / Opensaf_Extentsions. The documentation should list 
the services that provide functionality when the cluster goes headless.
  
 2) The README.HYDRA file in the ntfsv folder should be renamed to 
README.HEADLESS for uniformity in file naming across all the folders.
 
 3) The CLM folder doesn't have a README for the headless feature.
 
 4) The headless files across all folders should follow the same naming 
convention.

./osaf/services/saf/amf/README_HEADLESS
./osaf/services/saf/logsv/README-HEADLESS
./osaf/services/saf/cpsv/README.HEADLESS





---



[tickets] [opensaf:tickets] #1725 AMF: Recover transient SUSIs left over from headless

2016-06-20 Thread Srikanth R
For a fault during headless, AMF leaves the application in the same state and 
logs the following in syslog on the payload hosting the SU.

Sep  7 19:38:39 SCALE_SLOT-94 osafamfnd[5104]: CR SU-SI record addition failed, 
SU= safSu=TestApp_SU2,safSg=TestApp_SG1,safApp=TestApp_TwoN : 
SI=safSi=TestApp_SI1,safApp=TestApp_TwoN
Sep  7 19:38:39 SCALE_SLOT-94 osafamfnd[5104]: CR SU-SI record addition failed, 
SU= safSu=TestApp_SU2,safSg=TestApp_SG1,safApp=TestApp_TwoN : 
SI=safSi=TestApp_SI2,safApp=TestApp_TwoN
Sep  7 19:38:39 SCALE_SLOT-94 osafamfnd[5104]: CR SU-SI record addition failed, 
SU= safSu=TestApp_SU2,safSg=TestApp_SG1,safApp=TestApp_TwoN : 
SI=safSi=TestApp_SI3,safApp=TestApp_TwoN
Sep  7 19:38:39 SCALE_SLOT-94 osafamfnd[5104]: CR SU-SI record addition failed, 
SU= safSu=TestApp_SU2,safSg=TestApp_SG1,safApp=TestApp_TwoN : 
SI=safSi=TestApp_SI4,safApp=TestApp_TwoN


  In the above situation, the application with the active assignment faulted 
during headless and the node rebooted. Once the controller joined, the above 
syslog was printed and the application was left with ONLY the standby 
assignment.
  
   If this ticket targets the above scenario and others like #1869, where an 
AMF application is left with improper assignments, then this ticket should be 
marked as **defect**. 


---

** [tickets:#1725] AMF: Recover transient SUSIs left over from headless**

**Status:** accepted
**Milestone:** 5.1.FC
**Created:** Wed Apr 06, 2016 07:16 AM UTC by Minh Hon Chau
**Last Updated:** Thu May 05, 2016 12:22 PM UTC
**Owner:** Minh Hon Chau


This ticket is more of an enhancement that targets how AMFD detects and 
recovers the transient SUSIs left over from headless. There are three major 
situations:
(1) - The cluster goes headless, su/node failover can happen on any payload, 
then the cluster recovers.
(2) - An admin op is issued on any AMF entity, then the cluster goes headless. 
During headless, the intermediate HA assignments of the whole admin op sequence 
between AMFND and components could be:
(2.1) The assignment completes, the component returns OK to the csi callback, 
then the cluster recovers.
(2.2) The assignment is in progress when the cluster recovers. Afterward the 
assignment could complete, the csi callback could return FAILED_OPERATION, or 
an error could happen.

At the time the cluster recovers, amfd has collected all assignments from all 
amfnd(s). These assignments can be in assigned or assigning state while their 
HA states do not conform to the SG redundancy model. Any of (1), (2.1), (2.2) 
can happen in combination, which means that while an admin op (2) is being 
issued, the cluster can go headless and any kind of failover (1) can happen 
during headless.  



---



[tickets] [opensaf:tickets] #1886 CLM : Initialize API returns 31, once controllers join back from headless

2016-06-20 Thread Srikanth R



---

** [tickets:#1886] CLM : Initialize API returns 31, once controllers  join back 
from headless**

**Status:** unassigned
**Milestone:** 5.0.1
**Created:** Mon Jun 20, 2016 10:32 AM UTC by Srikanth R
**Last Updated:** Mon Jun 20, 2016 10:32 AM UTC
**Owner:** nobody
**Attachments:**

- 
[clmd_1888](https://sourceforge.net/p/opensaf/tickets/1886/attachment/clmd_1888)
 (490.9 kB; application/octet-stream)


setup:
Version - opensaf 5.0.GA
5-node cluster (3 controllers and payloads PL-4, PL-5)

Create a headless scenario and call saClmInitialize_4 on a healthy payload, 
PL-5.

saClmInitialize_4 should return TRY_AGAIN until the controllers are up.

In some cases, return code 31 is returned instead. This was observed in 2 out 
of 5 runs.
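
To show why the difference matters to an application, here is a minimal sketch of a typical retry loop. It assumes a hypothetical zero-argument wrapper `api_call` around the real API (for example saClmInitialize_4) that returns the numeric SaAisErrorT; the helper name `call_with_retry` and its parameters are illustrative only. In the AIS headers, 6 is SA_AIS_ERR_TRY_AGAIN and 31 is SA_AIS_ERR_UNAVAILABLE.

```python
import time

SA_AIS_OK = 1
SA_AIS_ERR_TRY_AGAIN = 6
SA_AIS_ERR_UNAVAILABLE = 31


def call_with_retry(api_call, max_attempts=10, delay_s=0.5, sleep=time.sleep):
    """Call api_call() until it returns something other than TRY_AGAIN.

    api_call is a hypothetical wrapper returning the numeric SaAisErrorT.
    The last return code is handed back to the caller unchanged.
    """
    rc = SA_AIS_ERR_TRY_AGAIN
    for _ in range(max_attempts):
        rc = api_call()
        if rc != SA_AIS_ERR_TRY_AGAIN:
            break
        sleep(delay_s)
    return rc
```

An application written this way keeps waiting while TRY_AGAIN is returned, but gives up as soon as 31 is returned, so it never reaches SA_AIS_OK in the failing runs.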


MBCSV:MBCA:OFF
Aug  2 20:56:58.379408 clma [13326:clma_mds.c:1124] >> clma_mds_init
Aug  2 20:56:58.379582 clma [13326:clma_mds.c:1170] << clma_mds_init
Aug  2 20:57:26.122341 clma [13326:clma_mds.c:0947] T2 CLMA Rcvd MDS subscribe 
evt from svc 34
Aug  2 20:57:26.122361 clma [13326:clma_mds.c:0978] T2 MSG from CLMS 
NCSMDS_NEW_ACTIVE/UP
Aug  2 20:57:26.123651 clma [13326:clma_util.c:0120] << clma_startup: rc: 1, 
clma_use_count: 1
Aug  2 20:57:26.123665 clma [13326:clma_mds.c:1227] >> clma_mds_msg_sync_send
Aug  2 20:57:26.123704 clma [13326:clma_mds.c:0317] >> clma_mds_enc
Aug  2 20:57:26.123717 clma [13326:clma_mds.c:0352] T2 msgtype: 0
Aug  2 20:57:26.123723 clma [13326:clma_mds.c:0366] T2 api_info.type: 0
Aug  2 20:57:26.123729 clma [13326:clma_mds.c:0045] >> clma_enc_initialize_msg
Aug  2 20:57:26.123735 clma [13326:clma_mds.c:0060] << clma_enc_initialize_msg
Aug  2 20:57:26.123742 clma [13326:clma_mds.c:0407] << clma_mds_enc
Aug  2 20:57:26.152653 clma [13326:clma_mds.c:0697] >> clma_mds_dec
Aug  2 20:57:26.152674 clma [13326:clma_mds.c:0729] T2 CLMSV_CLMA_API_RESP_MSG 
rc = 31
Aug  2 20:57:26.152682 clma [13326:clma_mds.c:0809] << clma_mds_dec
Aug  2 20:57:26.152717 clma [13326:clma_mds.c:1253] << clma_mds_msg_sync_send
Aug  2 20:57:26.152728 clma [13326:clma_api.c:0636] TR CLMS return FAILED
Aug  2 20:57:26.152752 clma [13326:clma_util.c:0656] >> clma_msg_destroy
Aug  2 20:57:26.153200 clma [13326:clma_util.c:0680] << clma_msg_destroy
Aug  2 20:57:26.153219 clma [13326:clma_api.c:0663] T2 CLMA INIT FAILED
Aug  2 20:57:26.153226 clma [13326:clma_util.c:0133] >> clma_shutdown: 
clma_use_count: 1
Aug  2 20:57:26.153232 clma [13326:clma_mds.c:1190] >> clma_mds_finalize
Aug  2 20:57:26.153412 clma [13326:clma_mds.c:1203] << clma_mds_finalize
Aug  2 20:57:26.153580 clma [13326:sysf_def.c:0153] TR DESTROYING LEAP 
ENVIRONMENT
Aug  2 20:57:26.153663 clma [13326:sysf_def.c:0170] TR DONE DESTROYING LEAP 
ENVIRONMENT
Aug  2 20:57:26.153679 clma [13326:clma_util.c:0146] << clma_shutdown: rc: 1, 
clma_use_count: 0
Aug  2 20:57:26.153686 clma [13326:clma_api.c:0668] << clmainitialize
Aug  2 20:57:26.153692 clma [13326:clma_api.c:0580] << saClmInitialize_4


CLM director trace on new active controller is attached.


---



[tickets] [opensaf:tickets] #1885 CLM : LIbrary gives false success for couple of APIs , once controller joins back from headless

2016-06-20 Thread Srikanth R
CLM agent trace


Attachments:

- 
[clma_agent.txt](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/b0de1c47/b115/attachment/clma_agent.txt)
 (10.5 kB; text/plain)


---

** [tickets:#1885] CLM : LIbrary gives false success for couple of APIs , once 
controller joins back from headless**

**Status:** unassigned
**Milestone:** 5.0.1
**Created:** Mon Jun 20, 2016 09:17 AM UTC by Srikanth R
**Last Updated:** Mon Jun 20, 2016 09:17 AM UTC
**Owner:** nobody


Setup : 
5 nodes setup with 3 controllers.
Version : opensaf 5.0 GA


Steps performed :

-> Invoke saClmInitialize_4
-> Create a thread that calls saClmDispatch with DISPATCH_BLOCKING as argument.
-> Invoke saClmClusterNodeGet_4 
-> Create headless state.
-> Invoke saClmClusterTrack_4 with TRACK_CURRENT and TRACK_START_STEP
-> Invoke saClmClusterNodeGet_4 

Observed behavior :


 The first three APIs returned SA_AIS_OK. 
 
 Once the headless scenario was induced, saClmClusterTrack_4 returned 
TRY_AGAIN until one of the controllers joined as the active controller. At that 
point the API returned SA_AIS_OK, but no callback with CURRENT node info was 
delivered. 

 The thread in which Dispatch was called returned with SA_AIS_OK. 
 
Even though the handle is internally marked as BAD_HANDLE, the subsequent 
calls to saClmClusterTrack and saClmClusterNodeGet_4 returned successfully.


Aug  2 19:36:22.990719 clma [10058:clma_api.c:1035] TR RC before give handle 
flagsTrack 6
Aug  2 19:36:22.990730 clma [10058:clma_api.c:1038] << clmaclustertrack
Aug  2 19:36:22.990740 clma [10058:clma_api.c:0938] << saClmClusterTrack_4
Aug  2 19:36:26.998572 clma [10058:clma_api.c:0934] >> saClmClusterTrack_4
Aug  2 19:36:26.998625 clma [10058:clma_api.c:0968] >> clmaclustertrack
Aug  2 19:36:26.998636 clma [10058:clma_api.c:0986] TR CLMS down
Aug  2 19:36:26.998642 clma [10058:clma_api.c:1035] TR RC before give handle 
flagsTrack 6
Aug  2 19:36:26.998648 clma [10058:clma_api.c:1038] << clmaclustertrack
Aug  2 19:36:26.998657 clma [10058:clma_api.c:0938] << saClmClusterTrack_4
Aug  2 19:36:30.837965 clma [10058:clma_mds.c:0947] T2 CLMA Rcvd MDS subscribe 
evt from svc 34
Aug  2 19:36:30.837983 clma [10058:clma_mds.c:0978] T2 MSG from CLMS 
NCSMDS_NEW_ACTIVE/UP
Aug  2 19:36:30.837989 clma [10058:clma_mds.c:0989] TR ** Marking handle as 
BAD**
Aug  2 19:36:30.839058 clma [10058:sysf_ipc.c:0363] TR IN LEAP_DBG_SINK
Aug  2 19:36:30.839070 clma [10058:clma_util.c:0625] << clma_hdl_cbk_dispatch
Aug  2 19:36:30.839076 clma [10058:clma_api.c:0793] << saClmDispatch
Aug  2 19:36:31.259065 clma [10058:clma_api.c:0934] >> saClmClusterTrack_4
Aug  2 19:36:31.259088 clma [10058:clma_api.c:0968] >> clmaclustertrack
Aug  2 19:36:31.259097 clma [10058:clma_util.c:0036] >> clma_validate_version
Aug  2 19:36:31.259103 clma [10058:clma_util.c:0042] << clma_validate_version
Aug  2 19:36:31.259108 clma [10058:clma_api.c:1009] TR B.4.1 version
Aug  2 19:36:31.259113 clma [10058:clma_api.c:0140] >> 
clma_validate_flags_buf_4: flags=0x15
Aug  2 19:36:31.259118 clma [10058:clma_api.c:0176] << clma_validate_flags_buf_4
Aug  2 19:36:31.259124 clma [10058:clma_api.c:1020] TR RC after validate 
flagsTrack 1
Aug  2 19:36:31.259129 clma [10058:clma_util.c:0036] >> clma_validate_version
Aug  2 19:36:31.259140 clma [10058:clma_util.c:0042] << clma_validate_version
Aug  2 19:36:31.259145 clma [10058:clma_mds.c:1274] >> clma_mds_msg_async_send
Aug  2 19:36:31.259158 clma [10058:clma_mds.c:0317] >> clma_mds_enc
Aug  2 19:36:31.259166 clma [10058:clma_mds.c:0352] T2 msgtype: 0
Aug  2 19:36:31.259171 clma [10058:clma_mds.c:0366] T2 api_info.type: 2
Aug  2 19:36:31.259177 clma [10058:clma_mds.c:0118] >> clma_enc_track_start_msg
Aug  2 19:36:31.259182 clma [10058:clma_mds.c:0134] << clma_enc_track_start_msg
Aug  2 19:36:31.259187 clma [10058:clma_mds.c:0407] << clma_mds_enc
Aug  2 19:36:31.259260 clma [10058:clma_mds.c:1296] << clma_mds_msg_async_send
Aug  2 19:36:31.259272 clma [10058:clma_api.c:0455] << clma_send_md

 If the Dispatch API is called once again after the controller joins, 
BAD_HANDLE is returned.
 
 
 Expected behavior :
 
  If the handle is marked as BAD internally, saClmClusterTrack_4 and 
saClmClusterNodeGet_4 should also return BAD_HANDLE once the controller joins 
back. Currently only Dispatch returns BAD_HANDLE.


---


[tickets] [opensaf:tickets] #1869 AMF: SG in unstable for SI lock operation, after HEADLESS

2016-06-09 Thread Srikanth R



---

** [tickets:#1869] AMF: SG in unstable for SI lock operation, after HEADLESS **

**Status:** unassigned
**Milestone:** 5.0.1
**Created:** Thu Jun 09, 2016 07:24 AM UTC by Srikanth R
**Last Updated:** Thu Jun 09, 2016 07:24 AM UTC
**Owner:** nobody


Opensaf version : 5.0. GA.
Setup : 5 nodes with 3 controllers. PL-5 and PL-4 hosted the active and standby 
assignments, respectively, of the 2N application SUs, whereas SC-3 hosted the 
spare SU.

Steps performed:

-> Brought down all the controllers to create a headless scenario.
-> Stopped opensafd on PL-5, where SU2 was hosting the active assignment. 
-> SU1 on PL-4 did not get the active assignment. It remained in standby.
-> After the controllers rejoined the cluster, the following error message was 
printed on PL-4. 

Aug 26 17:21:34 SCALE_SLOT-94 osafamfnd[19347]: CR SU-SI record addition 
failed, SU= safSu=SU1,safSg=AmfDemo,safApp=AmfDemo : 
SI=safSi=AmfDemo,safApp=AmfDemo


SCALE_SLOT-94:~ # immlist safSi=AmfDemo,safApp=AmfDemo
Name                               Type         Value(s)
saAmfSIPrefStandbyAssignments      SA_UINT32_T  1 (0x1)
saAmfSIPrefActiveAssignments       SA_UINT32_T  1 (0x1)
saAmfSINumCurrStandbyAssignments   SA_UINT32_T  2 (0x2)
saAmfSINumCurrActiveAssignments    SA_UINT32_T  0 (0x0)
saAmfSIAssignmentState             SA_UINT32_T  3 (0x3)
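
The inconsistency in the output above (no active assignment, two standby assignments against a preference of one) could be checked from a script. This is a rough sketch that only parses the SA_UINT32_T rows as they appear in this mail, including rows where the archive has fused name and type; `parse_immlist` and `si_assignment_problems` are hypothetical helper names, not part of OpenSAF.

```python
import re


def parse_immlist(text):
    """Parse SA_UINT32_T rows of immlist output into {attribute: int}."""
    attrs = {}
    for line in text.splitlines():
        # <name> [spaces] SA_UINT32_T <decimal> (<hex>); other rows are skipped.
        m = re.match(r"\s*(\w+?)\s*SA_UINT32_T\s+(\d+)", line)
        if m:
            attrs[m.group(1)] = int(m.group(2))
    return attrs


def si_assignment_problems(attrs):
    """Compare current SI assignment counters against the preferred ones."""
    problems = []
    if (attrs.get("saAmfSINumCurrActiveAssignments", 0)
            < attrs.get("saAmfSIPrefActiveAssignments", 0)):
        problems.append("fewer active assignments than preferred")
    if (attrs.get("saAmfSINumCurrStandbyAssignments", 0)
            > attrs.get("saAmfSIPrefStandbyAssignments", 0)):
        problems.append("more standby assignments than preferred")
    return problems
```

For the values listed above, both checks would fire.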


-> A lock operation on the SI then failed because the SG was reported as not 
stable. 


46 04:30:46 07/23/2016 NO safApp=safAmfService "Cluster startup 
timeout, assigning SIs to SUs"
47 04:30:46 07/23/2016 NO safApp=safAmfService 
"safSi=AmfDemo,safApp=AmfDemo assigned to 
safSu=SU1,safSg=AmfDemo,safApp=AmfDemo HA State 'STANDBY'"
48 04:30:46 07/23/2016 NO safApp=safAmfService "Autorepair not done for 
'safSu=SC-3,safSg=2N,safApp=OpenSAF'"
49 04:30:46 07/23/2016 NO safApp=safAmfService "Autorepair not done for 
'safSu=SU3_Spare,safSg=AmfDemo,safApp=AmfDemo'"
50 07:49:21 07/23/2016 NO safApp=safAmfService "Admin op "LOCK" 
initiated for 'safSi=AmfDemo,safApp=AmfDemo', invocation: 219043332097"
51 07:49:21 07/23/2016 NO safApp=safAmfService "Admin op invocation: 
219043332097, err: 'SI lock of 'safSi=AmfDemo,safApp=AmfDemo' failed, SG not 
stable'"
52 07:49:21 07/23/2016 NO safApp=safAmfService "Admin op done for 
invocation: 219043332097, result 6"
53 07:49:22 07/23/2016 NO safApp=safAmfService "Admin op "LOCK" 
initiated for 'safSi=AmfDemo,safApp=AmfDemo', invocation: 2190


---



[tickets] [opensaf:tickets] #1867 HEADLESS : Payloads went for reboot, in headless state as CPSV got TIMEOUT rc for CLM API

2016-06-08 Thread Srikanth R



---

** [tickets:#1867] HEADLESS : Payloads went for reboot, in headless state as 
CPSV got TIMEOUT rc for CLM API**

**Status:** unassigned
**Milestone:** 5.0.1
**Created:** Wed Jun 08, 2016 10:54 AM UTC by Srikanth R
**Last Updated:** Wed Jun 08, 2016 10:54 AM UTC
**Owner:** nobody


Version : Opensaf 5.0. GA
Setup : Two payloads with three controllers.

 Steps performed :
 
 -> Initially all the nodes are part of the cluster.
 -> Induced failover by bringing down the active, standby and spare 
controllers, in that order.
 Aug  7 20:30:08 SCALE_SLOT-94 kernel: [5993776.936794] TIPC: Lost contact with 
<1.1.1>
Aug  7 20:30:08 SCALE_SLOT-94 osafimmnd[2748]: NO Sleep done registering IMMND 
with MDS
Aug  7 20:30:08 SCALE_SLOT-94 osafimmnd[2748]: NO MDS: mds_register_callback: 
dest 2040fa5bb6016 already exist
Aug  7 20:30:08 SCALE_SLOT-94 osafimmnd[2748]: NO SUCCESS IN REGISTERING IMMND 
WITH MDS
Aug  7 20:30:08 SCALE_SLOT-94 osafimmnd[2748]: NO Re-introduce-me 
highestProcessed:6859 highestReceived:6859
Aug  7 20:30:13 SCALE_SLOT-94 osafimmnd[2748]: WA MDS Send Failed to 
service:IMMD rc:2
Aug  7 20:30:14 SCALE_SLOT-94 osafamfnd[2767]: WA AMF director unexpectedly 
crashed

 -> On both payloads, CKPTND restarted with the following error in syslog.
 
 Aug  7 20:30:17 SCALE_SLOT-94 osafckptnd[2787]: ER cpnd clm node get failed 
with return value:5
Aug  7 20:30:17 SCALE_SLOT-94 osafamfnd[2767]: NO 
'safComp=CPND,safSu=PL-4,safSg=NoRed,safApp=OpenSAF' faulted due to 'avaDown' : 
Recovery is 'componentRestart'
Aug  7 20:30:17 SCALE_SLOT-94 osafckptnd[14434]: Started

-> But CKPTND instantiation failed and the node finally rebooted.

Aug  7 20:30:27 SCALE_SLOT-94 osafimmnd[2748]: NO Re-introduce-me 
highestProcessed:6859 highestReceived:6859
Aug  7 20:30:27 SCALE_SLOT-94 osafimmnd[2748]: WA MDS Send Failed to 
service:IMMD rc:2
Aug  7 20:30:27 SCALE_SLOT-94 osafamfnd[2767]: NO Instantiation of 
'safComp=CPND,safSu=PL-4,safSg=NoRed,safApp=OpenSAF' failed
Aug  7 20:30:27 SCALE_SLOT-94 osafamfnd[2767]: NO Reason: component 
registration timer expired
Aug  7 20:30:27 SCALE_SLOT-94 osafckptnd[14451]: Started
...

Aug  7 20:30:38 SCALE_SLOT-94 osafamfnd[2767]: NO Instantiation of 
'safComp=CPND,safSu=PL-4,safSg=NoRed,safApp=OpenSAF' failed
Aug  7 20:30:38 SCALE_SLOT-94 osafamfnd[2767]: NO Reason: component 
registration timer expired
Aug  7 20:30:38 SCALE_SLOT-94 osafimmnd[2748]: NO Re-introduce-me 
highestProcessed:6859 highestReceived:6859
Aug  7 20:30:38 SCALE_SLOT-94 osafimmnd[2748]: WA MDS Send Failed to 
service:IMMD rc:2
Aug  7 20:30:38 SCALE_SLOT-94 osafamfnd[2767]: WA 
'safComp=CPND,safSu=PL-4,safSg=NoRed,safApp=OpenSAF' Presence State RESTARTING 
=> INSTANTIATION_FAILED
Aug  7 20:30:38 SCALE_SLOT-94 osafamfnd[2767]: NO avnd_di_oper_send() deferred 
as AMF director is offline
Aug  7 20:30:38 SCALE_SLOT-94 osafamfnd[2767]: WA Director is down. Remove all 
SIs from 'safSu=PL-4,safSg=NoRed,safApp=OpenSAF'
Aug  7 20:30:38 SCALE_SLOT-94 osafamfnd[2767]: NO Component Failover trigerred 
for 'safSu=PL-4,safSg=NoRed,safApp=OpenSAF': Failed component: 
'safComp=CPND,safSu=PL-4,safSg=NoRed,safApp=OpenSAF'
Aug  7 20:30:38 SCALE_SLOT-94 osafamfnd[2767]: ER 
'safComp=CPND,safSu=PL-4,safSg=NoRed,safApp=OpenSAF'got Inst failed
Aug  7 20:30:38 SCALE_SLOT-94 osafamfnd[2767]: Rebooting OpenSAF NodeId = 
132111 EE Name = , Reason: NCS component Instantiation failed, OwnNodeId = 
132111, SupervisionTime = 60



---



[tickets] [opensaf:tickets] #1849 CKPT : Performance degradation upto 200%

2016-05-25 Thread Srikanth R



---

** [tickets:#1849] CKPT : Performance degradation upto 200%**

**Status:** unassigned
**Milestone:** 5.0.1
**Created:** Wed May 25, 2016 07:29 AM UTC by Srikanth R
**Last Updated:** Wed May 25, 2016 07:29 AM UTC
**Owner:** nobody


Changeset : 7640
Setup : SUSE 11 on Physical machines.


CKPT performance has degraded considerably in 5.0 compared to 4.7. A timestamp 
is taken just before and just after each API call, and the difference is 
reported as the time taken by that API.
  
  -> For write operations, the checkpoint write API takes more than 3 times 
as long as in 4.7. The issue is observed in both synchronous and 
asynchronous mode.
 ( synchronous -- checkpoint create flag used : SA_CKPT_WR_ALL_REPLICAS
 asynchronous -- checkpoint create flags used :  SA_CKPT_WR_ACTIVE_REPLICA | 
SA_CKPT_CHECKPOINT_COLLOCATED )
 
  -> For section create operations in synchronous mode, the checkpoint section 
create API takes more than 33% longer than in 4.7.
  
  ->  For read operations in synchronous mode, the checkpoint read API takes 
more than 15% longer than in 4.7.

  Please check which of the tickets pushed between 4.7 and 5.0 affected API 
performance.
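
The measurement described above can be sketched roughly as follows. This is only an illustration: `timed_call` and `degradation_percent` are hypothetical helper names, and the numbers are made up to show the arithmetic (a call taking 3 times as long corresponds to a degradation of 200%); the ticket does not list the measured values.

```python
import time

def timed_call(api, *args):
    """Return (result, elapsed_seconds), timing just before and after the call."""
    start = time.monotonic()
    result = api(*args)
    return result, time.monotonic() - start

def degradation_percent(old_time, new_time):
    """Percentage by which new_time exceeds old_time (same unit for both)."""
    return (new_time - old_time) / old_time * 100.0

# Illustrative values only: 1.0 ms in the old release, 3.0 ms in the new one.
print(degradation_percent(1.0, 3.0))
```

Any callable can be passed to `timed_call`; in a real test it would wrap the checkpoint write, section create or read call.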
  


---



[tickets] [opensaf:tickets] #1842 rde: standby amfd notifies to NID early.

2016-05-25 Thread Srikanth R
 If opensafd has started successfully on the standby, the standby node should 
be ready to take the active role.  
 
 A failover was performed after the standby had joined the cluster 
successfully. The standby node could not take the active role, and an entire 
*CLUSTER RESET* happened, as the cluster had no active controller.
 
 On the active controller ::
 
 May 25 11:18:03 CONTROLLER-1 osafimmnd[2281]: NO SERVER STATE: 
IMM_SERVER_SYNC_SERVER --> IMM_SERVER_READY
May 25 11:18:03 CONTROLLER-1 osafamfd[2342]: NO Received node_up from 2020f: 
msg_id 1
May 25 11:18:04 CONTROLLER-1 osafamfd[2342]: NO Node 'SC-2' joined the cluster
May 25 11:18:04 CONTROLLER-1 osafimmnd[2281]: NO Implementer connected: 19 
(MsgQueueService131599) <0, 2020f>
May 25 11:18:04 CONTROLLER-1 osafrded[2249]: NO Peer up on node 0x2020f
May 25 11:18:04 CONTROLLER-1 osafrded[2249]: NO Got peer info request from node 
0x2020f with role STANDBY
May 25 11:18:04 CONTROLLER-1 osafrded[2249]: NO Got peer info response from 
node 0x2020f with role STANDBY
May 25 11:18:04 CONTROLLER-1 osafimmnd[2281]: NO Implementer (applier) 
connected: 20 (@safAmfService2020f) <0, 2020f>
May 25 11:18:05 CONTROLLER-1 osafimmnd[2281]: NO Implementer (applier) 
connected: 21 (@OpenSafImmReplicatorB) <0, 2020f>

May 25 11:18:05 CONTROLLER-1 osafamfnd[2353]: NO 
'safComp=CPD,safSu=SC-1,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : 
Recovery is 'nodeFailfast'



On the standby controller ::

May 25 11:18:04 CONTROLLER-2 osafrded[4212]: NO Got peer info response from 
node 0x2010f with role ACTIVE
May 25 11:18:04 CONTROLLER-2 osafimmd[4231]: IN AMF HA STANDBY request
May 25 11:18:04 CONTROLLER-2 osafimmd[4231]: IN node with dest ADDED 
564114611150864
May 25 11:18:04 CONTROLLER-2 osafamfnd[4292]: NO Assigned 
'safSi=SC-2N,safApp=OpenSAF' STANDBY to 'safSu=SC-2,safSg=2N,safApp=OpenSAF'
May 25 11:18:04 CONTROLLER-2 osafimmd[4231]: IN node with dest ADDED 
565214191280144
May 25 11:18:04 CONTROLLER-2 opensafd: OpenSAF(5.0.0 - ) services successfully 
started
May 25 11:18:04 CONTROLLER-2 osafimmd[4231]: IN node with dest ADDED 
567412731609092
May 25 11:18:04 CONTROLLER-2 osafimmd[4231]: IN node with dest ADDED 
566316589850628


  done
CONTROLLER-2:~ # May 25 11:18:04 CONTROLLER-2 osafimmnd[4242]: NO Implementer 
(applier) connected: 20 (@safAmfService2020f) <139, 2020f>
May 25 11:18:04 CONTROLLER-2 osafimmnd[4242]: NO Implementer (applier) 
connected: 21 (@OpenSafImmReplicatorB) <147, 2020f>
May 25 11:18:04 CONTROLLER-2 osafntfimcnd[4446]: NO Started
May 25 11:18:05 CONTROLLER-2 osafimmnd[4242]: NO Implementer disconnected 12 
<0, 2010f> (safCheckPointService)
May 25 11:18:10 CONTROLLER-2 osaffmd[4221]: NO Node Down event for node id 
2010f:
May 25 11:18:10 CONTROLLER-2 osaffmd[4221]: NO Current role: STANDBY
May 25 11:18:10 CONTROLLER-2 osaffmd[4221]: Rebooting OpenSAF NodeId = 131343 
EE Name = , Reason: Received Node Down for peer controller, OwnNodeId = 131599, 
SupervisionTime = 60
May 25 11:18:10 CONTROLLER-2 kernel: [ 2246.200249] TIPC: Resetting link 
<1.1.2:eth3-1.1.1:eth0>, peer not responding
May 25 11:18:10 CONTROLLER-2 kernel: [ 2246.200263] TIPC: Lost link 
<1.1.2:eth3-1.1.1:eth0> on network plane A
May 25 11:18:10 CONTROLLER-2 kernel: [ 2246.200272] TIPC: Lost contact with 
<1.1.1>
May 25 11:18:10 CONTROLLER-2 osafrded[4212]: NO Peer down on node 0x2010f
May 25 11:18:10 CONTROLLER-2 osafimmd[4231]: WA IMMD lost contact with peer 
IMMD (NCSMDS_RED_DOWN)
May 25 11:18:10 CONTROLLER-2 osafimmd[4231]: IN Resend of fevs message 52769, 
will not mbcp to peer IMMD
May 25 11:18:10 CONTROLLER-2 osafimmnd[4242]: WA DISCARD DUPLICATE FEVS 
message:52769
May 25 11:18:10 CONTROLLER-2 osafimmnd[4242]: WA Error code 2 returned for 
message type 82 - ignoring
May 25 11:18:10 CONTROLLER-2 osafimmd[4231]: IN Resend of fevs message 52770, 
will not mbcp to peer IMMD
May 25 11:18:10 CONTROLLER-2 osafimmnd[4242]: WA DISCARD DUPLICATE FEVS 
message:52770
May 25 11:18:10 CONTROLLER-2 osafimmnd[4242]: WA Error code 2 returned for 
message type 82 - ignoring
May 25 11:18:10 CONTROLLER-2 osafimmd[4231]: WA IMMND DOWN on active controller 
1 detected at standby immd!! 2. Possible failover
May 25 11:18:10 CONTROLLER-2 osafimmd[4231]: NO Skipping re-send of fevs 
message 52769 since it has recently been resent.
May 25 11:18:10 CONTROLLER-2 osafimmd[4231]: NO Skipping re-send of fevs 
message 52770 since it has recently been resent.
May 25 11:18:10 CONTROLLER-2 osafimmnd[4242]: NO Global discard node received 
for nodeId:2010f pid:2281
May 25 11:18:10 CONTROLLER-2 osafimmnd[4242]: NO Implementer disconnected 13 
<0, 2010f(down)> (OpenSafImmPBE)
May 25 11:18:10 CONTROLLER-2 osafimmnd[4242]: NO Implementer disconnected 10 
<0, 2010f(down)> (safSmfService)
May 25 11:18:10 CONTROLLER-2 

[tickets] [opensaf:tickets] #1845 IMM: Standby syncing is delayed untill payloads join the cluster

2016-05-23 Thread Srikanth R



---

** [tickets:#1845] IMM: Standby syncing is delayed untill payloads join the 
cluster**

**Status:** unassigned
**Milestone:** 5.0.1
**Created:** Mon May 23, 2016 10:46 AM UTC by Srikanth R
**Last Updated:** Mon May 23, 2016 10:46 AM UTC
**Owner:** nobody
**Attachments:**

- 
[1845.tgz](https://sourceforge.net/p/opensaf/tickets/1845/attachment/1845.tgz) 
(10.6 MB; application/x-compressed-tar)


Changeset : 7640
Setup : 4-node cluster with PBE of 50k objects, SUSE 11.2 VM

Issue : Syncing of the standby with the active controller is delayed until 
the payloads join the cluster.

Steps :

1. PBE is enabled on the setup and a DB of 50k objects was created earlier.
2. Started opensaf on all the nodes simultaneously.

May 23 15:32:49 CONTROLLER-1 osafrded[27818]: NO Requesting ACTIVE role
May 23 15:32:49 CONTROLLER-1 osafrded[27818]: NO RDE role set to Undefined
May 23 15:32:49 CONTROLLER-1 kernel: [10654.360773] TIPC: Established link 
<1.1.1:eth0-1.1.2:eth3> on network plane A
May 23 15:32:49 CONTROLLER-1 kernel: [10654.881929] TIPC: Established link 
<1.1.1:eth0-1.1.3:eth3> on network plane A
May 23 15:32:50 CONTROLLER-1 kernel: [10655.434543] TIPC: Established link 
<1.1.1:eth0-1.1.4:eth3> on network plane A

May 23 15:32:51 CONTROLLER-1 osafimmd[27837]: NO Attached Nodes:3 Accepted 
nodes:0 KnownVeteran:0 doReply:1

3. Opensafd on SC-1 started at May 23 15:33:41.
4. Opensafd on PL-3 and PL-4 started at 15:33:42.
5. On SC-2, imm syncing started at 15:33:47 after the active controller and 
payloads joined.
May 23 15:33:47 CONTROLLER-1 osafimmd[27837]: NO Successfully announced sync. 
New ruling epoch:9
May 23 15:33:47 CONTROLLER-1 osafimmloadd: NO Sync starting
6. At 15:33:57, SC-2 joined the cluster successfully.


Because of this, the time taken for the standby to join during simultaneous 
startup of all nodes has increased by the sync time. 

IMMSV_NUM_NODES is left at its default value (5).
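
The intervals in the steps above can be worked out with a short sketch like the one below. It only uses the timestamps quoted in this ticket; the helper name `seconds_between` and the event labels are illustrative, and the interval from sync start to SC-2 joining is taken as an approximation of the sync time.

```python
from datetime import datetime

def seconds_between(start, end, fmt="%H:%M:%S"):
    """Whole seconds between two same-day timestamps given as HH:MM:SS."""
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())

# Timestamps quoted in the steps above (all on May 23).
events = {
    "SC-1 opensafd started": "15:33:41",
    "PL-3/PL-4 opensafd started": "15:33:42",
    "IMM sync to SC-2 started": "15:33:47",
    "SC-2 joined": "15:33:57",
}

# Time from sync start until SC-2 joined the cluster.
sync_seconds = seconds_between(events["IMM sync to SC-2 started"],
                               events["SC-2 joined"])
# Time SC-2 waited after SC-1 was up before its sync even started.
wait_seconds = seconds_between(events["SC-1 opensafd started"],
                               events["IMM sync to SC-2 started"])
print(sync_seconds, wait_seconds)  # 10 6
```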




---



[tickets] [opensaf:tickets] #1836 CKPT: Section is not deleted during switchover, after expiration time

2016-05-18 Thread Srikanth R
- **status**: unassigned --> invalid
- **Comment**:

This issue is not observed with a sleep of 1.2 seconds.



---

** [tickets:#1836] CKPT: Section is not deleted during switchover, after 
expiration time**

**Status:** invalid
**Milestone:** 4.7.2
**Created:** Wed May 18, 2016 01:20 AM UTC by Srikanth R
**Last Updated:** Wed May 18, 2016 02:36 AM UTC
**Owner:** nobody
**Attachments:**

- 
[1836.tgz](https://sourceforge.net/p/opensaf/tickets/1836/attachment/1836.tgz) 
(224.7 kB; application/x-compressed-tar)


  Sometimes a section in a checkpoint is not deleted after its expiration 
time during a switchover.
  
  Below are the steps performed.
  
  1. Created a checkpoint with ALL_REPLICAS.
  2. Opened the checkpoint for writing.
  3. Created a section.
  4. Set the expiration time to 1 second.
  5. Invoked a middleware switchover.
  6. After 1.1 seconds, accessed the checkpoint section by deleting it. 
The expected return value is ERR_NOT_EXIST, but the section deletion succeeded 
with SA_AIS_OK.
  

 Without switchovers, this issue is observed once in 15 times on an idle 
setup.



---



[tickets] [opensaf:tickets] #1836 CKPT: Section is not deleted during switchover, after expiration time

2016-05-17 Thread Srikanth R



---

** [tickets:#1836] CKPT: Section is not deleted during switchover, after 
expiration time**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Wed May 18, 2016 01:20 AM UTC by Srikanth R
**Last Updated:** Wed May 18, 2016 01:20 AM UTC
**Owner:** nobody
**Attachments:**

- 
[1836.tgz](https://sourceforge.net/p/opensaf/tickets/1836/attachment/1836.tgz) 
(224.7 kB; application/x-compressed-tar)


  Sometimes a section in a checkpoint is not deleted after its expiration 
time during a switchover.
  
  Below are the steps performed.
  
  1. Created a checkpoint with ALL_REPLICAS.
  2. Opened the checkpoint for writing.
  3. Created a section.
  4. Set the expiration time to 1 second.
  5. Invoked a middleware switchover.
  6. After 1.1 seconds, accessed the checkpoint section by deleting it. 
The expected return value is ERR_NOT_EXIST, but the section deletion succeeded 
with SA_AIS_OK.
  

 Without switchovers, this issue is observed once in 15 times on an idle 
setup.



---



[tickets] [opensaf:tickets] #1801 lck: saLckResourceOpen with flag SA_LCK_RESOURCE_CREATE returning SA_AIS_ERR_TIMEOUT after 5 failovers.

2016-05-17 Thread Srikanth R
 Similarly, saLckResourceOpen returns SA_AIS_ERR_LIBRARY after switchovers / 
failovers. This issue is observed intermittently.


---

** [tickets:#1801] lck: saLckResourceOpen with flag SA_LCK_RESOURCE_CREATE 
returning SA_AIS_ERR_TIMEOUT after 5 failovers.**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Mon May 02, 2016 09:52 AM UTC by Madhurika Koppula
**Last Updated:** Wed May 04, 2016 06:53 PM UTC
**Owner:** nobody
**Attachments:**

- 
[glsv.tgz](https://sourceforge.net/p/opensaf/tickets/1801/attachment/glsv.tgz) 
(3.0 MB; application/octet-stream)


Setup:
Changeset- 7436
OS: Oracle Linux Server release 6.4 (x86_64)
4 nodes configured with single PBE

Some failover tests are being run.
The safLock=resource1_101 object is not getting deleted, so saLckResourceOpen 
with flag SA_LCK_RESOURCE_CREATE keeps returning SA_AIS_ERR_TIMEOUT.

The same API call is retried 15 times, with a sleep of 10 seconds in between.
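
The retry scheme described above could look roughly like this. It is a sketch, not the actual test code: `retry_on_timeout` is a hypothetical helper, return codes are modelled as plain strings, and it is assumed that "15 times" means 15 attempts in total.

```python
import time

def retry_on_timeout(call, attempts=15, delay_seconds=10.0, sleep=time.sleep):
    """Call `call()` until it returns something other than SA_AIS_ERR_TIMEOUT.

    Returns (last_return_code, attempts_used).
    """
    rc = None
    for attempt in range(1, attempts + 1):
        rc = call()
        if rc != "SA_AIS_ERR_TIMEOUT":
            return rc, attempt
        if attempt < attempts:
            sleep(delay_seconds)  # wait before the next attempt
    return rc, attempts

# Stand-in for saLckResourceOpen: times out twice, then succeeds.
outcomes = iter(["SA_AIS_ERR_TIMEOUT", "SA_AIS_ERR_TIMEOUT", "SA_AIS_OK"])
rc, used = retry_on_timeout(lambda: next(outcomes), sleep=lambda s: None)
print(rc, used)  # SA_AIS_OK 3
```

In the run shown below, every attempt returned SA_AIS_ERR_TIMEOUT, so a loop of this kind would give up after the 15th attempt.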

Snippet from the run:

100|7| SUCCESS : saLckInitialize with valid parameters
100|7| Return Value: SA_AIS_OK
100|7| LckHandle   : 6599312
100|7|
100|7|
100|7| SUCCESS : saLckInitialize with valid parameters
100|7| Return Value: SA_AIS_OK
100|7| LckHandle   : 6599392
100|7|
100|7|
100|7| Resource Name   : safLock=resource1_101
100|7| open flags  : SA_LCK_RESOURCE_CREATE
100|7| FAILED  : saLckResourceOpen with valid parameters
100|7| Return Value: SA_AIS_ERR_TIMEOUT

100|7| Resource Name   : safLock=resource1_101
100|7| open flags  : SA_LCK_RESOURCE_CREATE
100|7|
100|7| Resource Name   : safLock=resource1_101
100|7| open flags  : SA_LCK_RESOURCE_CREATE

100|7| Resource Name   : safLock=resource1_101
100|7| open flags  : SA_LCK_RESOURCE_CREATE
100|7|
100|7| Resource Name   : safLock=resource1_101
100|7| open flags  : SA_LCK_RESOURCE_CREATE
100|7|
100|7| Resource Name   : safLock=resource1_101
100|7| open flags  : SA_LCK_RESOURCE_CREATE

100|7|
100|7| Resource Name   : safLock=resource1_101
100|7| open flags  : SA_LCK_RESOURCE_CREATE Timeout count exceeded: 15

Timestamp of the Active controller at this instant:

May  2 14:22:56 OEL_M-SLOT-2 root: killing osafimmd from run_failover.sh
May  2 14:22:56 OEL_M-SLOT-2 osafamfnd[1755]: NO 
'safComp=IMMD,safSu=SC-2,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : 
Recovery is 'nodeFailfast'
May  2 14:22:56 OEL_M-SLOT-2 osafamfnd[1755]: ER 
safComp=IMMD,safSu=SC-2,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery 
is:nodeFailfast
May  2 14:22:56 OEL_M-SLOT-2 osafamfnd[1755]: Rebooting OpenSAF NodeId = 131599 
EE Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 
131599, SupervisionTime = 60
May  2 14:22:56 OEL_M-SLOT-2 opensaf_reboot: Rebooting local node; timeout=60

Timestamp of the Standby controller which is becoming active after failover:

May  2 14:23:00 OEL_M-SLOT-1 opensaf_reboot: Rebooting remote node in the 
absence of PLM is outside the scope of OpenSAF
May  2 14:23:00 OEL_M-SLOT-1 osaffmd[1677]: NO Controller Failover: Setting 
role to ACTIVE
May  2 14:23:00 OEL_M-SLOT-1 osafrded[1667]: NO RDE role set to ACTIVE
May  2 14:23:00 OEL_M-SLOT-1 osafrded[1667]: NO Running 
'/usr/lib64/opensaf/opensaf_sc_active' with 0 argument(s)
May  2 14:23:00 OEL_M-SLOT-1 osafimmd[1688]: NO ACTIVE request
May  2 14:23:00 OEL_M-SLOT-1 osaflogd[1711]: NO ACTIVE request
May  2 14:23:00 OEL_M-SLOT-1 osafntfd[1722]: NO ACTIVE request
May  2 14:23:00 OEL_M-SLOT-1 osafclmd[1733]: NO ACTIVE request
May  2 14:23:00 OEL_M-SLOT-1 osafamfd[1744]: NO FAILOVER StandBy --> Active

/var/log/messages and osaflckd traces of both controllers  are attached.





---



[tickets] [opensaf:tickets] #1826 AMF: In Nway, pref Active assignments violated for SI after SU lock op

2016-05-15 Thread Srikanth R



---

** [tickets:#1826] AMF: In Nway, pref Active assignments violated for  SI after 
SU lock op**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Mon May 16, 2016 02:21 AM UTC by Srikanth R
**Last Updated:** Mon May 16, 2016 02:21 AM UTC
**Owner:** nobody
**Attachments:**

- 
[1821.tgz](https://sourceforge.net/p/opensaf/tickets/1826/attachment/1821.tgz) 
(562.9 kB; application/x-compressed-tar)


Setup :
Changeset : 7613 
AMF application : NWay, 4 SUs and 4 SIs with maxActiveSisPerSu=2 and 
maxStandbySisPerSu=2

Issue : After a lock operation on an SU, saAmfSIPrefActiveAssignments is 
violated for an SI. The SI got two active assignments.


Steps performed :

1. Initially brought up the Nway application, for which all SIs are assigned.


           |  TestApp_SI1   |  TestApp_SI2   |  TestApp_SI3   |  TestApp_SI4   

TestApp_SU1|  STANDBY       |                |                |  ACTIVE 
TestApp_SU2|                |                |  ACTIVE        |  STANDBY 
TestApp_SU3|                |  ACTIVE        |  STANDBY       |
TestApp_SU4|  ACTIVE        |  STANDBY       |                |


2. Now performed a lock operation on an SU, after which SI3 and SI4 got two 
active assignments.


           |  TestApp_SI1   |  TestApp_SI2   |  TestApp_SI3   |  TestApp_SI4   

TestApp_SU1|  ACTIVE        |  STANDBY       |  STANDBY       |  ACTIVE 
TestApp_SU2|  STANDBY       |                |  ACTIVE        |  ACTIVE 
TestApp_SU3|                |  ACTIVE        |  ACTIVE        |  STANDBY 
TestApp_SU4|                |                |                |



As the preferred number of active assignments is set to 1, SI3 must not get 
more than one active assignment.

immlist safSi=TestApp_SI3,safApp=TestApp_Nway
saAmfSIPrefStandbyAssignments      SA_UINT32_T  1 (0x1)
saAmfSIPrefActiveAssignments       SA_UINT32_T  1 (0x1)
saAmfSINumCurrStandbyAssignments   SA_UINT32_T  1 (0x1)
saAmfSINumCurrActiveAssignments    SA_UINT32_T  2 (0x2)
saAmfSIAssignmentState             SA_UINT32_T  3 (0x3)
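
The violation can be checked mechanically from the assignment table. The sketch below transcribes the assignments shown after the SU lock (second table above) and counts ACTIVE assignments per SI. The function name `active_violations` is illustrative, and a preferred active count of 1 is assumed for every SI, although the ticket shows that value only for SI3.

```python
from collections import Counter

# Assignments after the SU lock, transcribed from the second table above.
assignments = {
    "TestApp_SU1": {"TestApp_SI1": "ACTIVE", "TestApp_SI2": "STANDBY",
                    "TestApp_SI3": "STANDBY", "TestApp_SI4": "ACTIVE"},
    "TestApp_SU2": {"TestApp_SI1": "STANDBY", "TestApp_SI3": "ACTIVE",
                    "TestApp_SI4": "ACTIVE"},
    "TestApp_SU3": {"TestApp_SI2": "ACTIVE", "TestApp_SI3": "ACTIVE",
                    "TestApp_SI4": "STANDBY"},
    "TestApp_SU4": {},  # no assignments left on this SU
}

def active_violations(assignments, pref_active=1):
    """Return {SI: active count} for SIs with more ACTIVE assignments than preferred."""
    active = Counter(
        si
        for per_su in assignments.values()
        for si, state in per_su.items()
        if state == "ACTIVE"
    )
    return {si: n for si, n in sorted(active.items()) if n > pref_active}

print(active_violations(assignments))
# {'TestApp_SI3': 2, 'TestApp_SI4': 2}
```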

 


---



[tickets] [opensaf:tickets] #1757 Standby controller failed to join the cluster

2016-05-05 Thread Srikanth R
- **summary**: Standby controller failed to join the cluster probably because 
of setup issues --> Standby controller failed to join the cluster
- **status**: unassigned --> duplicate
- **Milestone**: 4.7.2 --> never
- **Comment**:

Closing this ticket as a duplicate of #1701. In both tickets, the CLM client 
(i.e. osafamfnd) received SA_AIS_ERR_NO_MEMORY during initialization. 



---

** [tickets:#1757] Standby controller failed to join the cluster**

**Status:** duplicate
**Milestone:** never
**Created:** Wed Apr 13, 2016 11:12 AM UTC by Ritu Raj
**Last Updated:** Wed May 04, 2016 06:56 PM UTC
**Owner:** nobody


*Setup:
Changeset- 7436
Version - opensaf 5.0FC
OS: SUSE 11SP2 x86_64

*Issue observed :
Standby controller failed to join the cluster with error message "ER Failed to 
Initialize with CLM"

*Steps To Reproduce:
> OpenSAF is already up and running on controller1 (SC-1)
> When OpenSAF was started on controller2 (SC-2), it failed with the following message: 

SCALE_SLOT-2:~ # /etc/init.d/opensafd start
Apr 26 20:11:28 SCALE_SLOT-2 opensafd: Starting OpenSAF Services(5.0.FC - ) 
(Using TIPC)
Starting OpenSAF Services (Using TIPC):Apr 26 20:11:28 SCALE_SLOT-2 kernel: 
[1930938.251473] TIPC: Activated (version 2.0.0)
...
Apr 26 20:11:29 SCALE_SLOT-2 osafamfnd[29911]: Started
**Apr 26 20:11:29 SCALE_SLOT-2 osafamfnd[29911]: ER Failed to Initialize with 
CLM: 8
Apr 26 20:11:29 SCALE_SLOT-2 osafamfnd[29911]: ER avnd_create failed**
Apr 26 20:11:29 SCALE_SLOT-2 osafamfnd[29911]: NO exiting

> The corresponding syslog of the active controller (SC-1) at that time:
Apr 26 20:08:51 SCALE_SLOT-1 osafclmd[31692]: WA FAILED:** 
ncs_patricia_tree_add, client_id** 53
Apr 26 20:08:51 SCALE_SLOT-1 osafamfd[31702]: NO Node 'SC-2' left the cluster


>> It is also observed that on the active controller (SC-1) there is no 
osafclmd log record for the period in which controller2 (SC-2) failed, while 
other services have log records for that time stamp.

Below is the osafclmd trace (SC-1). Between "Apr 26 20:08:51.237701" and 
"Apr 26 20:12:06.272871", osafclmd did not log anything.
Apr 26 20:08:51.237695 osafclmd [31692:clms_evt.c:1601] << process_api_evt
**Apr 26 20:08:51.237701 osafclmd [31692:clms_evt.c:1667] << clms_process_mbx
Apr 26 20:12:06.272871 osafclmd [31692:ava_mds.c:0179] >> ava_mds_cbk**
Apr 26 20:12:06.272923 osafclmd [31692:ava_mds.c:0530] >> ava_mds_flat_dec


Note: 
1. This is random issue
2. The time gap between controller1(SC-1) and controller2(SC-2) is 3 min. 


---



[tickets] [opensaf:tickets] #1800 AMF : Proxied should be brought down initially during NG lock-in admin op

2016-04-30 Thread Srikanth R



---

** [tickets:#1800] AMF : Proxied should be brought down initially during NG 
lock-in admin op**

**Status:** unassigned
**Milestone:** 5.0.RC2
**Created:** Sat Apr 30, 2016 07:50 AM UTC by Srikanth R
**Last Updated:** Sat Apr 30, 2016 07:50 AM UTC
**Owner:** nobody


Changeset : 7436 
Setup : 2N redundancy model with proxy and proxied SU hosted on the same node.


During a lock-in operation on a node group, the proxied SU should be brought 
down first (i.e., the component termination callback should be sent for the 
proxied component) and the proxy SU should be brought down afterwards. 

 In the current implementation, however, the proxy SU is brought down first. 
The later attempt to bring down the proxied SU then fails, as there is no 
proxy.
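
The expected ordering can be illustrated with a toy sketch. This is not how amfd is implemented: `termination_order` and the `proxy_of` mapping are hypothetical, only one level of proxying is assumed, and the SU names are the ones that appear in the log below.

```python
def termination_order(sus, proxy_of):
    """Order SUs so that every proxied SU is terminated before any proxy SU.

    `proxy_of` maps a proxied SU name to the SU that proxies it.
    Assumes a single level of proxying.
    """
    proxied = [su for su in sus if su in proxy_of]
    others = [su for su in sus if su not in proxy_of]
    return proxied + others

sus = ["PROXY_SU1_2N", "PROXIED_SU1_2N"]
proxy_of = {"PROXIED_SU1_2N": "PROXY_SU1_2N"}
print(termination_order(sus, proxy_of))
# ['PROXIED_SU1_2N', 'PROXY_SU1_2N']
```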
 
 
436 05:30:00 01/01/1970 NO safApp=safAmfService "Admin op 
"LOCK_INSTANTIATION" initiated for 
'safAmfNodeGroup=SCs,safAmfCluster=myAmfCluster', invocation: 300647710721"
   437 05:30:00 01/01/1970 NO safApp=safAmfService 
"safAmfNodeGroup=SCs,safAmfCluster=myAmfCluster AdmState LOCKED => 
LOCKED_INSTANTIATION"
   438 05:30:00 01/01/1970 NO safApp=safAmfService 
"safAmfNode=SC-1,safAmfCluster=myAmfCluster AdmState LOCKED => 
LOCKED_INSTANTIATION"
   439 05:30:00 01/01/1970 NO safApp=safAmfService 
"safAmfNode=SC-2,safAmfCluster=myAmfCluster AdmState LOCKED => 
LOCKED_INSTANTIATION"
   440 05:30:00 01/01/1970 NO safApp=safAmfService 
"safComp=proxied,safSu=PROXIED_SU1_2N,safSg=PROXIED_SG_2N,safApp=PROXIED_2N 
ProxyStatus is now UNPROXIED"
   441 05:30:00 01/01/1970 NO safApp=safAmfService 
"safSu=PROXY_SU1_2N,safSg=PROXY_SG_2N,safApp=PROXY_2N PresenceState TERMINATING 
=> UNINSTANTIATED"
   442 05:30:00 01/01/1970 NO safApp=safAmfService 
"safSu=PROXIED_SU1_2N,safSg=PROXIED_SG_2N,safApp=PROXIED_2N PresenceState 
TERMINATING => UNINSTANTIATED"
   443 05:30:00 01/01/1970 NO safApp=safAmfService 
"safSu=PROXIED_SU1_2N,safSg=PROXIED_SG_2N,safApp=PROXIED_2N OperState ENABLED 
=> DISABLED"
   444 05:30:00 01/01/1970 NO safApp=safAmfService "Autorepair not done for 
'safSu=PROXIED_SU1_2N,safSg=PROXIED_SG_2N,safApp=PROXIED_2N'"
   445 05:30:00 01/01/1970 NO safApp=safAmfService "Admin op done for 
invocation: 300647710721, result 1"




---



[tickets] [opensaf:tickets] #1799 AMF : csiName and csiFlags are not properly populated, during assignment removal ( proxy)

2016-04-30 Thread Srikanth R
- **summary**: AMF : csiName and csiFlags are not properly populated, during 
assignment removal --> AMF : csiName and csiFlags are not properly populated, 
during assignment removal ( proxy)
- Description has changed:

Diff:



--- old
+++ new
@@ -3,6 +3,6 @@
 
 
 * Initially the proxy and proxied are in  fully assigned  state.
-* Now perform lock operation on proxy SU, for which quiesced callback and csi 
removal callback is populating the csiFlags as SA_AMF_CSI_TARGET_ALL  and 
csiName is populated as NULL. But the proxy component is having active 
assignments , both for proxy and proxied.  Similar is for lock operation is on 
proxied SU.
+* Now perform lock operation on proxy SU, for which quiesced callback and csi 
removal callback is populating the csiFlags as SA_AMF_CSI_TARGET_ALL  and 
csiName is populated as NULL. But the proxy component is having active 
assignments , which is to be removed according to the callback .  Similar is 
for lock operation is on proxied SU.
 
  So expectation is that for lock operation on either proxy / proxied SU 
,csiFlags should be populated as SA_AMF_CSI_TARGET_ONE  with the corresponding 
csi.






---

** [tickets:#1799] AMF : csiName and csiFlags are not properly populated, 
during assignment removal ( proxy)**

**Status:** unassigned
**Milestone:** 5.0.RC2
**Created:** Sat Apr 30, 2016 06:17 AM UTC by Srikanth R
**Last Updated:** Sat Apr 30, 2016 06:17 AM UTC
**Owner:** nobody


Changeset : 7436
Setup :2N redmodel with both proxy and proxied hosted on the same node.


* Initially the proxy and proxied are in the fully assigned state.
* Now perform a lock operation on the proxy SU. The quiesced callback and the 
CSI removal callback populate csiFlags as SA_AMF_CSI_TARGET_ALL and csiName 
as NULL. But the proxy component still has active assignments, which would 
have to be removed according to the callback. The same happens for a lock 
operation on the proxied SU.

 So the expectation is that, for a lock operation on either the proxy or the 
proxied SU, csiFlags is populated as SA_AMF_CSI_TARGET_ONE with the 
corresponding CSI name.
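
The difference between the two flag values can be shown with a toy model of what a component does on a CSI removal callback. This is a sketch of the ticket's description only: the function `csis_after_remove` and the CSI names `proxy_csi` and `proxied_csi` are hypothetical, and the flags are modelled as strings rather than the real AMF constants.

```python
SA_AMF_CSI_TARGET_ONE = "SA_AMF_CSI_TARGET_ONE"
SA_AMF_CSI_TARGET_ALL = "SA_AMF_CSI_TARGET_ALL"

def csis_after_remove(assigned, csi_flags, csi_name=None):
    """Return the CSIs a component still holds after a CSI removal callback."""
    if csi_flags == SA_AMF_CSI_TARGET_ALL:
        # csiName is NULL: every CSI held by the component is removed.
        return set()
    if csi_flags == SA_AMF_CSI_TARGET_ONE:
        # Only the named CSI is removed.
        return set(assigned) - {csi_name}
    raise ValueError("unexpected csiFlags: %r" % (csi_flags,))

held = {"proxy_csi", "proxied_csi"}  # hypothetical CSI names

# Observed: TARGET_ALL with csiName NULL drops everything.
print(csis_after_remove(held, SA_AMF_CSI_TARGET_ALL))
# Expected by the reporter: TARGET_ONE with the CSI of the locked SU.
print(csis_after_remove(held, SA_AMF_CSI_TARGET_ONE, "proxy_csi"))
```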



---



[tickets] [opensaf:tickets] #1799 AMF : csiName and csiFlags are not properly populated, during assignment removal

2016-04-30 Thread Srikanth R



---

** [tickets:#1799] AMF : csiName and csiFlags are not properly populated, 
during assignment removal**

**Status:** unassigned
**Milestone:** 5.0.RC2
**Created:** Sat Apr 30, 2016 06:17 AM UTC by Srikanth R
**Last Updated:** Sat Apr 30, 2016 06:17 AM UTC
**Owner:** nobody


Changeset : 7436
Setup :2N redmodel with both proxy and proxied hosted on the same node.


* Initially the proxy and proxied are in the fully assigned state.
* Now perform a lock operation on the proxy SU. The quiesced callback and the 
CSI removal callback populate csiFlags as SA_AMF_CSI_TARGET_ALL and csiName 
as NULL. But the proxy component has active assignments both for the proxy 
and for the proxied. The same happens for a lock operation on the proxied SU.

 So the expectation is that, for a lock operation on either the proxy or the 
proxied SU, csiFlags is populated as SA_AMF_CSI_TARGET_ONE with the 
corresponding CSI name.



---



[tickets] [opensaf:tickets] #1792 osaf: update opensaf status command to reflect spares

2016-04-29 Thread Srikanth R
It would be easier to comprehend if the rdegetrole command on spare controllers 
printed "SPARE" instead of QUIESCED.


---

** [tickets:#1792] osaf: update opensaf status command to reflect spares**

**Status:** unassigned
**Milestone:** 5.0.GA
**Created:** Thu Apr 28, 2016 08:03 PM UTC by Mathi Naickan
**Last Updated:** Thu Apr 28, 2016 08:03 PM UTC
**Owner:** nobody







[tickets] [opensaf:tickets] #1780 Imm: Imm service is down FOREVER on the nodes after IMMND restart , due to system issues

2016-04-29 Thread Srikanth R
- **summary**: Imm: Immfind failed with TRY_AGAIN after immnd is killed on 
payload PL-3 and on active controller --> Imm: Imm service is down FOREVER on 
the nodes after IMMND restart , due to system issues
- **Comment**:

Changing the ticket heading for better understanding



---

** [tickets:#1780] Imm: Imm service is down FOREVER on the nodes after IMMND 
restart , due to system issues**

**Status:** invalid
**Milestone:** 5.0.RC2
**Created:** Mon Apr 25, 2016 11:46 AM UTC by Madhurika Koppula
**Last Updated:** Tue Apr 26, 2016 06:54 AM UTC
**Owner:** Neelakanta Reddy
**Attachments:**

- [imm.tgz](https://sourceforge.net/p/opensaf/tickets/1780/attachment/imm.tgz) 
(13.2 kB; application/octet-stream)


Setup:
Changeset- 7436
OS: Oracle Linux Server release 6.4 (x86_64)
Version - opensaf 5.0
4 nodes configured with single PBE

immfind fails with SA_AIS_ERR_TRY_AGAIN after immnd is killed on PL-3 and on 
the active controller.
IMM admin operations keep failing on PL-3 and SC-1 (active) even though immnd 
restarted properly on both nodes.
(saImmOmInitialize itself fails.)

Steps To reproduce:

1) Kill immnd on the active controller and on PL-3.
2) Perform any IMM admin operation.

Here is the snippet:
[root@OEL_M-SLOT-3 log]# immfind
error - saImmOmInitialize FAILED: SA_AIS_ERR_TRY_AGAIN (6)
[root@OEL_M-SLOT-3 log]#
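
To tell a transient TRY_AGAIN apart from the "forever" case reported here, a client can retry for a bounded time and report how many attempts were made. The sketch below is a generic Python illustration, not the immfind code: `call_with_retry` and `op` are hypothetical names, and `op` stands in for whatever wraps `saImmOmInitialize` and returns its numeric result.

```python
import time

SA_AIS_OK = 1
SA_AIS_ERR_TRY_AGAIN = 6  # value shown in the immfind output above


def call_with_retry(op, timeout_s=10.0, interval_s=0.5,
                    sleep=time.sleep, clock=time.monotonic):
    """Call op() until it stops returning TRY_AGAIN or the deadline passes.

    Returns (rc, attempts). A final rc of SA_AIS_ERR_TRY_AGAIN means the
    service stayed unavailable for the whole retry window.
    """
    deadline = clock() + timeout_s
    attempts = 0
    while True:
        rc = op()
        attempts += 1
        if rc != SA_AIS_ERR_TRY_AGAIN or clock() >= deadline:
            return rc, attempts
        sleep(interval_s)
```

The `sleep` and `clock` parameters exist only so the loop can be exercised without real delays.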

First kill of IMMND on the ACTIVE controller, at the timestamp below:

Apr 25 11:48:52 OEL_M-SLOT-1 osafntfimcnd[9124]: NO saImmOiDispatch() Fail 
SA_AIS_ERR_BAD_HANDLE (9)
Apr 25 11:48:52 OEL_M-SLOT-1 osafimmd[1716]: WA IMMND coordinator at 2010f 
apparently crashed => electing new coord
Apr 25 11:48:52 OEL_M-SLOT-1 osafimmd[1716]: NO New coord elected, resides at 
2020f
Apr 25 11:48:53 OEL_M-SLOT-1 osafamfnd[1796]: NO Restarting a component of 
'safSu=SC-1,safSg=NoRed,safApp=OpenSAF' (comp restart count: 1)
Apr 25 11:48:53 OEL_M-SLOT-1 osafamfnd[1796]: NO 
'safComp=IMMND,safSu=SC-1,safSg=NoRed,safApp=OpenSAF' faulted due to 'avaDown' 
: Recovery is 'componentRestart'
Apr 25 11:48:53 OEL_M-SLOT-1 osafimmnd[10126]: Started
Apr 25 11:48:53 OEL_M-SLOT-1 osafimmnd[10126]: NO Persistent Back-End 
capability configured, Pbe file:imm.db (suffix may get added)
Apr 25 11:48:53 OEL_M-SLOT-1 osafimmnd[10126]: NO IMMD service is UP ... 
ScAbsenseAllowed?:0 introduced?:0
Apr 25 11:48:53 OEL_M-SLOT-1 osafimmd[1716]: NO New IMMND process is on ACTIVE 
Controller at 2010f

Second kill of IMMND on the ACTIVE controller, at the timestamp below:

Apr 25 14:44:52 OEL_M-SLOT-1 osafamfnd[1796]: NO 
'safComp=IMMND,safSu=SC-1,safSg=NoRed,safApp=OpenSAF' faulted due to 'avaDown' 
: Recovery is 'componentRestart'
Apr 25 14:44:53 OEL_M-SLOT-1 osafimmnd[15848]: Started
Apr 25 14:44:53 OEL_M-SLOT-1 osafimmnd[15848]: NO Persistent Back-End 
capability configured, Pbe file:imm.db (suffix may get added)
Apr 25 14:44:53 OEL_M-SLOT-1 osafimmnd[15848]: NO IMMD service is UP ... 
ScAbsenseAllowed?:0 introduced?:0
Apr 25 14:44:53 OEL_M-SLOT-1 osafimmd[1716]: NO New IMMND process is on ACTIVE 
Controller at 2010f
Apr 25 14:44:53 OEL_M-SLOT-1 osafimmd[1716]: NO Extended intro from node 2010f
Apr 25 14:44:53 OEL_M-SLOT-1 osafimmnd[15848]: NO SERVER STATE: 
IMM_SERVER_ANONYMOUS --> IMM_SERVER_CLUSTER_WAITING
Apr 25 14:44:53 OEL_M-SLOT-1 osafimmnd[15848]: NO SETTING COORD TO 0 CLOUD PROTO
Apr 25 14:44:53 OEL_M-SLOT-1 osafimmnd[15848]: NO SERVER STATE: 
IMM_SERVER_CLUSTER_WAITING --> IMM_SERVER_LOADING_PENDING
Apr 25 14:44:53 OEL_M-SLOT-1 osafimmnd[15848]: NO SERVER STATE: 
IMM_SERVER_LOADING_PENDING --> IMM_SERVER_SYNC_PENDING
Apr 25 14:44:53 OEL_M-SLOT-1 osafimmd[1716]: WA IMMND on controller (not 
currently coord) requests sync
Apr 25 14:44:53 OEL_M-SLOT-1 osafimmnd[15848]: NO NODE STATE-> IMM_NODE_ISOLATED
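
The state transitions in the excerpt above are easier to follow when collected per immnd process. A minimal Python sketch, assuming the wrapped syslog lines have first been joined back into single lines; `server_state_path` is a hypothetical helper name.

```python
import re

# Matches e.g. "osafimmnd[15848]: NO SERVER STATE: A --> B"
_STATE_RE = re.compile(r"osafimmnd\[(\d+)\]: NO SERVER STATE: (\w+) --> (\w+)")


def server_state_path(lines):
    """Return {pid: [state, state, ...]} from unwrapped syslog lines."""
    paths = {}
    for line in lines:
        m = _STATE_RE.search(line)
        if not m:
            continue
        pid, old, new = m.group(1), m.group(2), m.group(3)
        # Start the path with the first "old" state seen for this pid.
        path = paths.setdefault(pid, [old])
        path.append(new)
    return paths
```

For pid 15848, the lines quoted above would give a path that ends at IMM_SERVER_SYNC_PENDING.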


IMMND killed on PL-3 at the timestamp below:

Apr 25 12:11:26 OEL_M-SLOT-3 osafamfnd[2415]: NO 
'safSu=PL-3,safSg=NoRed,safApp=OpenSAF' component restart probation timer 
started (timeout: 600 ns)
Apr 25 12:11:26 OEL_M-SLOT-3 osafamfnd[2415]: NO Restarting a component of 
'safSu=PL-3,safSg=NoRed,safApp=OpenSAF' (comp restart count: 1)
Apr 25 12:11:26 OEL_M-SLOT-3 osafamfnd[2415]: NO 
'safComp=IMMND,safSu=PL-3,safSg=NoRed,safApp=OpenSAF' faulted due to 'avaDown' 
: Recovery is 'componentRestart'

Attaching log snippets of immnd from the active controller and from PL-3, and 
/var/log/messages.

This issue might be related to ticket #1735, because the node state of immnd on 
PL-3 is also observed as IMM_NODE_ISOLATED. However, immfind never succeeded on 
SC-1 (active) even though immnd restarted successfully on SC-1 at the timestamp 
below:

Apr 25 11:48:53 OEL_M-SLOT-1 osafimmnd[10126]: NO SERVER STATE: 
IMM_SERVER_LOADING_PENDING --> IMM_SERVER_SYNC_PENDING
Apr 25 11:48:53 OEL_M-SLOT-1 osafimmnd[10126]: NO NODE STATE-> IMM_NODE_ISOLATED
Apr 25 11:48:53 OEL_M-SLOT-1 osafimmd[1716]: NO Node 2010f request sync 
sync-pid:10126 epoch:0
Apr 25 11:48:54 OEL_M-SLOT-1 osafimmnd[10126]: NO NODE STATE-> 
IMM_NODE_W_AVAILABLE
Apr 25 11:48:54 OEL_M-SLOT-1 osafimmd[1716]: NO Successfully 

[tickets] [opensaf:tickets] #1546 AMF : Lock of node should be allowed similar to ng, if more than one SU is hosted

2016-04-29 Thread Srikanth R
- **Type**: enhancement --> defect
- **Comment**:

Changing the ticket type to defect, as the SU gets stuck in the terminating 
state after the following steps.

* Deploy two SUs on a single payload, for which active and standby assignments 
are done (2N).
* Perform a lock operation on the payload's CLM object. The lock operation fails 
with the ERR_REPAIR_PENDING return code.
* Now perform a lock operation on the application SG, followed by a lock-in 
operation, after which the SU gets stuck in the terminating state.



---

** [tickets:#1546] AMF : Lock of node should be allowed similar to ng, if more 
than one SU is hosted**

**Status:** unassigned
**Milestone:** future
**Created:** Thu Oct 15, 2015 03:38 AM UTC by Srikanth R
**Last Updated:** Wed Jan 13, 2016 06:29 AM UTC
**Owner:** nobody


Changeset : 6901


  Currently, for 2N, lock of a node group is allowed if more than one SU is 
hosted on a member node of the node group. But lock of a node is not allowed if 
more than one SU is hosted on it.
  
  
  # amf-adm lock safAmfNode=SC-2,safAmfCluster=myAmfCluster
error - saImmOmAdminOperationInvoke_2 admin-op RETURNED: 
SA_AIS_ERR_NOT_SUPPORTED (19)
error-string: Node lock/shutdown not allowed with two SUs on same node


#amf-adm lock safAmfNodeGroup=SCs,safAmfCluster=myAmfCluster
   safAmfNodeGroup=SCs,safAmfCluster=myAmfCluster
  UNLOCKED --> LOCKED
   safAmfNode=SC-1,safAmfCluster=myAmfCluster
  UNLOCKED --> LOCKED
   safAmfNode=SC-2,safAmfCluster=myAmfCluster
  UNLOCKED --> LOCKED
   safSi=TWONSI5,safApp=TWONAPP
  FULLYASSIGNED --> PARTIALLYASSIGNED
   safSi=TWONSI5,safApp=TWONAPP
  Alarm MAJOR
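
For comparing the node lock and node group lock results, the listing above can be reduced to (object, old state, new state) tuples. This is a rough Python sketch that assumes the layout shown here, where each DN line is followed by one state or alarm line; `parse_state_changes` is a hypothetical name.

```python
def parse_state_changes(text):
    """Pair each object DN with the state line that follows it."""
    rows = [ln.strip() for ln in text.splitlines() if ln.strip()]
    changes = []
    current_dn = None
    for row in rows:
        if "=" in row and "-->" not in row:
            # A DN such as safAmfNode=SC-1,safAmfCluster=myAmfCluster
            current_dn = row
        elif current_dn is not None:
            if "-->" in row:
                old, new = (part.strip() for part in row.split("-->", 1))
                changes.append((current_dn, old, new))
            else:
                # Non-transition line, e.g. "Alarm MAJOR"
                changes.append((current_dn, None, row))
            current_dn = None
    return changes
```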





[tickets] [opensaf:tickets] #1795 AMF : haState should be marked QUIESCING in PG callback for shutdown op

2016-04-29 Thread Srikanth R



---

** [tickets:#1795] AMF : haState should be marked QUIESCING in PG callback for 
shutdown op**

**Status:** unassigned
**Milestone:** 5.0.RC2
**Created:** Fri Apr 29, 2016 07:24 AM UTC by Srikanth R
**Last Updated:** Fri Apr 29, 2016 07:24 AM UTC
**Owner:** nobody


Changeset : 7434

For the shutdown operation on the SI, the haState in the protection group 
callback is filled with SA_AMF_HA_QUIESCED (3) instead of 
SA_AMF_HA_QUIESCING (4).


PROTECTION GROUP CALLBACK IS INVOKED
error :  1
numberOfMembers :  2
csiName :  safCsi=CSI1,safSi=TestApp_SI1,safApp=TestApp_TwoN
number of items in notification buffer is  2
{0: {'member': {'haState': 2, 'compName': 
safComp=COMP1,safSu=TestApp_SU1,safSg=TestApp_SG1,safApp=TestApp_TwoN, 'rank': 
1, 'haReadinessState': 1}, 'change': 1}, 1: {'member': {'haState': **3**, 
'compName': 
safComp=COMP1,safSu=TestApp_SU2,safSg=TestApp_SG1,safApp=TestApp_TwoN, 'rank': 
2, 'haReadinessState': 1}, 'change': 4}}
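
A test can flag this condition directly from the notification buffer printed above. The Python sketch below assumes the buffer is the dictionary shown in the output and that change value 4 means SA_AMF_PROTECTION_GROUP_STATE_CHANGE (an assumption to verify against `saAmf.h`); `members_not_quiescing` is a hypothetical helper.

```python
# HA state values as quoted in this ticket and in opensafd status output.
HA_STATE = {1: "ACTIVE", 2: "STANDBY", 3: "QUIESCED", 4: "QUIESCING"}
SA_AMF_HA_QUIESCING = 4
# Assumed value of SA_AMF_PROTECTION_GROUP_STATE_CHANGE; check saAmf.h.
PG_STATE_CHANGE = 4


def members_not_quiescing(notification):
    """Return (compName, state name) for members whose HA state changed
    to something other than QUIESCING."""
    bad = []
    for item in notification.values():
        member = item["member"]
        if (item["change"] == PG_STATE_CHANGE
                and member["haState"] != SA_AMF_HA_QUIESCING):
            name = HA_STATE.get(member["haState"], "UNKNOWN")
            bad.append((member["compName"], name))
    return bad
```

Run during a shutdown operation, a non-empty result would reproduce the observation in this ticket.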





[tickets] [opensaf:tickets] #1794 AMF : amfd crashed on both controllers, after opensafd is stopped on appl hosted payloads

2016-04-29 Thread Srikanth R



---

** [tickets:#1794] AMF : amfd crashed on both controllers, after opensafd is 
stopped on appl hosted  payloads **

**Status:** unassigned
**Milestone:** 5.0.RC2
**Created:** Fri Apr 29, 2016 06:48 AM UTC by Srikanth R
**Last Updated:** Fri Apr 29, 2016 06:48 AM UTC
**Owner:** nobody
**Attachments:**

- 
[1794.tgz](https://sourceforge.net/p/opensaf/tickets/1794/attachment/1794.tgz) 
(3.8 MB; application/x-compressed-tar)


Changeset : 7436 5.0.FC
Setup : 5 nodes cluster with 3 payloads.
Application : 2N redundancy model, 3 SUs with 4 SIs (si-si dependency configured)
PL-3 is hosting SU1 and SU3 and PL-4 is hosting SU2.

Issue : AMFD crashed on both controllers after opensafd was stopped on the 
payloads hosting the application.

Steps performed :

-> After deploying the application, a lot of AMF-related operations were 
performed.

-> After that, the opensafd status was as follows, where SU1 deployed on PL-3 
is standby and SU2 deployed on PL-4 is active.

safSISU=safSu=TestApp_SU1\,safSg=TestApp_SG1\,safApp=TestApp_TwoN,safSi=TestApp_SI3,safApp=TestApp_TwoN
saAmfSISUHAState=STANDBY(2)
safSISU=safSu=TestApp_SU1\,safSg=TestApp_SG1\,safApp=TestApp_TwoN,safSi=TestApp_SI4,safApp=TestApp_TwoN
saAmfSISUHAState=STANDBY(2)
safSISU=safSu=PL-3\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed6,safApp=OpenSAF
saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=TestApp_SU1\,safSg=TestApp_SG1\,safApp=TestApp_TwoN,safSi=TestApp_SI2,safApp=TestApp_TwoN
saAmfSISUHAState=STANDBY(2)
safSISU=safSu=TestApp_SU3\,safSg=TestApp_SG1\,safApp=TestApp_TwoN,safSi=TestApp_SI2,safApp=TestApp_TwoN
saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=TestApp_SU3\,safSg=TestApp_SG1\,safApp=TestApp_TwoN,safSi=TestApp_SI1,safApp=TestApp_TwoN
saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=TestApp_SU3\,safSg=TestApp_SG1\,safApp=TestApp_TwoN,safSi=TestApp_SI3,safApp=TestApp_TwoN
saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=TestApp_SU1\,safSg=TestApp_SG1\,safApp=TestApp_TwoN,safSi=TestApp_SI1,safApp=TestApp_TwoN
saAmfSISUHAState=STANDBY(2)
safSISU=safSu=PL-5\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed5,safApp=OpenSAF
saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=TestApp_SU3\,safSg=TestApp_SG1\,safApp=TestApp_TwoN,safSi=TestApp_SI4,safApp=TestApp_TwoN
saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=SC-2\,safSg=2N\,safApp=OpenSAF,safSi=SC-2N,safApp=OpenSAF
saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=SC-2\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed2,safApp=OpenSAF
saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=SC-1\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed1,safApp=OpenSAF
saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=PL-4\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed3,safApp=OpenSAF
saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=SC-1\,safSg=2N\,safApp=OpenSAF,safSi=SC-2N,safApp=OpenSAF
saAmfSISUHAState=STANDBY(2)
safSISU=safSu=TestApp_SU2\,safSg=TestApp_SG1\,safApp=TestApp_TwoN,safSi=TestApp_SI1,safApp=TestApp_TwoN
saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=TestApp_SU2\,safSg=TestApp_SG1\,safApp=TestApp_TwoN,safSi=TestApp_SI2,safApp=TestApp_TwoN
saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=TestApp_SU2\,safSg=TestApp_SG1\,safApp=TestApp_TwoN,safSi=TestApp_SI3,safApp=TestApp_TwoN
saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=TestApp_SU2\,safSg=TestApp_SG1\,safApp=TestApp_TwoN,safSi=TestApp_SI4,safApp=TestApp_TwoN
saAmfSISUHAState=ACTIVE(1)
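
The listing above reports TestApp_SI1 to TestApp_SI4 as ACTIVE on both TestApp_SU2 and TestApp_SU3. Whether that is related to the assertion below is not confirmed in this ticket, but it is easy to check for mechanically. A Python sketch, assuming the listing alternates a `safSISU=` line with a `saAmfSISUHAState=` line as shown; the function names are hypothetical.

```python
import re
from collections import defaultdict


def active_sus_per_si(listing):
    """Map each SI DN to the SU DNs that hold it ACTIVE."""
    active = defaultdict(list)
    lines = [ln.strip() for ln in listing.splitlines() if ln.strip()]
    # Lines come in pairs: association DN, then its HA state.
    for dn, state in zip(lines[0::2], lines[1::2]):
        if not dn.startswith("safSISU="):
            continue
        if not state.startswith("saAmfSISUHAState=ACTIVE"):
            continue
        # Split on commas that are not escaped with a backslash.
        first, *rest = re.split(r"(?<!\\),", dn)
        su = first[len("safSISU="):].replace("\\,", ",")
        active[",".join(rest)].append(su)
    return dict(active)


def sis_with_several_active(listing):
    """Return only the SIs that are ACTIVE on more than one SU."""
    per_si = active_sus_per_si(listing)
    return {si: sus for si, sus in per_si.items() if len(sus) > 1}
```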


-> Now stopped opensafd on the payloads PL-5 and PL-4, one after another.

-> Amfd on the active controller crashed after opensafd was stopped on PL-4.

Apr 28 16:47:54 CONTROLLER-2 osafamfd[12188]: NO Node 'PL-4' left the cluster
Apr 28 16:47:54 CONTROLLER-2 osafamfd[12188]: sg_2n_fsm.cc:534: 
avd_sg_2n_act_susi: Assertion 'a_susi_1->su == a_susi_2->su' failed.
Apr 28 16:47:54 CONTROLLER-2 osafamfnd[12198]: WA AMF director unexpectedly 
crashed

Note, this issue is not reproducible just by bringing up the application and 
performing the above steps.




[tickets] [opensaf:tickets] #1784 Amfd asserts on clm locked controller after successfully taking active role as a part of failover

2016-04-28 Thread Srikanth R
Amfd asserts here because of an invalid root cause entity. CLM populating an 
invalid root cause entity in the callback is reported in ticket #1342.


---

** [tickets:#1784] Amfd asserts on clm locked controller after successfully 
taking active role as a part of  failover**

**Status:** unassigned
**Milestone:** 5.0.RC2
**Created:** Tue Apr 26, 2016 11:47 AM UTC by Ritu Raj
**Last Updated:** Tue Apr 26, 2016 11:47 AM UTC
**Owner:** nobody
**Attachments:**

- 
[messages](https://sourceforge.net/p/opensaf/tickets/1784/attachment/messages) 
(3.2 MB; application/octet-stream)
- 
[osafamfd](https://sourceforge.net/p/opensaf/tickets/1784/attachment/osafamfd) 
(7.4 MB; application/octet-stream)


setup:
Changeset- 7436
Version - opensaf 5.0 FC

 * Issue Observed :
Amfd asserts on the CLM-locked controller after successfully taking the active 
role as part of failover.


* Steps To Reproduce:
1. OpenSAF is running on 4 nodes, where SC-1 is active, SC-2 is standby, and 
PL-3 and PL-4 are payloads.
2. Perform a CLM lock of the standby controller (SC-2).
3. Now perform a failover on the active controller (SC-1).
4. Observe that amfd asserts on the CLM-locked controller (SC-2) and a cluster 
reset happens.

>SLOT-2:~ # Apr 26 14:56:06 SLOT-2 osafimmd[2199]: WA IMMD lost contact with 
>peer IMMD (NCSMDS_RED_DOWN)
...
Apr 26 14:56:11 SLOT-2 osaffmd[2189]: NO Node Down event for node id 2010f:
Apr 26 14:56:11 SLOT-2 osaffmd[2189]: NO Current role: STANDBY
...
Apr 26 14:56:11 SLOT-2 osafrded[2180]: NO Peer down on node 0x2010f
Apr 26 14:56:11 SLOT-2 osafimmd[2199]: WA IMMND DOWN on active controller 1 
detected at standby immd!! 2. Possible failover
...
Apr 26 14:56:11 SLOT-2 opensaf_reboot: Rebooting remote node in the absence of 
PLM is outside the scope of OpenSAF
Apr 26 14:56:11 SLOT-2 osaffmd[2189]: NO Controller Failover: Setting role to 
ACTIVE
Apr 26 14:56:11 SLOT-2 osafrded[2180]: NO RDE role set to ACTIVE
Apr 26 14:56:11 SLOT-2 osafrded[2180]: NO Running 
'/usr/lib64/opensaf/opensaf_sc_active' with 0 argument(s)
Apr 26 14:56:11 SLOT-2 osafimmd[2199]: NO ACTIVE request
Apr 26 14:56:11 SLOT-2 osaflogd[2224]: NO ACTIVE request
Apr 26 14:56:11 SLOT-2 osafntfd[2234]: NO ACTIVE request
Apr 26 14:56:11 SLOT-2 osafclmd[2244]: NO ACTIVE request
Apr 26 14:56:11 SLOT-2 osafamfd[2254]: NO FAILOVER StandBy --> Active
Apr 26 14:56:11 SLOT-2 osafamfnd[2264]: NO AVD NEW_ACTIVE, adest:1
Apr 26 14:56:11 SLOT-2 osafimmd[2199]: NO ellect_coord invoke from rda_callback 
ACTIVE
Apr 26 14:56:11 SLOT-2 osafimmd[2199]: NO New coord elected, resides at 2020f
Apr 26 14:56:11 SLOT-2 osafimmnd[2210]: NO 2PBE configured, 
IMMSV_PBE_FILE_SUFFIX:.2020f (sync)
Apr 26 14:56:11 SLOT-2 osafimmnd[2210]: NO This IMMND is now the NEW Coord
Apr 26 14:56:11 SLOT-2 osafimmnd[2210]: NO SETTING COORD TO 1 CLOUD PROTO
Apr 26 14:56:11 SLOT-2 osafimmnd[2210]: NO Implementer disconnected 16 <139, 
2020f> (@safAmfService2020f)
Apr 26 14:56:11 SLOT-2 osafimmnd[2210]: NO Implementer connected: 18 
(safLogService) <126, 2020f>
Apr 26 14:56:11 SLOT-2 osafimmnd[2210]: NO Implementer connected: 19 
(safAmfService) <139, 2020f>
Apr 26 14:56:11 SLOT-2 osafamfd[2254]: NO Node 'SC-1' left the cluster
Apr 26 14:56:11 SLOT-2 osafamfd[2254]: NO FAILOVER StandBy --> Active DONE!
Apr 26 14:56:11 SLOT-2 osafamfnd[2264]: NO Assigning 
'safSi=SC-2N,safApp=OpenSAF' ACTIVE to 'safSu=SC-2,safSg=2N,safApp=OpenSAF'
Apr 26 14:56:11 SLOT-2 osafntfimcnd[2419]: NO exiting on signal 15
..
Apr 26 14:56:11 SLOT-2 osafimmnd[2210]: NO Implementer connected: 27 
(safSmfService) <337, 2020f>
Apr 26 14:56:11 SLOT-2 osafamfnd[2264]: NO Assigned 
'safSi=SC-2N,safApp=OpenSAF' ACTIVE to 'safSu=SC-2,safSg=2N,safApp=OpenSAF'
Apr 26 14:56:11 SLOT-2 osafamfd[2254]: ER Wrong rootCauseEntity �H�
Apr 26 14:56:11 SLOT-2 osafamfd[2254]: clm.cc:312: clm_track_cb: Assertion '0' 
failed.
Apr 26 14:56:11 SLOT-2 osafamfnd[2264]: WA AMF director unexpectedly crashed
Apr 26 14:56:11 SLOT-2 osafamfnd[2264]: Rebooting OpenSAF NodeId = 131599 EE 
Name = , Reason: local AVD down(Adest) or both AVD down(Vdest) received, 
OwnNodeId = 131599, SupervisionTime = 60
Apr 26 14:56:11 SLOT-2 opensaf_reboot: Rebooting local node; timeout=60
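
The log above uses two notations for the same node: hexadecimal (`2020f`, `0x2010f`) and decimal (`NodeId = 131599`). A tiny Python helper makes the correspondence explicit when reading such traces; the function names are only illustrative.

```python
def node_id_hex(node_id):
    """Format a decimal node id as lowercase hex, as in '2020f'."""
    return format(node_id, "x")


def node_id_from_hex(text):
    """Parse '2020f' or '0x2010f' into an integer node id."""
    return int(text, 16)
```

Here `node_id_hex(131599)` gives `2020f`, so the node being rebooted is the same one that was elected coordinator earlier in the log.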


* Syslog and amfd trace attached
 Note: The issue is observed randomly



[tickets] [opensaf:tickets] #1779 AMF: si-swap locked SI returns "error - command timed out"

2016-04-28 Thread Srikanth R
A ticket already exists for this issue: 
https://sourceforge.net/p/opensaf/tickets/1294/


---

** [tickets:#1779] AMF: si-swap locked SI returns "error - command timed out"**

**Status:** unassigned
**Milestone:** 4.6.1
**Created:** Mon Apr 25, 2016 09:19 AM UTC by Quyen Dao
**Last Updated:** Mon Apr 25, 2016 10:16 AM UTC
**Owner:** nobody


cs: 7537:06ac24c4b9c3

**steps to reproduce**
immcfg -f AppConfig-2N.xml
amf-adm unlock-in safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1
amf-adm unlock-in safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1
amf-adm unlock safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1
amf-adm unlock safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1
amf-adm lock safSi=AmfDemo,safApp=AmfDemo1
amf-adm si-swap safSi=AmfDemo,safApp=AmfDemo1

**Result**
root@SC-1:/srv/shared# immcfg -f AppConfig-2N.xml
root@SC-1:/srv/shared# amf-adm unlock-in safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1
root@SC-1:/srv/shared# amf-adm unlock-in safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1
root@SC-1:/srv/shared# amf-adm unlock safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1
root@SC-1:/srv/shared# amf-adm unlock safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1
root@SC-1:/srv/shared# amf-adm lock safSi=AmfDemo,safApp=AmfDemo1
root@SC-1:/srv/shared# amf-adm si-swap safSi=AmfDemo,safApp=AmfDemo1
error - command timed out (alarm)
root@SC-1:/srv/shared#





[tickets] [opensaf:tickets] #1762 CLM : Healthy payloads are marked as Non-member nodes after failover

2016-04-22 Thread Srikanth R
Below are the definitive steps to reproduce the issue.

* Bring up four nodes in the cluster with SC-1 as the active controller.
* Issue two switchovers.
* Perform a lock operation on one of the payloads, say PL-4.
* Perform an unlock operation on the locked payload.
* Restart opensafd on the payload.
* Perform a failover by killing any director process on the active controller 
(SC-1).
* The standby takes the active role and, as part of the transition, marks the 
unlocked PL-4 as out of the cluster.

Changeset version : 7436 (5.0.FC)


---

** [tickets:#1762] CLM : Healthy payloads are marked as Non-member nodes after 
failover**

**Status:** unassigned
**Milestone:** 5.0.RC2
**Created:** Thu Apr 14, 2016 11:08 AM UTC by Srikanth R
**Last Updated:** Thu Apr 14, 2016 11:53 AM UTC
**Owner:** nobody


Setup :
Changeset : 7436 5.0.FC
5 nodes cluster with Application deployed on PL-3 and PL-4.

Issue :
Healthy payloads are marked as Non-member nodes after failover

Steps performed :

 * Started opensaf on all the nodes, i.e. SC-1 to PL-5.
 * Initially brought up the AMF application deployed on PL-3 and PL-4.
 * Ran some tests on the setup, including switchovers, failovers and CLM lock 
operations on PL-3 and PL-4.
 * Restarted opensafd on PL-4. After the restart, the AMF application on PL-3 
got the corresponding standby assignment, as expected.
  Below is the trace from osafclmd:
 Apr 14 14:15:45.621396 osafclmd [6745:clms_ntf.c:0180] TR Notification for CLM 
node safNode=PL-4,safCluster=myClmCluster exit
 Apr 14 14:15:56.548867 osafclmd [6745:clms_ntf.c:0142] TR Notification for CLM 
node safNode=PL-4,safCluster=myClmCluster Join
 
 
 * Similarly restarted opensafd on PL-3 and the AMF application came up fine.
 Apr 14 14:16:00.890903 osafclmd [6745:clms_ntf.c:0180] TR Notification for CLM 
node safNode=PL-3,safCluster=myClmCluster exit
 Apr 14 14:21:41.602270 osafclmd [6745:clms_ntf.c:0142] TR Notification for CLM 
node safNode=PL-3,safCluster=myClmCluster Join
 
 
 * Now induced a failover by killing ckptd on the active controller SC-1.
 
 * SC-2 took active role.
 Apr 14 14:21:44 CONTROLLER-2 osafamfd[22600]: NO FAILOVER StandBy --> Active
 
 * But the two payloads PL-3 and PL-4 are marked as out of the cluster by AMF. 
PL-5 is still part of the cluster.

Apr 14 14:21:45 CONTROLLER-2 osafamfd[22600]: NO Node 'PL-4' left the cluster
Apr 14 14:21:45 CONTROLLER-2 osafamfd[22600]: NO Node 'PL-3' left the cluster
Apr 14 14:21:45 CONTROLLER-2 osafamfd[22600]: WA avd_msg_sanity_chk: invalid 
node ID (2030f)

 * Below is the trace from CLMD about PL-3 & PL-4 exit, just after the active 
promotion.
 Apr 14 14:21:45.009100 osafclmd [22590:clms_ntf.c:0180] TR Notification for 
CLM node safNode=PL-4,safCluster=myClmCluster exit
 Apr 14 14:21:45.136368 osafclmd [22590:clms_ntf.c:0180] TR Notification for 
CLM node safNode=PL-3,safCluster=myClmCluster exit
 
 * The AMF applications on PL-3 and PL-4 did not receive any CSI removal 
callback during failover, but the AMF nodes are marked as disabled and the 
attribute saClmNodeIsMember of the CLM objects PL-3 and PL-4 is set to 0. The 
opensafd status doesn't show PL-3 and PL-4.
 
 * The CLM APIs on PL-3 and PL-4 failed with ERR_UNAVAILABLE, but the APIs of 
other services such as CKPT and MQSV did not.
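
The osafclmd trace lines quoted in the steps above can be reduced to the last membership event per node, which makes the unexpected exits after failover stand out. A Python sketch, assuming the wrapped trace lines are joined into single lines and fed in chronological order; `last_membership_event` is a hypothetical name.

```python
import re

_NTF_RE = re.compile(
    r"^\s*(\w{3}\s+\d+\s+[\d:.]+) osafclmd \[(\d+):.*?Notification for CLM "
    r"node (\S+) (exit|Join)\s*$"
)


def last_membership_event(lines):
    """Return {node_dn: (timestamp, clmd_pid, event)} for the latest event."""
    latest = {}
    for line in lines:
        m = _NTF_RE.match(line)
        if m:
            ts, pid, node, event = m.groups()
            # Later lines overwrite earlier ones for the same node.
            latest[node] = (ts, pid, event.lower())
    return latest
```

The pid is kept because it shows which clmd instance emitted the event: 6745 before the failover and 22590 after it in the traces above.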
 
 





[tickets] [opensaf:tickets] #1765 saCkptCheckpointOpen api call failed and returing SA_AIS_ERR_LIBRARY after couple of failover

2016-04-20 Thread Srikanth R
Irrespective of the order of the callbacks, this issue should not be observed.


---

** [tickets:#1765] saCkptCheckpointOpen api call failed and returing 
SA_AIS_ERR_LIBRARY after couple of failover**

**Status:** accepted
**Milestone:** 4.6.2
**Created:** Fri Apr 15, 2016 06:26 AM UTC by Ritu Raj
**Last Updated:** Wed Apr 20, 2016 06:03 AM UTC
**Owner:** Pham Hoang Nhat
**Attachments:**

- 
[ckpt_trace.tar.bz2](https://sourceforge.net/p/opensaf/tickets/1765/attachment/ckpt_trace.tar.bz2)
 (3.2 MB; application/x-bzip)


setup:
Changeset- 7436
Version - opensaf 5.0 FC
4 nodes configured with single PBE and a load of 30K objects

* Issue observed :
The saCkptCheckpointOpen API call fails and returns SA_AIS_ERR_LIBRARY after a 
couple of failovers.

* Steps to reproduce:
> Ran a couple of failovers and observed that saCkptCheckpointOpen failed.
> Below is the snippet of the agent trace:

Apr 15  8:08:50.275115 cpa [28883:cpa_mds.c:0776] << cpa_mds_msg_sync_send: 
retval = 1
Apr 15  8:08:50.275128 cpa [28883:cpa_api.c:1043] T4 Cpa CkptOpen failed with 
return value:2,ckptHandle:63
Apr 15  8:08:50.275141 cpa [28883:cpa_api.c:1146] << **saCkptCheckpointOpen: 
API return code = 2**

> Traces of both controllers and the agent trace of the payload are attached.
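
The numeric return code in the agent trace maps to SA_AIS_ERR_LIBRARY. When scanning many traces, a small lookup helps. The Python sketch below lists only the SaAisErrorT values quoted in these tickets plus SA_AIS_OK (see `saAis.h` for the full enum); `api_return_code` is a hypothetical helper.

```python
import re

# Only the SaAisErrorT values quoted in these tickets, plus SA_AIS_OK.
SA_AIS_CODES = {
    1: "SA_AIS_OK",
    2: "SA_AIS_ERR_LIBRARY",
    6: "SA_AIS_ERR_TRY_AGAIN",
    9: "SA_AIS_ERR_BAD_HANDLE",
    19: "SA_AIS_ERR_NOT_SUPPORTED",
}

_RC_RE = re.compile(r"API return code = (\d+)")


def api_return_code(trace_line):
    """Return the symbolic name of the return code in a trace line, or None."""
    m = _RC_RE.search(trace_line)
    if m is None:
        return None
    rc = int(m.group(1))
    return SA_AIS_CODES.get(rc, "unlisted SaAisErrorT value %d" % rc)
```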





[tickets] [opensaf:tickets] #1770 AMF : amfnd segfaulted during su failover escalation

2016-04-19 Thread Srikanth R
Attached are the traces of amfnd and the syslog from the node hosting the SU, 
along with the application configuration.


Attachments:

- 
[1770.tgz](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/fa42a0c9/d65e/attachment/1770.tgz)
 (393.8 kB; application/x-compressed-tar)


---

** [tickets:#1770] AMF : amfnd segfaulted during su failover escalation**

**Status:** unassigned
**Milestone:** 4.6.2
**Created:** Tue Apr 19, 2016 06:53 AM UTC by Srikanth R
**Last Updated:** Tue Apr 19, 2016 06:53 AM UTC
**Owner:** nobody


Setup :
5 node cluster with 3 payloads
changeset : 7438 ( opensaf 5.0.FC)
Application : 2N with 5 SUs ( si-si deps enabled & su failover flag enabled)

Issue :

 AMFND hosting the faulty SU segfaulted during SU failover escalation as part 
of an SU lock operation.
 
 Steps performed :
 
 -> Initially bring up the application and ensure that it is fully assigned.
 
 -> Perform one fault operation on the SU hosting the active assignment, in 
such a way that the next fault is escalated to SU failover.
 
 -> Perform a lock operation on the SU hosting the active assignment.
 
 -> Do not respond to the CSI removal callback, so that this fault is 
escalated to SU failover.
 
 -> AMFND segfaulted with the following bt file:
 
 signal: 11 pid: 320 uid: 0
/usr/lib64/libopensaf_core.so.0(+0x1fd9d)[0x7f1d79294d9d]
/lib64/libpthread.so.0(+0xf7c0)[0x7f1d782b67c0]
/usr/lib64/opensaf/osafamfnd[0x43b1ff]
/usr/lib64/opensaf/osafamfnd[0x417f89]
/usr/lib64/opensaf/osafamfnd[0x408469]
/usr/lib64/opensaf/osafamfnd[0x42c65a]
/usr/lib64/opensaf/osafamfnd[0x42c4a0]
/usr/lib64/opensaf/osafamfnd[0x42b979]
/lib64/libc.so.6(__libc_start_main+0xe6)[0x7f1d77ac1c36]
/usr/lib64/opensaf/osafamfnd[0x405f29]

-> Below is the entry in osafamfnd trace :

Apr 19 11:23:44.684918 osafamfnd [29522:clc.cc:0870] T1 'safComp=COMP2SU5TWONAPP,safSu=SU5,safSg=SGONE,safApp=TWONAPP':FSM Enter presence state: 'SA_AMF_PRESENCE_TERMINATING(4)':FSM Exit presence state:SA_AMF_PRESENCE_TERMINATING(4)
Apr 19 11:23:44.684924 osafamfnd [29522:clc.cc:0889] << avnd_comp_clc_fsm_run: 1
Apr 19 11:23:44.684930 osafamfnd [29522:err.cc:1120] << avnd_err_su_repair: retval=1
Apr 19 11:23:44.684936 osafamfnd [29522:susm.cc:0255] >> avnd_su_siq_prc: SU 'safSu=SU5,safSg=SGONE,safApp=TWONAPP'
Apr 19 11:23:44.684942 osafamfnd [29522:susm.cc:0260] << avnd_su_siq_prc
Apr 19 11:23:44.684947 osafamfnd [29522:susm.cc:1176] << avnd_su_si_oper_done: 1
Apr 19 11:23:44.684953 osafamfnd [29522:comp.cc:1822] << avnd_comp_csi_remove_done: 1
Apr 19 11:23:44.684959 osafamfnd [29522:comp.cc:1321] << avnd_comp_csi_remove: 1
Apr 19 11:23:44.685055 osafamfnd [29522:comp.cc:1678] >> all_csis_in_removed_state: 'safSu=SU5,safSg=SGONE,safApp=TWONAPP'
Apr 19 11:23:44.685064 osafamfnd [29522:comp.cc:1691] << all_csis_in_removed_state: 1
Apr 19 11:23:44.685070 osafamfnd [29522:susm.cc:1021] >> avnd_su_si_oper_done: 'safSu=SU5,safSg=SGONE,safApp=TWONAPP' '(null)'
Apr 19 11:23:44.685076 osafamfnd [29522:susm.cc:0845] >> susi_operation_in_progress: 'safSu=SU5,safSg=SGONE,safApp=TWONAPP' '(null)'
Apr 19 11:23:44.685082 osafamfnd [29522:susm.cc:0890] << susi_operation_in_progress: 1
Apr 19 11:23:44.685096 osafamfnd [29522:err.cc:1586] >> is_no_assignment_due_to_escalations
Apr 19 11:23:44.685102 osafamfnd [29522:err.cc:1591] << is_no_assignment_due_to_escalations: true
Apr 19 11:24:51.153931 osafamfnd [2500:ncs_main_pub.c:0223] TR NCS:PROCESS_ID=2500


---

Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is 
subscribed to https://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change settings at 
https://sourceforge.net/p/opensaf/admin/tickets/options.  Or, if this is a 
mailing list, you can unsubscribe from the mailing list.
Opensaf-tickets mailing list
Opensaf-tickets@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets


[tickets] [opensaf:tickets] #1757 Standby controller failed to join the cluster probably because of setup issues

2016-04-18 Thread Srikanth R
Please note that there is a time difference of close to 94 seconds between the 
two controllers.


Attachments:

- [1757.tgz](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/bf13/00e0/attachment/1757.tgz) (2.5 MB; application/x-compressed-tar)


---

** [tickets:#1757] Standby controller failed to join the cluster probably because of setup issues**

**Status:** unassigned
**Milestone:** 4.6.2
**Created:** Wed Apr 13, 2016 11:12 AM UTC by Ritu Raj
**Last Updated:** Mon Apr 18, 2016 09:09 AM UTC
**Owner:** nobody


*Setup:
Changeset- 7436
Version - opensaf 5.0FC
OS: SUSE 11SP2 x86_64

*Issue observed :
Standby controller failed to join the cluster with error message "ER Failed to 
Initialize with CLM"

*Steps To Reproduce:
> OpenSAF is already up and running on controller1 (SC-1).
> When OpenSAF was started on controller2 (SC-2), it failed with the following message: 

SCALE_SLOT-2:~ # /etc/init.d/opensafd start
Apr 26 20:11:28 SCALE_SLOT-2 opensafd: Starting OpenSAF Services(5.0.FC - ) (Using TIPC)
Starting OpenSAF Services (Using TIPC):Apr 26 20:11:28 SCALE_SLOT-2 kernel: [1930938.251473] TIPC: Activated (version 2.0.0)
...
Apr 26 20:11:29 SCALE_SLOT-2 osafamfnd[29911]: Started
**Apr 26 20:11:29 SCALE_SLOT-2 osafamfnd[29911]: ER Failed to Initialize with CLM: 8
Apr 26 20:11:29 SCALE_SLOT-2 osafamfnd[29911]: ER avnd_create failed**
Apr 26 20:11:29 SCALE_SLOT-2 osafamfnd[29911]: NO exiting

> The corresponding syslog of the active controller (SC-1) at that time:
Apr 26 20:08:51 SCALE_SLOT-1 osafclmd[31692]: WA FAILED:** ncs_patricia_tree_add, client_id** 53
Apr 26 20:08:51 SCALE_SLOT-1 osafamfd[31702]: NO Node 'SC-2' left the cluster


>> It is also observed that on the active controller (SC-1) there is no log 
>> record from osafclmd for the period in which controller2 (SC-2) failed, 
>> while other services have log records for that time.

Below is the osafclmd trace from SC-1. Between the time stamps "Apr 26 
20:08:51.237701" and "Apr 26 20:12:06.272871", osafclmd did not log anything.
Apr 26 20:08:51.237695 osafclmd [31692:clms_evt.c:1601] << process_api_evt
**Apr 26 20:08:51.237701 osafclmd [31692:clms_evt.c:1667] << clms_process_mbx
Apr 26 20:12:06.272871 osafclmd [31692:ava_mds.c:0179] >> ava_mds_cbk**
Apr 26 20:12:06.272923 osafclmd [31692:ava_mds.c:0530] >> ava_mds_flat_dec
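
The silent period in the osafclmd trace can be measured directly from the time stamps. A minimal sketch, assuming the trace prefix format shown above. The helper names are hypothetical, the year 2016 is an assumption taken from the ticket creation date (the trace itself carries no year), and the `**` emphasis markers have been dropped from the sample lines:

```python
from datetime import datetime

def parse_ts(line, year=2016):
    """Parse the 'Mon DD HH:MM:SS.ffffff' prefix of an OpenSAF trace line.

    Trace lines carry no year, so one is supplied (2016 is an assumption).
    """
    month, day, clock = line.split()[:3]
    return datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S.%f")

def find_gaps(lines, threshold_s=60.0):
    """Yield (previous_line, next_line, gap_seconds) for silent periods."""
    prev_line, prev_ts = None, None
    for line in lines:
        ts = parse_ts(line)
        if prev_ts is not None:
            gap = (ts - prev_ts).total_seconds()
            if gap > threshold_s:
                yield prev_line, line, gap
        prev_line, prev_ts = line, ts

TRACE = [
    "Apr 26 20:08:51.237695 osafclmd [31692:clms_evt.c:1601] << process_api_evt",
    "Apr 26 20:08:51.237701 osafclmd [31692:clms_evt.c:1667] << clms_process_mbx",
    "Apr 26 20:12:06.272871 osafclmd [31692:ava_mds.c:0179] >> ava_mds_cbk",
    "Apr 26 20:12:06.272923 osafclmd [31692:ava_mds.c:0530] >> ava_mds_flat_dec",
]

gaps = list(find_gaps(TRACE))
for before, after, gap in gaps:
    print(f"{gap:.6f} s of silence after: {before}")
```

For the four lines quoted here this reports a single gap of roughly 195 seconds.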


Note: 
1. This is a random issue.
2. The time gap between controller1 (SC-1) and controller2 (SC-2) is 3 min. 




[tickets] [opensaf:tickets] #1762 CLM : Healthy payloads are marked as Non-member nodes after failover

2016-04-14 Thread Srikanth R
Traces of clmd, amfd and immnd on both controllers, along with the syslog of 
all nodes, are attached.


Attachments:

- [1762.tgz](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/e65ee23f/1931/attachment/1762.tgz) (3.9 MB; application/x-compressed-tar)


---

** [tickets:#1762] CLM : Healthy payloads are marked as Non-member nodes after failover**

**Status:** unassigned
**Milestone:** 5.0.RC2
**Created:** Thu Apr 14, 2016 11:08 AM UTC by Srikanth R
**Last Updated:** Thu Apr 14, 2016 11:08 AM UTC
**Owner:** nobody


Setup :
Changeset : 7436 5.0.FC
5 nodes cluster with Application deployed on PL-3 and PL-4.

Issue :
Healthy payloads are marked as Non-member nodes after failover

Steps performed :

 * Started opensaf on all the nodes, i.e. SC-1 to PL-5.
 * Initially brought up the AMF application deployed on PL-3 and PL-4.
 * Ran some tests on the setup, including switchovers, failovers and CLM lock 
operations on PL-3 and PL-4.
 * Restarted opensafd on PL-4. After the restart, AMF applications on PL-3 got 
the corresponding standby assignment as per expectation.
  Below is the trace from osafclmd:
 Apr 14 14:15:45.621396 osafclmd [6745:clms_ntf.c:0180] TR Notification for CLM node safNode=PL-4,safCluster=myClmCluster exit
 Apr 14 14:15:56.548867 osafclmd [6745:clms_ntf.c:0142] TR Notification for CLM node safNode=PL-4,safCluster=myClmCluster Join
 
 
 * Similarly restarted opensafd on PL-3 and the AMF application came up fine.
 Apr 14 14:16:00.890903 osafclmd [6745:clms_ntf.c:0180] TR Notification for CLM node safNode=PL-3,safCluster=myClmCluster exit
 Apr 14 14:21:41.602270 osafclmd [6745:clms_ntf.c:0142] TR Notification for CLM node safNode=PL-3,safCluster=myClmCluster Join
 
 
 * Now induced a failover by killing ckptd on the active controller SC-1.
 
 * SC-2 took active role.
 Apr 14 14:21:44 CONTROLLER-2 osafamfd[22600]: NO FAILOVER StandBy --> Active
 
 * But the two payloads PL-3 and PL-4 are marked as out of the cluster by AMF. 
PL-5 is still part of the cluster.

Apr 14 14:21:45 CONTROLLER-2 osafamfd[22600]: NO Node 'PL-4' left the cluster
Apr 14 14:21:45 CONTROLLER-2 osafamfd[22600]: NO Node 'PL-3' left the cluster
Apr 14 14:21:45 CONTROLLER-2 osafamfd[22600]: WA avd_msg_sanity_chk: invalid node ID (2030f)

 * Below is the trace from CLMD about the exit of PL-3 and PL-4, just after the 
active promotion.
 Apr 14 14:21:45.009100 osafclmd [22590:clms_ntf.c:0180] TR Notification for CLM node safNode=PL-4,safCluster=myClmCluster exit
 Apr 14 14:21:45.136368 osafclmd [22590:clms_ntf.c:0180] TR Notification for CLM node safNode=PL-3,safCluster=myClmCluster exit
 
 * The AMF applications on PL-3 and PL-4 did not receive any CSI removal 
callback during failover, but the AMF nodes are marked as disabled and the 
attribute saClmNodeIsMember of the CLM objects PL-3 and PL-4 is set to 0. 
Opensafd status doesn't show PL-3 and PL-4.
 
 * The CLM APIs on PL-3 and PL-4 failed with ERR_UNAVAILABLE, but the APIs of 
other services such as CKPT and MQSV did not.
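
The membership history in the steps above can be replayed from the clms_ntf.c trace lines. A minimal sketch, assuming the notification text has exactly the form quoted in this ticket; `last_membership_event` is a hypothetical helper, not part of OpenSAF, and the sample lines are the six notifications listed above with the wrapped lines joined:

```python
import re

# Matches the clms_ntf.c trace text quoted in this ticket.
NTF_RE = re.compile(
    r"Notification for CLM node safNode=(?P<node>[^,]+),\S+ (?P<event>exit|Join)$"
)

def last_membership_event(lines):
    """Map node name -> last notification seen ('exit' or 'Join')."""
    state = {}
    for line in lines:
        m = NTF_RE.search(line.strip())
        if m:
            state[m.group("node")] = m.group("event")
    return state

TRACE = [
    "Apr 14 14:15:45.621396 osafclmd [6745:clms_ntf.c:0180] TR Notification for CLM node safNode=PL-4,safCluster=myClmCluster exit",
    "Apr 14 14:15:56.548867 osafclmd [6745:clms_ntf.c:0142] TR Notification for CLM node safNode=PL-4,safCluster=myClmCluster Join",
    "Apr 14 14:16:00.890903 osafclmd [6745:clms_ntf.c:0180] TR Notification for CLM node safNode=PL-3,safCluster=myClmCluster exit",
    "Apr 14 14:21:41.602270 osafclmd [6745:clms_ntf.c:0142] TR Notification for CLM node safNode=PL-3,safCluster=myClmCluster Join",
    "Apr 14 14:21:45.009100 osafclmd [22590:clms_ntf.c:0180] TR Notification for CLM node safNode=PL-4,safCluster=myClmCluster exit",
    "Apr 14 14:21:45.136368 osafclmd [22590:clms_ntf.c:0180] TR Notification for CLM node safNode=PL-3,safCluster=myClmCluster exit",
]

state = last_membership_event(TRACE)
for node, event in state.items():
    print(f"{node}: last notification = {event}")
```

Both payloads end on an exit notification issued by the newly active osafclmd (pid 22590), with no later Join, which matches saClmNodeIsMember being 0.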
 
 





[tickets] [opensaf:tickets] #1275 AMF: SG is in unstable state ( standby csi removal timeout during sponsor si lock )

2016-04-14 Thread Srikanth R
The first scenario is reproduced with changeset 7236.
App config : 2n red, 2 SUs with 4 COMPs, 1 sponsor with 3 dependent SIs, su 
restart failover flag disabled.
Please find attached the amfd trace and the application creation script.
Below are the steps followed.

* Ensure that all the SIs are assigned.
* Lock the sponsor SI.
* During the lock operation on the sponsor SI, the component hosting the 
standby assignment (COMP1 in SU2) doesn't respond to the CSI removal callback.



Attachments:

- [1275_issue1.tgz](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/ae1313a5/5dea/attachment/1275_issue1.tgz) (41.8 kB; application/x-compressed-tar)


---

** [tickets:#1275] AMF: SG is in unstable state ( standby csi removal timeout during sponsor si lock )**

**Status:** unassigned
**Milestone:** 4.6.2
**Created:** Thu Mar 19, 2015 01:48 PM UTC by Srikanth R
**Last Updated:** Mon Nov 02, 2015 09:23 AM UTC
**Owner:** nobody


*Setup*
Version : 4.6 FC
model : 2n
configuration : 1App,1SG,2SUs with 4comps each, 4SIs with 1 CSI each
si-si deps configured as SI1 is sponsor to SI2,3,&4.
SU1 is mapped to pl-3 and SU2 to pl-4
saAmfSGAutoRepair=1(True)
SuFailover=0(False)
component recovery policy - 3 (comp failover)

*Initial state*
All the AMF entities of the application are in unlocked state. SIs are 
in fully assigned state.

*Issue* SG is in unstable state ( standby csi removal timeout during sponsor si 
lock )

*Steps Performed* 

 -> Before performing the lock operation on the sponsor SI, ensured that 
component 1 in SU2 (the standby SU) does not respond to the CSI removal callback. 

 -> SG went to unstable state after the lock operation on the sponsor SI.



Below are the logs on PL-4 ( where standby SU is hosted ) :


Mar 19 19:05:11 SYSTEST-PLD-2 osafamfnd[24560]: NO Removed 'safSi=SI1,safApp=test2nApp' from 'safSu=SU2,safSg=SG,safApp=test2nApp'
Mar 19 19:05:21 SYSTEST-PLD-2 osafamfnd[24560]: NO Removed 'safSi=SI2,safApp=test2nApp' from 'safSu=SU2,safSg=SG,safApp=test2nApp'
Mar 19 19:05:21 SYSTEST-PLD-2 osafamfnd[24560]: CR SU-SI record addition failed, SU= safSu=SU2,safSg=SG,safApp=test2nApp : SI=safSi=SI3,safApp=test2nApp
Mar 19 19:05:21 SYSTEST-PLD-2 osafamfnd[24560]: CR SU-SI record addition failed, SU= safSu=SU2,safSg=SG,safApp=test2nApp : SI=safSi=SI4,safApp=test2nApp


Below is the final state of SIs after the lock operation.


safSi=SI1,safApp=test2nApp
saAmfSIAdminState=LOCKED(2)
saAmfSIAssignmentState=UNASSIGNED(1)
safSi=SI2,safApp=test2nApp
saAmfSIAdminState=UNLOCKED(1)
saAmfSIAssignmentState=UNASSIGNED(1)
safSi=SI3,safApp=test2nApp
saAmfSIAdminState=UNLOCKED(1)
saAmfSIAssignmentState=PARTIALLY_ASSIGNED(3)
safSi=SI4,safApp=test2nApp
saAmfSIAdminState=UNLOCKED(1)
saAmfSIAssignmentState=PARTIALLY_ASSIGNED(3)
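
One way to read the listing above is to group the attribute lines under each SI and pick out the SIs left in PARTIALLY_ASSIGNED, which are the same two SIs named in the "SU-SI record addition failed" messages. A minimal sketch, assuming the listing keeps the DN-then-attributes layout shown here; `parse_si_states` is a hypothetical helper:

```python
def parse_si_states(text):
    """Group 'attr=VALUE(n)' lines under the preceding 'safSi=...' DN line."""
    states, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("safSi="):
            current = line
            states[current] = {}
        elif current and "=" in line:
            attr, value = line.split("=", 1)
            states[current][attr] = value
    return states

# Listing copied from this ticket.
LISTING = """\
safSi=SI1,safApp=test2nApp
saAmfSIAdminState=LOCKED(2)
saAmfSIAssignmentState=UNASSIGNED(1)
safSi=SI2,safApp=test2nApp
saAmfSIAdminState=UNLOCKED(1)
saAmfSIAssignmentState=UNASSIGNED(1)
safSi=SI3,safApp=test2nApp
saAmfSIAdminState=UNLOCKED(1)
saAmfSIAssignmentState=PARTIALLY_ASSIGNED(3)
safSi=SI4,safApp=test2nApp
saAmfSIAdminState=UNLOCKED(1)
saAmfSIAssignmentState=PARTIALLY_ASSIGNED(3)
"""

states = parse_si_states(LISTING)
partial = [
    dn for dn, attrs in states.items()
    if attrs.get("saAmfSIAssignmentState", "").startswith("PARTIALLY_ASSIGNED")
]
print(partial)
```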







[tickets] [opensaf:tickets] #1756 AMF : amfd on controller asserted ( for CSI removal timeout during application si lock )

2016-04-13 Thread Srikanth R



---

** [tickets:#1756] AMF : amfd on controller asserted ( for CSI removal timeout 
during application si lock )**

**Status:** unassigned
**Milestone:** 4.6.2
**Created:** Wed Apr 13, 2016 11:00 AM UTC by Srikanth R
**Last Updated:** Wed Apr 13, 2016 11:00 AM UTC
**Owner:** nobody
**Attachments:**

- [1755.tgz](https://sourceforge.net/p/opensaf/tickets/1756/attachment/1755.tgz) (681.2 kB; application/x-compressed-tar)


Changeset : 7436
Version : 5.0 FC
Setup :  Controller with 2 payloads. 
 2n red model with 2 SUs, 4 SIs and no si-si deps.
 
   
Steps performed :

-> Initially the application is brought up and all the SIs are fully assigned.

-> Performed a shutdown operation on one of the SIs, i.e. SI4.

-> Ensured that the application with the active assignment times out in the 
CSI removal callback. 

The shutdown operation timed out and the amfd on the active controller asserted. 

Invoking admin operation SHUTDOWN on safSi=TestApp_SI4,safApp=TestApp_TwoN 
OP RETURN VALUE and AIS OP RETURN VAL =  5 -65536


Apr 13 16:17:40 CONTROLLER-2 osafamfd[2689]: sg_2n_fsm.cc:125: 
avd_su_fsm_state_determine: Assertion '0' failed.
Apr 13 16:17:40 CONTROLLER-2 osafamfnd[2699]: WA AMF director unexpectedly 
crashed
Apr 13 16:17:40 CONTROLLER-2 osafamfnd[2699]: Rebooting OpenSAF NodeId = 131599 
EE Name = , Reason: local AVD down(Adest) or both AVD down(Vdest) received, 
OwnNodeId = 131599, SupervisionTime = 6
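
The reboot message prints the node ID in decimal, while many other OpenSAF log messages print node IDs in hexadecimal. A minimal sketch for converting between the two. The split into chassis, slot and subslot is an assumption based on the conventional OpenSAF node ID layout (`chassis << 16 | slot << 8 | subslot`), and both helper names are hypothetical:

```python
def format_node_id(node_id):
    """Return the hexadecimal form of a node ID, e.g. '2020f'."""
    return format(node_id, "x")

def split_node_id(node_id):
    """Split a node ID into (chassis, slot, subslot).

    Assumption: node_id = chassis << 16 | slot << 8 | subslot.
    """
    return (node_id >> 16) & 0xFF, (node_id >> 8) & 0xFF, node_id & 0xFF

node_id = 131599  # NodeId from the syslog line above
print(format_node_id(node_id), split_node_id(node_id))
```

Under that assumption, 131599 is 0x2020f, i.e. slot 2, which agrees with the host name CONTROLLER-2.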






[tickets] [opensaf:tickets] #1748 MQSV service is not working ( Queue open does not succeed)

2016-04-11 Thread Srikanth R
- Description has changed:

Diff:



--- old
+++ new
@@ -86,4 +86,4 @@
 100|0| Return Value: SA_AIS_ERR_TIMEOUT
 
 
-3) 
+Traces of msgd, msgnd and test application are attached



- Attachments has changed:

Diff:



--- old
+++ new
@@ -0,0 +1 @@
+mqsv.tgz (109.2 kB; application/x-compressed-tar)






---

** [tickets:#1748] MQSV service is not working ( Queue open does not succeed) **

**Status:** unassigned
**Milestone:** 5.0.RC2
**Created:** Tue Apr 12, 2016 03:49 AM UTC by Srikanth R
**Last Updated:** Tue Apr 12, 2016 03:49 AM UTC
**Owner:** nobody
**Attachments:**

- [mqsv.tgz](https://sourceforge.net/p/opensaf/tickets/1748/attachment/mqsv.tgz) (109.2 kB; application/x-compressed-tar)


Changeset : 7436 
Version : 5.0 FC


Issue : 

1) Queue opening fails with TRY_AGAIN / TIME_OUT.  

Below is the output of sample application :

DEMO SCENARIO#1: Receiving messages via Sync API - saMsgMessageGet START


MQSV:MQA:ONsaMsgQueueOpen failed with rc - 6
Press Enter Key to Continue...
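
The sample application prints the numeric return code only. A small lookup, using the first twelve SaAisErrorT values from the SA Forum header saAis.h, makes such codes easier to read; the helper name `ais_error_name` is hypothetical:

```python
# Subset of the SaAisErrorT enumeration from saAis.h (values 1 to 12 only).
SA_AIS_ERRORS = {
    1: "SA_AIS_OK",
    2: "SA_AIS_ERR_LIBRARY",
    3: "SA_AIS_ERR_VERSION",
    4: "SA_AIS_ERR_INIT",
    5: "SA_AIS_ERR_TIMEOUT",
    6: "SA_AIS_ERR_TRY_AGAIN",
    7: "SA_AIS_ERR_INVALID_PARAM",
    8: "SA_AIS_ERR_NO_MEMORY",
    9: "SA_AIS_ERR_BAD_HANDLE",
    10: "SA_AIS_ERR_BUSY",
    11: "SA_AIS_ERR_ACCESS",
    12: "SA_AIS_ERR_NOT_EXIST",
}

def ais_error_name(rc):
    """Translate a numeric return code into its SaAisErrorT name."""
    return SA_AIS_ERRORS.get(rc, f"unknown SaAisErrorT value {rc}")

print(ais_error_name(6))
```

So "rc - 6" above is SA_AIS_ERR_TRY_AGAIN, consistent with the test application output that follows.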

Below is the output of the test application, which retries on TRY_AGAIN.

100|0| RETRY   : saMsgQueueOpen with valid params - Non Persistent
100|0| Return Value: SA_AIS_ERR_TRY_AGAIN
100|0|
100|0| Retry Count : 10
100|0| Retry Count : 20
100|0| Retry Count : 30
100|0| Retry Count : 40
100|0| Try again count exceeded
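
The retry behaviour shown above can be sketched as a bounded loop around the API call. This is an illustration only, not the test application's actual code: `call_with_retry` and the stub callables are hypothetical, the limit of 40 mirrors the last retry count printed above, and the delay between attempts is an assumption:

```python
import time

SA_AIS_OK = 1
SA_AIS_ERR_TRY_AGAIN = 6

def call_with_retry(api_call, max_retries=40, delay_s=0.1, sleep=time.sleep):
    """Call api_call() until it stops returning TRY_AGAIN or retries run out.

    Returns (rc, retries_used).
    """
    rc = api_call()
    retries = 0
    while rc == SA_AIS_ERR_TRY_AGAIN and retries < max_retries:
        sleep(delay_s)
        retries += 1
        rc = api_call()
    return rc, retries

# Stub standing in for saMsgQueueOpen: always answers TRY_AGAIN,
# which reproduces the "Try again count exceeded" outcome.
always_busy = lambda: SA_AIS_ERR_TRY_AGAIN
exhausted = call_with_retry(always_busy, delay_s=0)

# Stub that succeeds on the fourth attempt.
answers = iter([6, 6, 6, 1])
recovered = call_with_retry(lambda: next(answers), delay_s=0)

print(exhausted, recovered)
```

A bounded loop like this keeps a service that never leaves TRY_AGAIN, as reported in this ticket, from hanging the caller indefinitely.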


In the async case, the queue open callback returns TRY_AGAIN.
 
100|0| Queue name  : safMq=nonpersistent_Q_37
100|0| size: 1000
100|0| creation flags  : SA_MSG_QUEUE_NON_PERSISTENT
100|0| open flags  : SA_MSG_QUEUE_CREATE
100|0| SUCCESS : saMsgQueueOpenAsync with valid parameters - Non Persistent
100|0| Return Value: SA_AIS_OK
100|0| Invocation  : 115
100|0|
100|0|
100|0| --- Queue Open Callback -
100|0| Error String  : SA_AIS_ERR_TRY_AGAIN
100|0| Invocation  : 115
100|0| ---

Below is the output in syslog:

Apr 10 20:31:21 CONTROLLER-2 osafmsgnd[13195]: ER ERR_TRY_AGAIN: Timeout occurs Unable to send the respons in async case
Apr 10 20:31:21 CONTROLLER-2 osafmsgnd[13195]: ER The procedure to open the Queue Failed with err 6
Apr 10 20:31:21 CONTROLLER-2 osafmsgd[13213]: ER Sending the message to the specified destination with error 6
Apr 10 20:31:21 CONTROLLER-2 osafmsgd[13213]: ER ERR_FAILED_OPERATION: Couldn't Send ASAPi Name Resolution Response Message



2) saMsgInitialize and saMsgFinalize return TIMEOUT, which was not observed 
earlier on an idle system.


100|0| * Create a queue with zero retention time using saMsgQueueOpen *
100|0|
100|0| Version : B.3.1
100|0|MQSV:MQA:ON
100|0| FAILED  : saMsgInitialize with all valid parameters
100|0| Return Value: SA_AIS_ERR_TIMEOUT
100|0|
100|0|
100|0|
100|0| Version : B.3.1
100|0|MQSV:MQA:ON
100|0|
100|0| Version : B.3.1
100|0|MQSV:MQA:ON
100|0|
100|0| Version : B.3.1
100|0|MQSV:MQA:ON
100|0| SUCCESS : saMsgInitialize with all valid parameters
100|0| Return Value: SA_AIS_OK



100|0| Version : B.3.1
100|0| SUCCESS : saMsgInitialize with all valid parameters
100|0| Return Value: SA_AIS_OK
100|0| Message Handle  : 6876704
100|0| Version Output  : B.3.1

100|0| FAILED  : saMsgFinalize with all valid parameters
100|0| Return Value: SA_AIS_ERR_TIMEOUT


Traces of msgd, msgnd and test application are attached




[tickets] [opensaf:tickets] #1586 pyosaf: Add high-level bindings for NTF

2016-04-07 Thread Srikanth R
- **status**: review --> fixed
- **Comment**:

[staging:882e50]
[staging:638ae5]

changeset:   7437:882e507b0fb9
parent:      7435:22c02f1a3644
user:        Johan Mårtensson
date:        Thu Apr 07 17:31:19 2016 +0530
summary:     pyosaf: Add NTF high-level bindings and sample applications [#1586]

changeset:   7438:638ae5046dab
branch:      opensaf-5.0.x
tag:         tip
parent:      7436:fa81ab16c319
user:        Johan Mårtensson
date:        Thu Apr 07 17:50:18 2016 +0530
summary:     pyosaf: Add NTF high-level bindings and sample applications [#1586]




---

** [tickets:#1586] pyosaf: Add high-level bindings for NTF**

**Status:** fixed
**Milestone:** 5.0.FC
**Created:** Fri Nov 06, 2015 02:52 PM UTC by Johan Mårtensson
**Last Updated:** Fri Nov 06, 2015 02:59 PM UTC
**Owner:** Johan Mårtensson


There should be higher level bindings available for the NTF service, in 
addition to the one-to-one mappings. Sample applications are also needed to 
verify and demonstrate how to use the bindings.




[tickets] [opensaf:tickets] #1569 pyosaf: add sample script to scale opensaf configuration

2016-04-07 Thread Srikanth R
- **status**: review --> fixed
- **Comment**:

changeset:   7133:14ab197bb8ae
user:Rafael Odzakow 
date:Tue Nov 17 15:48:08 2015 +0100
files:   python/pyosaf/utils/immom/__init__.py 
python/pyosaf/utils/immom/object.py python/samples/scale_opensaf
description:
pyosaf: add sample script to scale opensaf configuration [#1569]




---

** [tickets:#1569] pyosaf: add sample script to scale opensaf configuration**

**Status:** fixed
**Milestone:** 5.0.FC
**Created:** Mon Oct 26, 2015 09:20 AM UTC by Rafael
**Last Updated:** Sun Nov 01, 2015 09:36 PM UTC
**Owner:** Rafael


Sample that uses the pyosaf API to scale-out/in nodes in the cluster. The 
script will create objects in IMM using a provided node as a template. It will 
create the AMF configuration objects needed for a node to be included in the 
cluster. It is assumed that the binaries are already installed on the new node. 
Object creation or deletion is done in one CCB.

Example scale-out: python scaling.py --hostname TEST




[tickets] [opensaf:tickets] #1732 OUT_OF_SYNC (failed over) new active controller should go for immediate reboot

2016-04-06 Thread Srikanth R



---

** [tickets:#1732] OUT_OF_SYNC (failed over) new active controller should go 
for immediate reboot**

**Status:** unassigned
**Milestone:** 5.0.RC1
**Created:** Wed Apr 06, 2016 11:02 AM UTC by Srikanth R
**Last Updated:** Wed Apr 06, 2016 11:02 AM UTC
**Owner:** nobody


Changeset : 7436 
Version : 5.0 FC
Setup : Two controllers


Issue :
  An out-of-sync (failed-over) new active controller should reboot immediately.

  During failover, if the standby controller is OUT OF SYNC and cannot be 
promoted to active, the node should be rebooted immediately.
 
Apr  6 16:03:53 CONTROLLER-2 osafamfd[431]: ER FAILOVER StandBy --> Active 
FAILED, Standby OUT OF SYNC
Apr  6 16:03:53 CONTROLLER-2 osafamfd[431]: ER avd_role_change role change 
failure
Apr  6 16:03:53 CONTROLLER-2 osafimmd[380]: WA IMMND DOWN on active controller 
1 detected at standby immd!! 2. Possible failover
..
Apr  6 16:06:53 CONTROLLER-2 osafamfnd[441]: WA AMF director unexpectedly 
crashed
Apr  6 16:06:53 CONTROLLER-2 osafamfnd[441]: Rebooting OpenSAF NodeId = 131599 
EE Name = , Reason: local AVD down(Adest) or both AVD down(Vdest) received, 
OwnNodeId = 131599, SupervisionTime = 60
Apr  6 16:06:53 CONTROLLER-2 opensaf_reboot: Rebooting local node; timeout=60

This issue is fixed as part of #1334, but might be observed because of #79.




[tickets] [opensaf:tickets] #1724 Opensaf is taking more time to start on active controller

2016-04-06 Thread Srikanth R



---

** [tickets:#1724] Opensaf is taking more time to start on active controller **

**Status:** unassigned
**Milestone:** 4.6.2
**Created:** Wed Apr 06, 2016 07:03 AM UTC by Srikanth R
**Last Updated:** Wed Apr 06, 2016 07:03 AM UTC
**Owner:** nobody


Setup: single controller with OpenSAF changeset 7436 (5.0 FC)

OpenSAF 5.0 FC takes longer to start than OpenSAF 4.7 GA.

In 5.0, CLMNA takes 5 seconds to promote the first node of the cluster to 
system controller, and another 2 seconds pass before the node is declared 
active. In 4.7, RDE takes 2 to 3 seconds to declare the first node active.

Below is the syslog on 5.0.


Apr  4 20:34:03 CONTROLLER-1 opensafd: Starting OpenSAF Services(5.0.FC - ) 
(Using TIPC)
Starting OpenSAF Services (Using TIPC):Apr  4 20:34:03 CONTROLLER-1 kernel: 
[22205.292335] 
...
Apr  4 20:34:03 CONTROLLER-1 kernel: [22205.297844] TIPC: Enabled bearer 
, discovery domain <1.1.0>, priority 10
Apr  4 20:34:03 CONTROLLER-1 osafclmna[10490]: Started
Apr  4 20:34:08 CONTROLLER-1 osafclmna[10490]: NO Starting to promote this node 
to a system controller
Apr  4 20:34:08 CONTROLLER-1 osafrded[10499]: Started
Apr  4 20:34:08 CONTROLLER-1 osaffmd[10508]: Started
Apr  4 20:34:08 CONTROLLER-1 osafimmd[10518]: logtrace: trace enabled to file 
/var/log/opensaf/osafimmd, mask=0x

Apr  4 20:34:08 CONTROLLER-1 osafrded[10499]: NO Requesting ACTIVE role
Apr  4 20:34:08 CONTROLLER-1 osafrded[10499]: NO RDE role set to Undefined
Apr  4 20:34:10 CONTROLLER-1 osafrded[10499]: NO Running 
'/usr/lib64/opensaf/opensaf_sc_active' with 0 argument(s)
Apr  4 20:34:10 CONTROLLER-1 osafrded[10499]: NO Switched to ACTIVE from 
Undefined
Apr  4 20:34:10 CONTROLLER-1 osaffmd[10508]: NO Starting activation 
supervision: 30ms
Apr  4 20:34:10 CONTROLLER-1 osafimmnd[10529]: NO IMMD service is UP ... 
ScAbsenseAllowed?:0 introduced?:0
Apr  4 20:34:10 CONTROLLER-1 osafimmd[10518]: IN node with dest ADDED 
564117257723920
Apr  4 20:34:10 CONTROLLER-1 osafimmnd[10529]: NO SERVER STATE: 
IMM_SERVER_ANONYMOUS --> IMM_SERVER_CLUSTER_WAITING
...
Apr  4 20:34:10 CONTROLLER-1 osafimmd[10518]: NO Attached Nodes:1 Accepted 
nodes:0 KnownVeteran:0 doReply:1
Apr  4 20:34:10 CONTROLLER-1 osafimmd[10518]: NO First IMMND on SC found at 
2010f this IMMD at 2010f. Cluster is loading, *not* 2PBE => designating that 
IMMND as coordinator
Apr  4 20:34:10 CONTROLLER-1 osafimmnd[10529]: NO This IMMND is now the NEW 
Coord
Apr  4 20:34:10 CONTROLLER-1 osafimmnd[10529]: NO SETTING COORD TO 1 CLOUD PROTO
Apr  4 20:34:13 CONTROLLER-1 osafimmnd[10529]: NO SERVER STATE: 
IMM_SERVER_CLUSTER_WAITING --> IMM_SERVER_LOADING_PENDING
...
Apr  4 20:34:14 CONTROLLER-1 osafamfnd[10590]: NO Assigned 
'safSi=SC-2N,safApp=OpenSAF' ACTIVE to 'safSu=SC-1,safSg=2N,safApp=OpenSAF'
Apr  4 20:34:14 CONTROLLER-1 opensafd: OpenSAF(5.0.FC - ) services successfully 
started


Below is the syslog output for startup on opensaf 4.7 FC.

Apr  5 15:39:50 CONTROLLER-2 opensafd: Starting OpenSAF Services(4.7.0 - ) 
(Using TIPC)
Starting OpenSAF Services (Using TIPC):Apr  5 15:39:50 CONTROLLER-2 kernel: [ 
5024.923512] TIPC: Activated (version 2.0.0)
Apr  5 15:39:50 CONTROLLER-2 kernel: [ 5024.923603] NET: Registered protocol 
family 30
...
Apr  5 15:39:50 CONTROLLER-2 kernel: [ 5024.930263] TIPC: Enabled bearer 
, discovery domain <1.1.0>, priority 10
Apr  5 15:39:50 CONTROLLER-2 osafrded[4071]: Started
Apr  5 15:39:52 CONTROLLER-2 osafrded[4071]: NO No peer available => Setting 
Active role for this node
Apr  5 15:39:52 CONTROLLER-2 osaffmd[4080]: Started
Apr  5 15:39:52 CONTROLLER-2 osafimmd[4090]: Started
Apr  5 15:39:52 CONTROLLER-2 osafimmnd[4101]: Started
Apr  5 15:39:53 CONTROLLER-2 osafimmd[4090]: NO New IMMND process is on ACTIVE 
Controller at 2020f
Apr  5 15:39:53 CONTROLLER-2 osafimmd[4090]: NO First SC IMMND (OpenSAF 4.4 or 
later) attached 2020f
...
Apr  5 15:39:53 CONTROLLER-2 osafimmnd[4101]: NO SERVER STATE: 
IMM_SERVER_ANONYMOUS --> IMM_SERVER_CLUSTER_WAITING

Apr  5 15:39:56 CONTROLLER-2 osafimmnd[4101]: NO SERVER STATE: 
IMM_SERVER_CLUSTER_WAITING --> IMM_SERVER_LOADING_PENDING
...
Apr  5 15:39:57 CONTROLLER-2 osafamfnd[4168]: NO Assigned 
'safSi=SC-2N,safApp=OpenSAF' ACTIVE to 'safSu=SC-2,safSg=2N,safApp=OpenSAF'
Apr  5 15:39:57 CONTROLLER-2 opensafd: OpenSAF(4.7.0 - ) services successfully 
started
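
The end-to-end startup times in the two syslog excerpts above can be compared with a small script. This is only a sketch: the helper names are hypothetical, the wrapped log lines have been rejoined by hand, and the year is an assumption (syslog does not record it; 2016 is taken from the ticket date).

```python
from datetime import datetime


def syslog_time(line, year=2016):
    """Parse the leading 'Mon DD HH:MM:SS' stamp of a syslog line.

    The year is an assumption; syslog lines do not carry one.
    """
    month, day, clock = line.split()[:3]
    return datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S")


def elapsed_seconds(first_line, last_line):
    """Seconds between the timestamps of two syslog lines."""
    return (syslog_time(last_line) - syslog_time(first_line)).total_seconds()


# First and last opensafd lines from each excerpt (wrapped lines rejoined).
start_50 = "Apr  4 20:34:03 CONTROLLER-1 opensafd: Starting OpenSAF Services(5.0.FC - ) (Using TIPC)"
done_50 = "Apr  4 20:34:14 CONTROLLER-1 opensafd: OpenSAF(5.0.FC - ) services successfully started"
start_47 = "Apr  5 15:39:50 CONTROLLER-2 opensafd: Starting OpenSAF Services(4.7.0 - ) (Using TIPC)"
done_47 = "Apr  5 15:39:57 CONTROLLER-2 opensafd: OpenSAF(4.7.0 - ) services successfully started"

print("5.0 startup:", elapsed_seconds(start_50, done_50))  # 11.0
print("4.7 startup:", elapsed_seconds(start_47, done_47))  # 7.0
```

With these excerpts the 5.0 start takes 11 seconds against 7 seconds on 4.7.0, a difference of 4 seconds.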





[tickets] [opensaf:tickets] #1699 EVT : deadlock in saEvtFinalize after CLM unlock

2016-03-10 Thread Srikanth R



---

** [tickets:#1699] EVT : deadlock in saEvtFinalize after CLM unlock **

**Status:** unassigned
**Milestone:** 4.6.2
**Created:** Thu Mar 10, 2016 10:38 AM UTC by Srikanth R
**Last Updated:** Thu Mar 10, 2016 10:38 AM UTC
**Owner:** nobody


Changeset : 6901 ( 4.7 GA )
 
  * Initially, an EVT application is spawned on a payload and calls only 
saEvtInitialize.
  * A CLM lock operation and then a CLM unlock operation are issued; both 
succeed.
  * saEvtSelectionObjectGet and other APIs return SA_AIS_OK with the handle 
obtained before the CLM lock.
  * A deadlock in the saEvtFinalize call is observed after the CLM unlock.

(gdb) thread apply all bt

Thread 3 (Thread 0x7f67a38dab00 (LWP 4286)):
0  0x7f67a275e4f6 in poll () from /lib64/libc.so.6
1  0x7f67a1dc9d61 in osaf_ppoll () from /usr/lib64/libopensaf_core.so.0
2  0x7f67a1dd134f in ncs_tmr_wait () from /usr/lib64/libopensaf_core.so.0
3  0x7f67a2c847b6 in start_thread () from /lib64/libpthread.so.0
4  0x7f67a27679cd in clone () from /lib64/libc.so.6
5  0x in ?? ()

Thread 2 (Thread 0x7f67a38a9b00 (LWP 4287)):
0  0x7f67a275e4f6 in poll () from /lib64/libc.so.6
1  0x7f67a1e05f3e in mdtm_process_recv_events () from 
/usr/lib64/libopensaf_core.so.0
2  0x7f67a2c847b6 in start_thread () from /lib64/libpthread.so.0
3  0x7f67a27679cd in clone () from /lib64/libc.so.6
4  0x in ?? ()

Thread 1 (Thread 0x7f67a38ac700 (LWP 4285)):
0  0x7f67a2c8aa00 in sem_wait () from /lib64/libpthread.so.0
1  0x7f67a1dd9fd2 in hm_block_me () from /usr/lib64/libopensaf_core.so.0
2  0x7f67a1dda14d in ncshm_destroy_hdl () from 
/usr/lib64/libopensaf_core.so.0
3  0x7f67a34b7b44 in eda_hdl_rec_del () from /usr/lib64/libSaEvt.so.1
4  0x7f67a34b1df5 in saEvtFinalize () at eda_saf_api.c:444
5  0x0040e445 in edsv_err_unavail_02_node02 ()






[tickets] [opensaf:tickets] #1663 pyosaf: immom utils are sometimes unintuitive

2016-01-27 Thread Srikanth R
- **status**: review --> fixed
- **Comment**:

[staging:16ed58]
changeset:   7261:16ed58c89d4d
tag: tip
user:Johan Mårtensson 
date:Wed Jan 27 16:45:54 2016 +0530
summary: pyosaf: Fix Ccb and ImmObject classes to handle more inputs [#1663]




---

** [tickets:#1663] pyosaf: immom utils are sometimes unintuitive**

**Status:** fixed
**Milestone:** 5.0.FC
**Created:** Fri Jan 15, 2016 04:31 PM UTC by Johan Mårtensson
**Last Updated:** Tue Jan 19, 2016 07:16 AM UTC
**Owner:** Johan Mårtensson


There are (at least) two usability problems in the immom utils.

 - The Ccb methods modify_value_[replace|add|delete] require the value of the 
attribute to be passed as a list even if the value is an atom. This is an easy 
mistake to make and it's not obvious how to fix it.

 - When creating an ImmObject, the rdn attribute needs to be assigned a value 
that includes the name of the rdn attribute (obj.rdnAttribute = 
"rdnAttribute=23"). This is unintuitive. 
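
The two problems above can be illustrated with a pair of normalization helpers. This is a minimal sketch only: `as_value_list` and `qualify_rdn` are hypothetical names, they are not part of pyosaf, and the ticket does not say that the actual fix works this way.

```python
def as_value_list(value):
    """Wrap a single value in a list; return lists and tuples as a list."""
    if isinstance(value, (list, tuple)):
        return list(value)
    return [value]


def qualify_rdn(rdn_attribute, value):
    """Return 'name=value', leaving an already qualified value unchanged."""
    text = str(value)
    prefix = rdn_attribute + "="
    return text if text.startswith(prefix) else prefix + text


print(as_value_list(23))                                # [23]
print(as_value_list([1, 2]))                            # [1, 2]
print(qualify_rdn("rdnAttribute", 23))                  # rdnAttribute=23
print(qualify_rdn("rdnAttribute", "rdnAttribute=23"))   # rdnAttribute=23
```

Accepting both forms means a caller can pass either an atom or a list, and either a bare or a fully qualified RDN value.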




[tickets] [opensaf:tickets] #1603 AMF : Cold sync takes 10 seconds if all the nodes in the cluster are started simultaneously

2015-11-18 Thread Srikanth R



---

** [tickets:#1603] AMF : Cold sync takes 10 seconds if all the nodes in the 
cluster are started  simultaneously**

**Status:** unassigned
**Milestone:** 4.6.2
**Created:** Thu Nov 19, 2015 06:36 AM UTC by Srikanth R
**Last Updated:** Thu Nov 19, 2015 06:36 AM UTC
**Owner:** nobody


Changeset : 4.7 GA 7071
Setup : 4 nodes (bare-metal)

Steps  :

-> Enabled 1PBE without any load on the cluster and stopped OpenSAF on all 
the nodes.

-> When OpenSAF is started on all four nodes in parallel, opensafd on the 
standby controller takes more than 10 seconds to join the cluster. opensafd on 
the other nodes takes about the same time as during a sequential bring-up.

..
Nov 19 11:40:40 SYSTEST-CNTLR-2 osafamfd[20951]: Started
...
Nov 19 11:40:52 SYSTEST-CNTLR-2 osafamfd[20951]: NO Cold sync complete!
...
Nov 19 11:40:53 SYSTEST-CNTLR-2 osafamfnd[20965]: NO Assigned 
'safSi=SC-2N,safApp=OpenSAF' STANDBY to 'safSu=SC-2,safSg=2N,safApp=OpenSAF'
Nov 19 11:40:53 SYSTEST-CNTLR-2 opensafd: OpenSAF(4.7.M0 - ) services 
successfully started

-> Following is the snippet from the osafamfd trace :

Nov 19 11:40:42.874133 osafamfd [20951:mbcsv_pr_evts.c:0278] << 
mbcsv_hdl_dispatch_all
Nov 19 11:40:42.874139 osafamfd [20951:mbcsv_api.c:0435] << 
mbcsv_process_dispatch_request: retval: 1
Nov 19 11:40:49.859697 osafamfd [20951:mbcsv_tmr.c:0250] TR Timer expired. my 
role:2, svc_id:10, pwe_hdl:65537, peer_anchor:564114215690274, tmr 
type:NCS_MBCSV_TMR_SEND_COLD_SYNC
Nov 19 11:40:49.859813 osafamfd [20951:mbcsv_api.c:0406] >> 
mbcsv_process_dispatch_request: Dispatch MBCSV event
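
The pause in the trace snippet above can be located by scanning for the widest gap between consecutive trace lines. This is a sketch only: the function names are hypothetical, the wrapped trace lines have been rejoined by hand, and the year is an assumption because the trace does not record one.

```python
from datetime import datetime


def trace_time(line, year=2015):
    """Parse the leading 'Mon DD HH:MM:SS.ffffff' stamp of a trace line.

    The year is an assumption; the trace lines do not carry one.
    """
    month, day, clock = line.split()[:3]
    return datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S.%f")


def largest_gap(lines):
    """Return (gap_seconds, earlier_line, later_line) for the widest pause."""
    stamped = [(trace_time(line), line) for line in lines]
    pairs = zip(stamped, stamped[1:])
    return max(((b[0] - a[0]).total_seconds(), a[1], b[1]) for a, b in pairs)


# The four trace lines from the snippet, with wrapped lines rejoined.
trace = [
    "Nov 19 11:40:42.874133 osafamfd [20951:mbcsv_pr_evts.c:0278] << mbcsv_hdl_dispatch_all",
    "Nov 19 11:40:42.874139 osafamfd [20951:mbcsv_api.c:0435] << mbcsv_process_dispatch_request: retval: 1",
    "Nov 19 11:40:49.859697 osafamfd [20951:mbcsv_tmr.c:0250] TR Timer expired. my role:2, svc_id:10, "
    "pwe_hdl:65537, peer_anchor:564114215690274, tmr type:NCS_MBCSV_TMR_SEND_COLD_SYNC",
    "Nov 19 11:40:49.859813 osafamfd [20951:mbcsv_api.c:0406] >> mbcsv_process_dispatch_request: Dispatch MBCSV event",
]

gap, earlier, later = largest_gap(trace)
print(round(gap, 3))  # 6.986
```

For this snippet the widest gap is just under 7 seconds, and it ends at the NCS_MBCSV_TMR_SEND_COLD_SYNC timer expiry.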


 Observation: This issue is also present in version 4.6. It became visible 
because of the #1334 fix (AMFD responds to nid only after initialization is 
completed). Below is the syslog from the standby controller running 4.6 when 
opensafd is started on all the nodes in parallel:
 
 Apr 22 14:04:54 SLES-SLOT2 osafamfnd[20334]: NO Assigned 
'safSi=SC-2N,safApp=OpenSAF' STANDBY to 'safSu=SC-2,safSg=2N,safApp=OpenSAF'
Apr 22 14:04:55 SLES-SLOT2 opensafd: OpenSAF(4.6.FC - ) services successfully 
started
Apr 22 14:05:05 SLES-SLOT2 osafamfd[20324]: NO Cold sync complete!





[tickets] [opensaf:tickets] #1589 EVT : Segfault in saEvtEventDataGet in multithreaded app

2015-11-09 Thread Srikanth R
- **summary**: EVT : Segfault in saEvtEventDataGet in event delivery callback 
--> EVT : Segfault in saEvtEventDataGet in multithreaded app



---

** [tickets:#1589] EVT : Segfault in saEvtEventDataGet in multithreaded app**

**Status:** unassigned
**Milestone:** 4.6.2
**Created:** Tue Nov 10, 2015 01:22 AM UTC by Srikanth R
**Last Updated:** Tue Nov 10, 2015 01:22 AM UTC
**Owner:** nobody


Changeset : 7071
Application : EDSV multithreaded application with multiple publisher threads 
and a single subscriber thread.


Steps :

   -> Each publisher thread creates a channel and waits for the subscriber. 
   -> The subscriber thread comes up and subscribes to all the channels created 
by the publishers.
   -> All the publishers then publish an event.
   -> In the event delivery callback, the application segfaults in the 
saEvtEventDataGet call.
   -> Below is the backtrace:
   
 0  0x775a6224 in saEvtEventDataGet (eventHandle=4289724417, 
eventData=0x7fffde30, eventDataSize=0x7fffde28) at eda_saf_api.c:1944
1  0x0040113b in evtDeliverCallback (subscriptionId=4, 
eventHandle=4285530146, eventDataSize=20) at multithread/eda_thread1.c:25
2  0x775a9ed0 in eda_hdl_cbk_rec_prc (cb=0x6260c0, msg=0x6279f0, 
reg_cbk=0x6268e0) at eda_hdl.c:691
3  0x775aa20d in eda_hdl_cbk_dispatch_all (cb=0x6260c0, 
hdl_rec=0x6268d0) at eda_hdl.c:836
4  0x775a9d85 in eda_hdl_cbk_dispatch (cb=0x6260c0, hdl_rec=0x6268d0, 
flags=SA_DISPATCH_ALL) at eda_hdl.c:641
5  0x775a1e5a in saEvtDispatch (evtHandle=4289724417, 
dispatchFlags=SA_DISPATCH_ALL) at eda_saf_api.c:351
6  0x0040194d in subscriber_loop (thread_number=1) at 
multithread/eda_thread1.c:213
7  0x00401b64 in main (argc=1, argv=0x7fffe398) at 
multithread/eda_thread1.c:271
(gdb) p *evt_hdl_rec 
$2 = {event_hdl = 1, priority = 1 '\001', retention_time = 66370, publish_time 
= 140737488348072, publisher_name = {length = 4316, 
value = 
"@\000\000\000\000\000\360\242c\000\000\000\000\000\001\000\240\377", '\000' 
, 
"!\000\000\000\000\000\000\000\002\000\000\000\377\177\000\000خY\367\377\177\000\000\000\000\000\000\000\000\000\000!\001\000\000\000\000\000\000\017",
 '\000' , "\001", '\000' "\230, 
\266\371\366\377\177\000\000\001", '\000' }, pattern_array = 
0x0, event_data_size = 0, evt_data = 0x0, evt_type = 0 '\000', parent_chan = 
0x0, next = 0x0, pub_evt_id = 0, 
  del_evt_id = 0}

   
   
   




[tickets] [opensaf:tickets] #1451 pyosaf: Update sample applications to use immom utils instead of direct bindings

2015-11-09 Thread Srikanth R
- **status**: review --> fixed
- **Comment**:

[staging:a44b87]
changeset:   7100:a44b87201911
tag: tip
user:Johan Mårtensson 
date:Tue Nov 10 08:45:36 2015 +0530
summary: pyosaf: Update sample applications to use immom utils instead of 
direct bindings [#1451]




---

** [tickets:#1451] pyosaf: Update sample applications to use immom utils 
instead of direct bindings**

**Status:** fixed
**Milestone:** 5.0.FC
**Created:** Fri Aug 14, 2015 08:47 AM UTC by Johan Mårtensson
**Last Updated:** Sun Nov 01, 2015 09:36 PM UTC
**Owner:** Johan Mårtensson


The sample applications for pyosaf use the direct IMM bindings instead of the 
higher-level ones. To guide users looking for code samples, they should instead 
show how to use the much-easier-to-use immom utils.



