[tickets] [opensaf:tickets] #1991 AMF: Existing PG tracking should not be stopped for CURRENT flag

2016-09-15 Thread Long HB Nguyen
- **status**: accepted --> review



---

**[tickets:#1991] AMF: Existing PG tracking should not be stopped for CURRENT flag**

**Status:** review
**Milestone:** 5.1.RC2
**Created:** Wed Aug 31, 2016 09:44 AM UTC by Srikanth R
**Last Updated:** Wed Sep 14, 2016 04:32 AM UTC
**Owner:** Long HB Nguyen


5.1.FC : changeset - 6997

Issue: Existing PG tracking should not be stopped by a call with the CURRENT flag


Steps performed :

-> Call saAmfInitialize_4()
-> Call saAmfProtectionGroupTrack_4() with SA_TRACK_CURRENT flag.
-> Call saAmfProtectionGroupTrack_4() with SA_TRACK_CHANGES flag.
-> Call saAmfProtectionGroupTrack_4() with SA_TRACK_CURRENT flag.
-> Call saAmfProtectionGroupTrackStop()


Observed output :

TrackStop returns SA_AIS_ERR_NOT_EXIST, indicating that tracking was not started earlier.


Expected output:

TrackStop() should return SA_AIS_OK; in the earlier release, the API returned SA_AIS_OK.

According to the B.04.01 spec, section 7.11.1, page 318, tracking should not be stopped until TrackStop() is called explicitly:

Once saAmfProtectionGroupTrack_4() has been called with trackFlags
containing either SA_TRACK_CHANGES or SA_TRACK_CHANGES_ONLY, notification
callbacks can only be stopped by an invocation of
saAmfProtectionGroupTrackStop().



---

Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is 
subscribed to https://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change settings at 
https://sourceforge.net/p/opensaf/admin/tickets/options.  Or, if this is a 
mailing list, you can unsubscribe from the mailing list.--
___
Opensaf-tickets mailing list
Opensaf-tickets@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets


[tickets] [opensaf:tickets] #2038 NTF: Using sizeof(string) for lengthAdditionalText results in SA_AIS_ERR_INVALID_PARAM

2016-09-15 Thread Minh Hon Chau
- Description has changed:

Diff:



--- old
+++ new
@@ -1,3 +1,3 @@
-After patch of #2006, now if ntf client uses sizeof(string) to calculate the string length for lengthAdditionalText, that will result in SA_AIS_ERR_INVALID_PARAM when sending notification.
+After patch of #2006, now if ntf client uses sizeof(string) + 1 to specify the string length for lengthAdditionalText, that will result in SA_AIS_ERR_INVALID_PARAM when sending notification.
 
 Some existing ntftests have failed because those tests are still using sizeof, which could be fixed in ntftests. However, that could be a *complaint* from real applications which have been running without problem (application's code are using sizeof())






---

**[tickets:#2038] NTF: Using sizeof(string) for lengthAdditionalText results in SA_AIS_ERR_INVALID_PARAM**

**Status:** review
**Milestone:** 5.1.RC2
**Created:** Thu Sep 15, 2016 08:47 AM UTC by Minh Hon Chau
**Last Updated:** Thu Sep 15, 2016 10:56 PM UTC
**Owner:** Minh Hon Chau


After the patch for #2006, if an NTF client uses sizeof(string) + 1 to specify the string length for lengthAdditionalText, sending the notification fails with SA_AIS_ERR_INVALID_PARAM.

Some existing ntftests fail because they still use sizeof; those can be fixed in ntftests. However, this could draw *complaints* from real applications that have been running without problems (the applications' code uses sizeof()).




[tickets] [opensaf:tickets] #2038 NTF: Using sizeof(string) for lengthAdditionalText results in SA_AIS_ERR_INVALID_PARAM

2016-09-15 Thread Minh Hon Chau
- **status**: assigned --> review



---

**[tickets:#2038] NTF: Using sizeof(string) for lengthAdditionalText results in SA_AIS_ERR_INVALID_PARAM**

**Status:** review
**Milestone:** 5.1.RC2
**Created:** Thu Sep 15, 2016 08:47 AM UTC by Minh Hon Chau
**Last Updated:** Thu Sep 15, 2016 11:23 AM UTC
**Owner:** Minh Hon Chau


After the patch for #2006, if an NTF client uses sizeof(string) to calculate the string length for lengthAdditionalText, sending the notification fails with SA_AIS_ERR_INVALID_PARAM.

Some existing ntftests fail because they still use sizeof; those can be fixed in ntftests. However, this could draw *complaints* from real applications that have been running without problems (the applications' code uses sizeof()).




[tickets] [opensaf:tickets] #1765 ckpt : saCkptCheckpointOpen api call failed and returning SA_AIS_ERR_LIBRARY after couple of failover

2016-09-15 Thread Srikanth R
- **summary**: saCkptCheckpointOpen api call failed and returning SA_AIS_ERR_LIBRARY after couple of failover --> ckpt : saCkptCheckpointOpen api call failed and returning SA_AIS_ERR_LIBRARY after couple of failover
- **Comment**:

Application output, with syslog running as a background process:

Sep 15 18:46:10 SYSTEST-PLD-1 kernel: [ 1204.300498] TIPC: Established link <1.1.3:eth3-1.1.2:eth3> on network plane A
Sep 15 18:46:11 SYSTEST-PLD-1 osafimmnd[4936]: NO NODE STATE-> IMM_NODE_R_AVAILABLE
Sep 15 18:46:11 SYSTEST-PLD-1 osafimmnd[4936]: NO NODE STATE-> IMM_NODE_FULLY_AVAILABLE 19001
Sep 15 18:46:11 SYSTEST-PLD-1 osafimmnd[4936]: NO Epoch set to 4 in ImmModel
Sep 15 18:46:12 SYSTEST-PLD-1 osafimmnd[4936]: NO Implementer connected: 14 (MsgQueueService131599) <0, 2020f>
Sep 15 18:46:12 SYSTEST-PLD-1 osafimmnd[4936]: NO Implementer (applier) connected: 15 (@safAmfService2020f) <0, 2020f>
Sep 15 18:46:12 SYSTEST-PLD-1 osafimmnd[4936]: NO Implementer (applier) connected: 16 (@OpenSafImmReplicatorB) <0, 2020f>
SYSTEST-PLD-1:/home//cpsv_fo #
***
Demonstrating Checkpoint Service Usage with a collocated Checkpoint
***
Initialising With Checkpoint Service
Sep 15 18:46:13 SYSTEST-PLD-1 a.out: logtrace: trace enabled to file /home//cpsv_fo/ckpt.trace, mask=0x
PASSED
Opening Collocated Checkpoint = safCkpt=DemoCkpt,safApp=safCkptService
PASSED
Opening Collocated Checkpoint = safCkpt=DemoCkpt,safApp=safCkptService with create flags
PASSED
Press  key to continue... Invoke failover
Sep 15 18:46:51 SYSTEST-PLD-1 osafimmnd[4936]: NO Implementer disconnected 8 <0, 2010f> (safEvtService)

Sep 15 18:46:56 SYSTEST-PLD-1 kernel: [ 1250.704238] TIPC: Resetting link <1.1.3:eth3-1.1.1:eth0>, peer not responding
Sep 15 18:46:56 SYSTEST-PLD-1 kernel: [ 1250.704251] TIPC: Lost link <1.1.3:eth3-1.1.1:eth0> on network plane A
Sep 15 18:46:56 SYSTEST-PLD-1 kernel: [ 1250.704259] TIPC: Lost contact with <1.1.1>
...
...
Sep 15 18:46:57 SYSTEST-PLD-1 osafimmnd[4936]: NO Implementer connected: 23 (safEvtService) <0, 2020f>
Sep 15 18:46:57 SYSTEST-PLD-1 osafimmnd[4936]: NO Implementer (applier) connected: 24 (@safLogService_appl) <0, 2020f>
Sep 15 18:46:57 SYSTEST-PLD-1 osafimmnd[4936]: NO Implementer connected: 25 (safSmfService) <0, 2020f>
Sep 15 18:46:57 SYSTEST-PLD-1 osafimmnd[4936]: NO Implementer (applier) connected: 26 (@OpenSafImmReplicatorA) <0, 2020f>
**Unlink My Checkpoint    Failed :5**
Ckpt Finalize being called  PASSED
SYSTEST-PLD-1:/home//cpsv_fo # Sep 15 18:47:17 SYSTEST-PLD-1 osafimmnd[4936]: NO Implementer connected: 27 (MsgQueueService131343) <0, 2020f>
Sep 15 18:47:17 SYSTEST-PLD-1 osafimmnd[4936]: NO Implementer disconnected 27 <0, 2020f> (MsgQueueService131343)
Sep 15 18:47:18 SYSTEST-PLD-1 kernel: [ 1272.242604] TIPC: Established link <1.1.3:eth3-1.1.1:eth0> on network plane A
Sep 15 18:47:19 SYSTEST-PLD-1 osafimmnd[4936]: NO NODE STATE-> IMM_NODE_R_AVAILABLE
Sep 15 18:47:19 SYSTEST-PLD-1 osafimmnd[4936]: NO NODE STATE-> IMM_NODE_FULLY_AVAILABLE 19001
Sep 15 18:47:19 SYSTEST-PLD-1 osafimmnd[4936]: NO Epoch set to 5 in ImmModel
Sep 15 18:47:20 SYSTEST-PLD-1 osafimmnd[4936]: NO Implementer connected: 28 (MsgQueueService131343) <0, 2010f>
Sep 15 18:47:21 SYSTEST-PLD-1 osafimmnd[4936]: NO Implementer (applier) connected: 29 (@safAmfService2010f) <0, 2010f>
Sep 15 18:47:21 SYSTEST-PLD-1 osafimmnd[4936]: NO Implementer (applier) connected: 30 (@OpenSafImmReplicatorB) <0, 2010f>

SYSTEST-PLD-1:/home//cpsv_fo # ./a.out
***
Demonstrating Checkpoint Service Usage with a collocated Checkpoint
***
Initialising With Checkpoint Service
Sep 15 18:48:24 SYSTEST-PLD-1 a.out: logtrace: trace enabled to file /home//cpsv_fo/ckpt.trace, mask=0x
PASSED
Opening Collocated Checkpoint = safCkpt=DemoCkpt,safApp=safCkptService
**Ckpt open Failed (2).** Hence exiting




---

**[tickets:#1765] ckpt : saCkptCheckpointOpen api call failed and returning SA_AIS_ERR_LIBRARY after couple of failover**

**Status:** accepted
**Milestone:** 4.7.2
**Created:** Fri Apr 15, 2016 06:26 AM UTC by Ritu Raj
**Last Updated:** Thu Sep 15, 2016 01:27 PM UTC
**Owner:** Pham Hoang Nhat
**Attachments:**

- 
[ckpt_trace.tar.bz2](https://sourceforge.net/p/opensaf/tickets/1765/attachment/ckpt_trace.tar.bz2)
 (3.2 MB; application/x-bzip)


setup:
Changeset- 7436
Version - opensaf 5.0 FC
4 nodes configured with single PBE and a load of 30K objects

* Issue observed :
saCkptCheckpointOpen() fails, returning SA_AIS_ERR_LIBRARY, after a couple of failovers.

* Steps to reproduce:
> Ran a couple of failovers and observed that saCkptCheckpointOpen failed.
> Below is a snippet of the agent trace:

Apr 15  8:08:50.275115 cpa [28883:cpa_mds.c:0776] << 

[tickets] [opensaf:tickets] #1765 saCkptCheckpointOpen api call failed and returning SA_AIS_ERR_LIBRARY after couple of failover

2016-09-15 Thread Srikanth R
Hi Pham,

We applied the patch on 5.0 GA, and the issue is still observed.

Below are the steps, and the APIs used in the application, to reproduce the issue.

Application :

-> Invoke saCkptInitialize
-> Invoke saCkptCheckpointOpen with create flag and 
SA_CKPT_WR_ACTIVE_REPLICA_WEAK.
-> Invoke saCkptCheckpointOpen with WRITE flag
-> Wait for user to press enter ( to invoke failover )
-> Invoke saCkptCheckpointUnlink
-> Invoke saCkptFinalize

Steps to reproduce the issue :

-> Initially start a single controller and payload.

-> Start the other controller, which shall join as standby.

-> While the standby controller is joining, invoke the application on the payload, so that the CKPT APIs are invoked while CKPT cold sync is in progress.

-> After a sleep of 20 seconds, induce a failover and later unblock the application, after which the unlink and finalize APIs are invoked.

The unlink API returns SA_AIS_ERR_TIMEOUT and the IMM objects are not deleted from the DB:
 
 immfind | grep -i Demo
safCkpt=DemoCkpt,safApp=safCkptService
safReplica=safNode=PL-3\,safCluster=myClmCluster,safCkpt=DemoCkpt,safApp=safCkptService
safReplica=safNode=SC-1\,safCluster=myClmCluster,safCkpt=DemoCkpt,safApp=safCkptService
safReplica=safNode=SC-2\,safCluster=myClmCluster,safCkpt=DemoCkpt,safApp=safCkptService

-> The next time this application is invoked, checkpoint open returns SA_AIS_ERR_LIBRARY.

-> At this stage, if the application is invoked twice, ckptd segfaults; ticket #2011 has been raised for that.

This issue (#1765) seems to be similar to #247, which has been closed as non-reproducible. Sometimes checkpoint open also returns SA_AIS_ERR_NO_RESOURCES, as mentioned in #247.
  
  
  -- Srikanth


Attachments:

- 
[1765.tgz](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/8ea9d424/d730/attachment/1765.tgz)
 (111.5 kB; application/x-compressed-tar)


---

**[tickets:#1765] saCkptCheckpointOpen api call failed and returning SA_AIS_ERR_LIBRARY after couple of failover**

**Status:** accepted
**Milestone:** 4.7.2
**Created:** Fri Apr 15, 2016 06:26 AM UTC by Ritu Raj
**Last Updated:** Wed May 04, 2016 06:56 PM UTC
**Owner:** Pham Hoang Nhat
**Attachments:**

- 
[ckpt_trace.tar.bz2](https://sourceforge.net/p/opensaf/tickets/1765/attachment/ckpt_trace.tar.bz2)
 (3.2 MB; application/x-bzip)


setup:
Changeset- 7436
Version - opensaf 5.0 FC
4 nodes configured with single PBE and a load of 30K objects

* Issue observed :
saCkptCheckpointOpen() fails, returning SA_AIS_ERR_LIBRARY, after a couple of failovers.

* Steps to reproduce:
> Ran a couple of failovers and observed that saCkptCheckpointOpen failed.
> Below is a snippet of the agent trace:

Apr 15  8:08:50.275115 cpa [28883:cpa_mds.c:0776] << cpa_mds_msg_sync_send: retval = 1
Apr 15  8:08:50.275128 cpa [28883:cpa_api.c:1043] T4 Cpa CkptOpen failed with return value:2,ckptHandle:63
Apr 15  8:08:50.275141 cpa [28883:cpa_api.c:1146] << **saCkptCheckpointOpen: API return code = 2**

> Traces of both controllers and the agent trace of the payload are attached.





[tickets] [opensaf:tickets] #2017 Update the SMF PR document with information about faster upgrade

2016-09-15 Thread Rafael
Document for review


Attachments:

- 
[OpenSAF_SMFSv_PR_2.odt](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/073b9a04/8f1d/attachment/OpenSAF_SMFSv_PR_2.odt)
 (430.0 kB; application/vnd.oasis.opendocument.text)


---

**[tickets:#2017] Update the SMF PR document with information about faster upgrade**

**Status:** accepted
**Milestone:** 5.1.RC2
**Created:** Fri Sep 09, 2016 12:22 PM UTC by elunlen
**Last Updated:** Tue Sep 13, 2016 10:05 AM UTC
**Owner:** elunlen


Update the SMF PR document with information about:
* Balanced In Service Upgrade (BISU) [#1685]
* Parallel swBundle removal and installation [#1633]
* NG lock and unlock in single step upgrade [#1634]




[tickets] [opensaf:tickets] #1890 Doc : Headless feature documentation

2016-09-15 Thread Mathi Naickan
- **status**: unassigned --> accepted
- **assigned_to**: Mathi Naickan
- **Milestone**: 5.0.1 --> 5.1.RC2



---

**[tickets:#1890] Doc : Headless feature documentation**

**Status:** accepted
**Milestone:** 5.1.RC2
**Created:** Tue Jun 21, 2016 11:02 AM UTC by Srikanth R
**Last Updated:** Tue Jun 21, 2016 11:02 AM UTC
**Owner:** Mathi Naickan


Version : OpenSAF 5.0 GA


1) Documentation about the headless feature should be updated in Opensaf_Overview_PR.odt / Opensaf_Extensions. The documentation should list the services that still provide functionality when the cluster goes headless.

2) The README.HYDRA file in the ntfsv folder should be renamed to README.HEADLESS, for uniform file naming across all the folders.

3) The CLM folder doesn't have a README for the headless feature.

4) The headless README files across all folders should follow the same naming convention; currently they differ:

./osaf/services/saf/amf/README_HEADLESS
./osaf/services/saf/logsv/README-HEADLESS
./osaf/services/saf/cpsv/README.HEADLESS







[tickets] [opensaf:tickets] #2022 AMF : amfd asserted for NG lock operation (quiesced timeout - Nway model)

2016-09-15 Thread Praveen
- **status**: assigned --> accepted
- Attachments has changed:

Diff:



--- old
+++ new
@@ -1 +1,3 @@
 createAppTestApp.sh (15.8 kB; text/x-shellscript)
+messages (11.2 kB; application/octet-stream)
+osafamfd (452.8 kB; application/octet-stream)



- **Milestone**: 4.7.2 --> 5.1.RC2
- **Comment**:

With the configuration provided by the ticket creator, the issue cannot be reproduced, because the components' recovery is comprestart.
However, based on the backtrace provided, the issue can be reproduced with the following steps:
1) Change the configuration provided by the ticket creator by hosting SU1 on SC-2 and the other SUs on SC-1. Enable the disableRestart flag for all the comps.
2) Bring the configuration up. SI1 will become active in SU1 on SC-2 and standby in SU2 on SC-1.
3) Create a node group with SC-1 and SC-2.
4) Lock the NG and delay the response to the removal callback for SI1 on SU2. Respond to the quiesced callback for SI1 on SU1, but delay its removal callback.
5) Kill the comp in SU1 that has delayed the removal of SI1.
6) AMFD tries to fail over SI1 but cannot find any in-service SU. It tries to send the removal of assignments to SU2 again and asserts.





---

**[tickets:#2022] AMF : amfd asserted for NG lock operation (quiesced timeout - Nway model)**

**Status:** accepted
**Milestone:** 5.1.RC2
**Created:** Sat Sep 10, 2016 09:58 AM UTC by Srikanth R
**Last Updated:** Thu Sep 15, 2016 05:41 AM UTC
**Owner:** Praveen
**Attachments:**

- 
[createAppTestApp.sh](https://sourceforge.net/p/opensaf/tickets/2022/attachment/createAppTestApp.sh)
 (15.8 kB; text/x-shellscript)
- 
[messages](https://sourceforge.net/p/opensaf/tickets/2022/attachment/messages) 
(11.2 kB; application/octet-stream)
- 
[osafamfd](https://sourceforge.net/p/opensaf/tickets/2022/attachment/osafamfd) 
(452.8 kB; application/octet-stream)


Environment details
--
OS : Suse 64bit 
Changeset : 7997  ( 5.1.FC)
Setup : 5 nodes ( 2 controllers and 3 payloads with headless feature enabled & 
no PBE )
AMF Application : NPM model with SUs mapped on SC-2,PL-3,PL-4


Summary :
--
AMFD on both controllers asserted when an Nway application did not respond to the CSI set QUIESCED callback during the lock operation of a node group.


Steps followed & Observed behaviour
--

-> Hosted nway application on PL-3,PL-4 and SC-2 and brought up the 
application. Configuration is attached to the ticket.
-> Created a node group with all the three nodes.
-> Ensured that one of component will not respond to quiesced callback
-> Now performed the lock operation on the node group
-> amfd on both controllers asserted with the following back trace.


0  0x7f66fbc6fb55 in raise () from /lib64/libc.so.6
1  0x7f66fbc71131 in abort () from /lib64/libc.so.6
2  0x7f66fda6816a in __osafassert_fail (__file=0x51214d "su.cc", __line=2022, __func=0x513aa0 "dec_curr_stdby_si", __assertion=0x51355f "saAmfSUNumCurrStandbySIs > 0") at sysf_def.c:281
3  0x004d68cd in AVD_SU::dec_curr_stdby_si (this=0x7ccf40) at su.cc:2022
4  0x004be804 in avd_susi_update_assignment_counters (susi=0x78c670, action=AVSV_SUSI_ACT_DEL, current_ha_state=0, new_ha_state=0) at siass.cc:783
5  0x004be59b in avd_susi_del_send (susi=0x78c670) at siass.cc:714
6  0x004af12e in avd_sg_nway_node_fail_stable (cb=0x751b80, su=0x800470, susi=0x0) at sg_nway_fsm.cc:3022
7  0x004b025d in avd_sg_nway_node_fail_sg_realign (cb=0x751b80, su=0x800470) at sg_nway_fsm.cc:3493
8  0x004a8042 in SG_NWAY::node_fail (this=0x797c50, cb=0x751b80, su=0x800470) at sg_nway_fsm.cc:497
9  0x004b209e in sg_su_failover_func (su=0x800470) at sgproc.cc:525
10 0x004b2d16 in avd_su_oper_state_evh (cb=0x751b80, evt=0x7f66f4002940) at sgproc.cc:838
11 0x00450ba9 in process_event (cb_now=0x751b80, evt=0x7f66f4002940) at main.cc:768
12 0x004508cd in main_loop () at main.cc:689
13 0x00450e43 in main (argc=2, argv=0x7fff0f81ab18) at main.cc:841









[tickets] [opensaf:tickets] #2038 NTF: Using sizeof(string) for lengthAdditionalText results in SA_AIS_ERR_INVALID_PARAM

2016-09-15 Thread Minh Hon Chau
- **status**: unassigned --> assigned
- **assigned_to**: Minh Hon Chau



---

**[tickets:#2038] NTF: Using sizeof(string) for lengthAdditionalText results in SA_AIS_ERR_INVALID_PARAM**

**Status:** assigned
**Milestone:** 5.1.RC2
**Created:** Thu Sep 15, 2016 08:47 AM UTC by Minh Hon Chau
**Last Updated:** Thu Sep 15, 2016 08:47 AM UTC
**Owner:** Minh Hon Chau


After the patch for #2006, if an NTF client uses sizeof(string) to calculate the string length for lengthAdditionalText, sending the notification fails with SA_AIS_ERR_INVALID_PARAM.

Some existing ntftests fail because they still use sizeof; those can be fixed in ntftests. However, this could draw *complaints* from real applications that have been running without problems (the applications' code uses sizeof()).




[tickets] [opensaf:tickets] #1968 SMF does not handle AMF long DN support

2016-09-15 Thread elunlen
- **status**: review --> fixed
- **assigned_to**: elunlen -->  nobody 
- **Comment**:

changeset:   8083:428d177a3756
tag: tip
parent:  8081:e8ddfcd67a2f
user:Lennart Lund 
date:Thu Sep 15 12:43:37 2016 +0200
summary: smf: Adapt SMF to use AMF long DN support [#1968]

rev: 428d177a3756ccdc92112b60f3f238da405ce3e4

changeset:   8082:48e1e4a2362d
branch:  opensaf-5.1.x
parent:  8080:119b72d81883
user:Lennart Lund 
date:Thu Sep 15 12:43:37 2016 +0200
summary: smf: Adapt SMF to use AMF long DN support [#1968]

rev: 48e1e4a2362dc33f01db674ed45759b7f6c6533d



---

**[tickets:#1968] SMF does not handle AMF long DN support**

**Status:** fixed
**Milestone:** 5.1.RC2
**Created:** Wed Aug 24, 2016 12:51 PM UTC by elunlen
**Last Updated:** Tue Sep 13, 2016 10:11 AM UTC
**Owner:** nobody


SMF already supports long DNs. However, some checks on AMF-related objects do not allow certain DNs to be longer than 255 characters (RDNs longer than 64). These checks shall be removed, since AMF will support long DNs from 5.1.




[tickets] [opensaf:tickets] #2005 smfd: Inconsistent reading of settings

2016-09-15 Thread elunlen
- **status**: unassigned --> invalid
- **Comment**:

Handling of keep DU state IS working, since the init state is saved persistently in IMM.



---

**[tickets:#2005] smfd: Inconsistent reading of settings**

**Status:** invalid
**Milestone:** 4.7.2
**Created:** Tue Sep 06, 2016 02:35 PM UTC by elunlen
**Last Updated:** Tue Sep 06, 2016 02:35 PM UTC
**Owner:** nobody


SMF reads IMM settings and updates its cb globals when assigned active, in the OI apply callback, and after executing the init actions in the campaign init state. This leads to strange behaviour depending on when IMM settings are updated.
Example:
Before executing a campaign that has a long campaign name (> 255 characters), longDnsAllowed and smfKeepDuState shall be changed before starting to execute the campaign.
1. If smfKeepDuState is changed before longDnsAllowed, the campaign will fail, because the cb globals are not updated after the change of longDnsAllowed.
2. If longDnsAllowed is changed before smfKeepDuState, the campaign will succeed, because the cb will be updated with the new longDnsAllowed setting when the OI apply callback is called for the smfKeepDuState change.




[tickets] [opensaf:tickets] #1969 smf: One step upgrade with cluster reboot does not wait for nodes to start

2016-09-15 Thread elunlen
After some more investigation:
SMF should not have to wait for all nodes in order to change the admin state. If the admin state is changed for a node that has not yet started or joined the cluster, the node should be handled according to the set admin state when it comes up.


---

**[tickets:#1969] smf: One step upgrade with cluster reboot does not wait for nodes to start**

**Status:** unassigned
**Milestone:** 5.0.1
**Created:** Wed Aug 24, 2016 01:01 PM UTC by elunlen
**Last Updated:** Tue Sep 13, 2016 11:09 AM UTC
**Owner:** nobody


When using the one step upgrade feature with a cluster reboot, all nodes restart, including the SC nodes. This is done as the last action in the upgrade step. After the active SC node is up again, SMF continues with the procedure wrapup. When collecting information to prepare the wrapup, the node destination for all nodes in the campaign is requested. However, this information can only be collected from nodes that have started and joined the cluster (unlocked).
The problem is that SMF does not seem to wait to give all nodes a chance to join the cluster, and if SMF fails to get the node destination from any of the nodes, the campaign fails, as seen in the log below. When reading the node destination there is a 10 sec “try again” loop waiting for “node up” for each node. It is not unlikely that the active SC node comes up before some of the other nodes, and that it takes more than 10 sec after that before some of the other nodes join the cluster. If that's the case, the campaign will fail.




[tickets] [opensaf:tickets] #2037 IMM: Immd asserted on active controller in backward compatibility

2016-09-15 Thread Neelakanta Reddy
IMMD traces are not available from the time the assertion happened:
Sep 20 20:52:47 SCALE_SLOT-42 osafimmnd[15114]: immnd_evt.c:9146: immnd_evt_proc_fevs_rcv: Assertion '!reply_dest || (reply_dest == cb->immnd_mdest_id) || isObjSync' failed.

Can you check that the system resources are not completely used up, e.g. the hard disk (check whether the disk is full)?
It also looks like there may be memory corruption problems.

Keep sufficient resources available and try to run the test again.



---

**[tickets:#2037] IMM: Immd asserted on active controller in backward compatibility**

**Status:** assigned
**Milestone:** 4.7.2
**Created:** Thu Sep 15, 2016 06:51 AM UTC by Madhurika Koppula
**Last Updated:** Thu Sep 15, 2016 07:19 AM UTC
**Owner:** Neelakanta Reddy
**Attachments:**

- 
[immnd_immd_cores.rtf](https://sourceforge.net/p/opensaf/tickets/2037/attachment/immnd_immd_cores.rtf)
 (8.5 kB; application/rtf)


**Environment Details:**

OS : Suse 64bit
Setup : 4 nodes (2 controllers and 2 payloads, with the headless feature disabled and 1PBE enabled).

Backward Compatibility:
OpenSAF versions on nodes:
SC-1 (5.0), SC-2 (5.1 FC), PL-3 (5.0), PL-4 (5.1 FC).

**Summary:** IMMD asserted on active controller after immnd crash.

**Steps followed & Observed behaviour:**

1) SC-1 has the standby role and SC-2 has the active role.
2) The following sequence is executed:
a) saImmOiInitialize()
b) saImmOiImplementerSet()
c) kill -9 `pidof osafimmnd`
d) saImmOiRtObjectDelete()
e) saImmOiFinalize()

Observations:

1) First, IMMND asserted on the active controller in immnd_evt_proc_fevs_rcv.
2) Second, the active controller rebooted after an IMMD assertion failed.

Below is a snippet of the syslog from the active controller SC-2:

Sep 20 20:52:47 SCALE_SLOT-42 osafntfimcnd[15091]: NO saImmOiDispatch() Fail SA_AIS_ERR_BAD_HANDLE (9)
Sep 20 20:52:47 SCALE_SLOT-42 osafimmnd[15114]: Started
Sep 20 20:52:47 SCALE_SLOT-42 osafimmnd[15114]: NO Persistent Back-End capability configured, Pbe file:imm.db (suffix may get added)
Sep 20 20:52:47 SCALE_SLOT-42 osafimmd[12159]: NO MDS event from svc_id 25 (change:3, dest:565216648273948)
Sep 20 20:52:47 SCALE_SLOT-42 osafimmnd[15114]: NO IMMD service is UP ... ScAbsenseAllowed?:0 introduced?:0
Sep 20 20:52:47 SCALE_SLOT-42 osafimmnd[15114]: NO SERVER STATE: IMM_SERVER_ANONYMOUS --> IMM_SERVER_CLUSTER_WAITING
Sep 20 20:52:47 SCALE_SLOT-42 osafimmnd[15114]: immnd_evt.c:9146: immnd_evt_proc_fevs_rcv: Assertion '!reply_dest || (reply_dest == cb->immnd_mdest_id) || isObjSync' failed.
Sep 20 20:52:47 SCALE_SLOT-42 python2.5: WA imma_mds_svc_evt: mds_auth_server_connect failed
Sep 20 20:52:47 SCALE_SLOT-42 osafimmd[12159]: NO MDS event from svc_id 25 (change:4, dest:565216648273948)
Sep 20 20:52:47 SCALE_SLOT-42 osafamfnd[12239]: NO Restarting a component of 'safSu=SC-2,safSg=NoRed,safApp=OpenSAF' (comp restart count: 6)
Sep 20 20:52:47 SCALE_SLOT-42 osafamfnd[12239]: NO 'safComp=IMMND,safSu=SC-2,safSg=NoRed,safApp=OpenSAF' faulted due to 'avaDown' : Recovery

Sep 20 20:52:47 SCALE_SLOT-42 osafimmnd[15136]: WA Error code 2 returned for message type 82 - ignoring
Sep 20 20:52:47 SCALE_SLOT-42 osafimmd[12159]: NO Extended intro from node 2020f
Sep 20 20:52:47 SCALE_SLOT-42 osafimmd[12159]: immd_evt.c:816: immd_accept_node: Assertion 'node_info->immnd_key != cb->node_id' failed.
Sep 20 20:52:47 SCALE_SLOT-42 osafamfnd[12239]: NO 'safComp=IMMD,safSu=SC-2,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'nodeFailfast'
Sep 20 20:52:47 SCALE_SLOT-42 osafamfnd[12239]: ER safComp=IMMD,safSu=SC-2,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery is:nodeFailfast
Sep 20 20:52:47 SCALE_SLOT-42 osafamfnd[12239]: Rebooting OpenSAF NodeId = 131599 EE Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 131599, SupervisionTime = 60

The timestamps after the reboot are as follows:

Sep 20 20:52:47 SCALE_SLOT-42 opensaf_reboot: Rebooting local node; timeout=60
Sep 20 20:52:47 SCALE_SLOT-42 osafimmnd[15136]: WA DISCARD DUPLICATE FEVS 
message:12996
Sep 20 20:52:47 SCALE_SLOT-42 osafimmnd[15136]: WA Error code 2 returned for 
message type 82 - ignoring
Sep 20 20:52:48 SCALE_SLOT-42 osafimmnd[15136]: NO SERVER STATE: 
IMM_SERVER_CLUSTER_WAITING --> IMM_SERVER_LOADING_PENDING
Sep 20 20:52:48 SCALE_SLOT-42 osafimmnd[15136]: NO SERVER STATE: 
IMM_SERVER_LOADING_PENDING --> IMM_SERVER_SYNC_PENDING
Sep 22 02:13:05 SCALE_SLOT-42 syslog-ng[1133]: syslog-ng starting up; 
version='2.0.9'
Sep 22 02:13:06 SCALE_SLOT-42 sm-notify[1650]: Version 1.2.3 starting
Sep 22 02:13:06 SCALE_SLOT-42 sm-notify[1650]: Backgrounding to notify hosts...
Sep 22 02:13:06 SCALE_SLOT-42 sm-notify[1651]: Running as root.  chown 
/var/lib/nfs to choose different user
Sep 22 02:13:06 SCALE_SLOT-42 sm-notify[1651]: DNS resolution of CONN-PC 
failed; retrying later
Sep 22 02:13:06 SCALE_SLOT-42 rpc.statd[1662]: Version 1.2.3 starting
Sep 22 02:13:06 SCALE_SLOT-42 rpc.statd[1662]: Flags: TI-RPC
Sep 22 02:13:06 SCALE_SLOT-42 

[tickets] [opensaf:tickets] #2029 imm: fevs message lost during failover

2016-09-15 Thread Hung Nguyen
- Description has changed:

Diff:



--- old
+++ new
@@ -23,11 +23,15 @@
 ...
 ~~~
 
+The main problem is the standby IMMD also broadcast D2ND_DISCARD_NODE message 
when it receives an NCSMDS_DOWN from IMMND. See immd_process_immnd_down().
+
+If the NCSMDS_DOWN event comes to the 2 IMMDs at the same time, the 2 
D2ND_DISCARD_NODE messages will be stamped with the same number. One of the 2 
will be discarded by IMMNDs, no problem here.
+But if there's a latency of NCSMDS_DOWN event, an other fevs message (in this 
case it's D2ND_DISCARD_IMPL for @OpenSafImmPBE) will be discarded by IMMNDs, 
that will cause fevs message loss.
 
 Details of the problem is explained here
 
 
-http://sequencediagram.org/index.html?initialData=A4QwTgLglgxloDsIAICSBZdARAjAKFElnhCTUywCYDxo5EUN0A5LAZzzzACMB7AD2S8AbgFMw5bDgA0TKgC45OecgDKAFQCCrAEIBNZAFpJC5JoDC61ADUAogB0EjgGajhbZDF4BXJOORQHjgADAAsAMwAbJxMrB6GAMRgogAmAHwmlCqu7gD6ALaibGwgAOainDwCQmISclk5Hl6+EP6ByCERAOyVfIIi-vXyAFSjbKIIKcj53DDTbKXIzrwSjfOLneFdjtzeKFAoXoUeELwmOMgANiCtMRSURgncl96iGbHs2W5sBQvIbBAQPlgKlkAB3A4ACw6YS2vWqAzqFGUqghEBg0NOZksNls-0BrUcOz2yEhIDYCAA5ChSrwUBBIaJprN1kswLx8pkiQg2GcGUy1s0-BJ2gCoJdLjCItE8B94klUu9kV88scSuV4f1aud5IKfMKAkFYdsnAhuKIYCBvONzvjxZKyRTqchkjBRFAxFN+cy5vkFtJHAd8UDgCdGUtvqy0dDNibHJoYBBvCAJQBPaTIb1rP2LNiQnyXKbm4PA0HRmHhUIADjuUkez1eSpYnwjqr+AJDZahUrhXD6NUGmDiiiH7ACpQQKyZKW8wEusBuoOzfwAFLGAJTc3mZ8NqspM2Nsjm29qXXgA5AAQivN+vd9vfYR2qUI1GrvdYh9rOWq0jOZ7cYIOo4Z6i0bQeCmyQgCkqYAdyADqmjIOgRTqkyQoQPIh4ANQdFeAC8AF4EAA
+http://sequencediagram.org/index.html?initialData=A4QwTgLglgxloDsIAICSBZdARAjACgGUAVAIQEoAoUSWeEJNTLAJjwEEBhIy66ORFBnQA5LAGcKFMACMA9gA9ksgG4BTMI2z5i5ADRCW7LmQBcMAK5gwqhgDNVysQRsATDrPNITyHAAYALADMAGySMgpKahpComImMVhKCMgEHMzIUGLILrIA7ghhcooq6pq4hKSmwhwE2AQA+lgA8gDqwsgONhCSkgbalQC0AMTWLgB8CXEsoo2oqWwASlj1wk1YAKLIYhAgALbAqi7IuVAQABY+AYEA7L1MrJzcADxD0gA25qoDkyaizMtYOYcRbLDAABQAMshbLINAABJoHBAEEC2VC7XZgkjrQoRErRe5GbgmeyOZwINweBiZZAIWQoczAFwgCCHZAQWQAc1U51KJ3OZRwAB0ECKxLIMihtntgFlechdqoxGIQNzjqcLn4grcKAYHsZhqMJphYiZpgCgSD6uCodL9mz+Zqrjq+hVyC9OdYbN9CY9TOgSBwxMoFUqVWqYRotTdccUooK3aYwWAoAwPCh5blwAhU5yRSKWmxkOgw6rVMgYFSICZo9dkABqHzIACEAF5LtqeuE46UfkQzuXJtlMjBwEdzbN5ktrehIaHlWX8whpKpR+YxOXZLZsoy3rAWWzSVkEOZdiuwEv+yyAORZXJnACeyARSJRaIxWM2NLpKFs5jebxPi4I5jocsaRL2vrGL8NR1I0rTtJ0SCXmcNJISglaKnKEp6sgbwHho5z0IKQA
 
 
 ~~~



- **status**: unassigned --> accepted
- **assigned_to**: Hung Nguyen
- Attachments has changed:

Diff:



--- old
+++ new
@@ -0,0 +1 @@
+logs.7z (250.9 kB; application/octet-stream)






---

** [tickets:#2029] imm: fevs message lost during failover**

**Status:** accepted
**Milestone:** 4.7.2
**Created:** Tue Sep 13, 2016 11:05 AM UTC by Hung Nguyen
**Last Updated:** Tue Sep 13, 2016 11:05 AM UTC
**Owner:** Hung Nguyen
**Attachments:**

- [logs.7z](https://sourceforge.net/p/opensaf/tickets/2029/attachment/logs.7z) 
(250.9 kB; application/octet-stream)


Fevs messages are lost when failing over between the 2 SCs.


~~~
Sep  8 11:50:00 SC-2-1 osafimmnd[4241]: NO Implementer locally disconnected. 
Marking it as doomed 232 <754, 2010f> (@OpenSafImmPBE)
Sep  8 11:50:00 SC-2-1 osafimmnd[4241]: NO Implementer locally disconnected. 
Marking it as doomed 233 <755, 2010f> (OsafImmPbeRt_B)
...
Sep  8 11:50:00 SC-2-1 osafimmnd[4241]: NO Implementer disconnected 233 <755, 
2010f> (OsafImmPbeRt_B)
~~~


The IMMNDs never receive the D2ND_DISCARD_IMPL for @OpenSafImmPBE, so that
applier remains marked as dying:


~~~
Sep  8 11:50:02 SC-2-1 osafimmnd[4241]: NO ImmModel::getPbeBSlave reports 
missing PbeBSlave locally => unsafe
Sep  8 11:50:03 SC-2-1 osafimmnd[4241]: NO ImmModel::getPbeBSlave reports 
missing PbeBSlave locally => unsafe
Sep  8 11:50:04 SC-2-1 osafimmnd[4241]: NO ImmModel::getPbeBSlave reports 
missing PbeBSlave locally => unsafe
...
Sep  8 11:59:08 SC-2-1 osafimmnd[4241]: NO ImmModel::getPbeBSlave reports 
missing PbeBSlave locally => unsafe
Sep  8 11:59:09 SC-2-1 osafimmnd[4241]: NO ImmModel::getPbeBSlave reports 
missing PbeBSlave locally => unsafe
Sep  8 11:59:10 SC-2-1 osafimmnd[4241]: NO ImmModel::getPbeBSlave reports 
missing PbeBSlave locally => unsafe
...
~~~

The root cause is that the standby IMMD also broadcasts a D2ND_DISCARD_NODE
message when it receives an NCSMDS_DOWN event from an IMMND. See
immd_process_immnd_down().

If the NCSMDS_DOWN event reaches both IMMDs at the same time, the two
D2ND_DISCARD_NODE messages are stamped with the same number. The IMMNDs discard
one of the two as a duplicate, which is harmless.
But if the NCSMDS_DOWN event is delayed at one IMMD, another fevs message (in
this case the D2ND_DISCARD_IMPL for @OpenSafImmPBE) is discarded by the IMMNDs
as a duplicate, which causes fevs message loss.
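To make the failure mode concrete, here is a minimal simulation of a receiver that drops any fevs message whose sequence number it has already seen. The message names and the number 10437 come from the logs in this ticket; the receiver itself (`struct fevs_receiver`, `fevs_receive()`) is a hypothetical simplification, not the actual IMMND code.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical simplified fevs message: a cluster-wide sequence number
 * stamped by an IMMD, plus the message type for logging. */
struct fevs_msg {
    uint64_t seq;
    const char *type;
};

/* Hypothetical receiver state kept by each IMMND in this sketch. */
struct fevs_receiver {
    uint64_t last_seq; /* highest sequence number delivered so far */
    int delivered;
    int discarded;
};

/* Deliver a message unless its sequence number was already seen.
 * Returns 1 if delivered, 0 if discarded as a duplicate. */
static int fevs_receive(struct fevs_receiver *r, const struct fevs_msg *m)
{
    if (m->seq <= r->last_seq) {
        printf("DISCARD DUPLICATE FEVS message:%llu (%s)\n",
               (unsigned long long)m->seq, m->type);
        r->discarded++;
        return 0;
    }
    r->last_seq = m->seq;
    r->delivered++;
    printf("deliver %llu (%s)\n", (unsigned long long)m->seq, m->type);
    return 1;
}
```

If the standby IMMD's late D2ND_DISCARD_NODE and the active IMMD's D2ND_DISCARD_IMPL both carry number 10437, whichever arrives second is silently dropped. When the dropped one is the D2ND_DISCARD_IMPL, nothing else ever retires that implementer, which matches the repeated "getPbeBSlave reports missing PbeBSlave" lines above.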

The problem is explained in detail here:



[tickets] [opensaf:tickets] #2029 imm: fevs message lost during failover

2016-09-15 Thread Hung Nguyen
- Attachments has changed:

Diff:



--- old
+++ new
@@ -1 +0,0 @@
-logs.7z (256.4 kB; application/octet-stream)






---

** [tickets:#2029] imm: fevs message lost during failover**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Tue Sep 13, 2016 11:05 AM UTC by Hung Nguyen
**Last Updated:** Tue Sep 13, 2016 11:05 AM UTC
**Owner:** nobody


Fevs messages are lost when failing over between the 2 SCs.


~~~
Sep  8 11:50:00 SC-2-1 osafimmnd[4241]: NO Implementer locally disconnected. 
Marking it as doomed 232 <754, 2010f> (@OpenSafImmPBE)
Sep  8 11:50:00 SC-2-1 osafimmnd[4241]: NO Implementer locally disconnected. 
Marking it as doomed 233 <755, 2010f> (OsafImmPbeRt_B)
...
Sep  8 11:50:00 SC-2-1 osafimmnd[4241]: NO Implementer disconnected 233 <755, 
2010f> (OsafImmPbeRt_B)
~~~


The IMMNDs never receive the D2ND_DISCARD_IMPL for @OpenSafImmPBE, so that
applier remains marked as dying:


~~~
Sep  8 11:50:02 SC-2-1 osafimmnd[4241]: NO ImmModel::getPbeBSlave reports 
missing PbeBSlave locally => unsafe
Sep  8 11:50:03 SC-2-1 osafimmnd[4241]: NO ImmModel::getPbeBSlave reports 
missing PbeBSlave locally => unsafe
Sep  8 11:50:04 SC-2-1 osafimmnd[4241]: NO ImmModel::getPbeBSlave reports 
missing PbeBSlave locally => unsafe
...
Sep  8 11:59:08 SC-2-1 osafimmnd[4241]: NO ImmModel::getPbeBSlave reports 
missing PbeBSlave locally => unsafe
Sep  8 11:59:09 SC-2-1 osafimmnd[4241]: NO ImmModel::getPbeBSlave reports 
missing PbeBSlave locally => unsafe
Sep  8 11:59:10 SC-2-1 osafimmnd[4241]: NO ImmModel::getPbeBSlave reports 
missing PbeBSlave locally => unsafe
...
~~~


The problem is explained in detail here:


http://sequencediagram.org/index.html?initialData=A4QwTgLglgxloDsIAICSBZdARAjAKFElnhCTUywCYDxo5EUN0A5LAZzzzACMB7AD2S8AbgFMw5bDgA0TKgC45OecgDKAFQCCrAEIBNZAFpJC5JoDC61ADUAogB0EjgGajhbZDF4BXJOORQHjgADAAsAMwAbJxMrB6GAMRgogAmAHwmlCqu7gD6ALaibGwgAOainDwCQmISclk5Hl6+EP6ByCERAOyVfIIi-vXyAFSjbKIIKcj53DDTbKXIzrwSjfOLneFdjtzeKFAoXoUeELwmOMgANiCtMRSURgncl96iGbHs2W5sBQvIbBAQPlgKlkAB3A4ACw6YS2vWqAzqFGUqghEBg0NOZksNls-0BrUcOz2yEhIDYCAA5ChSrwUBBIaJprN1kswLx8pkiQg2GcGUy1s0-BJ2gCoJdLjCItE8B94klUu9kV88scSuV4f1aud5IKfMKAkFYdsnAhuKIYCBvONzvjxZKyRTqchkjBRFAxFN+cy5vkFtJHAd8UDgCdGUtvqy0dDNibHJoYBBvCAJQBPaTIb1rP2LNiQnyXKbm4PA0HRmHhUIADjuUkez1eSpYnwjqr+AJDZahUrhXD6NUGmDiiiH7ACpQQKyZKW8wEusBuoOzfwAFLGAJTc3mZ8NqspM2Nsjm29qXXgA5AAQivN+vd9vfYR2qUI1GrvdYh9rOWq0jOZ7cYIOo4Z6i0bQeCmyQgCkqYAdyADqmjIOgRTqkyQoQPIh4ANQdFeAC8AF4EAA


~~~
Sep  8 11:50:00 SC-2-1 osafimmd[4226]: WA IMMND DOWN on active controller 2 
detected at standby immd!! 1. Possible failover
...
Sep  8 11:50:00 SC-2-1 osafimmd[4226]: WA Message count:10437 + 1 != 10437
Sep  8 11:50:00 SC-2-1 osafimmnd[4241]: WA DISCARD DUPLICATE FEVS message:10437
Sep  8 11:50:00 SC-2-1 osafimmnd[4241]: WA Error code 2 returned for message 
type 82 - ignoring
~~~


The logs are attached.


---

Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is 
subscribed to https://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change settings at 
https://sourceforge.net/p/opensaf/admin/tickets/options.  Or, if this is a 
mailing list, you can unsubscribe from the mailing list.--
___
Opensaf-tickets mailing list
Opensaf-tickets@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets


[tickets] [opensaf:tickets] #1990 AMF : Extra notification is received for lock operation on unlocked SG.

2016-09-15 Thread Praveen
- **status**: accepted --> review



---

** [tickets:#1990] AMF :  Extra notification is received for lock operation on 
unlocked SG.**

**Status:** review
**Milestone:** 4.7.2
**Created:** Wed Aug 31, 2016 06:40 AM UTC by Srikanth R
**Last Updated:** Tue Sep 13, 2016 11:19 AM UTC
**Owner:** Praveen


Changeset : 5.1 FC (7997 changeset)

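An extra notification is received for a lock operation on an unlocked SG: after
the command below, a second state-change notification reports the admin state
going from SA_AMF_ADMIN_LOCKED to SA_AMF_ADMIN_LOCKED, i.e. no actual change.

amf-adm lock safSg=AmfDemo,safApp=AmfDemo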
===  Aug 30 15:22:27 - State Change  ===
eventType = SA_NTF_OBJECT_STATE_CHANGE
notificationObject = "safSg=AmfDemo,safApp=AmfDemo"
notifyingObject = "safApp=safAmfService"
notificationClassId = SA_NTF_VENDOR_ID_SAF.SA_SVC_AMF.103 (0x67)
additionalText = "Admin state of safSg=AmfDemo,safApp=AmfDemo changed"
sourceIndicator = SA_NTF_MANAGEMENT_OPERATION
State ID = SA_AMF_ADMIN_STATE
Old State: SA_AMF_ADMIN_UNLOCKED
New State: SA_AMF_ADMIN_LOCKED

===  Aug 30 15:22:27 - State Change  ===
eventType = SA_NTF_OBJECT_STATE_CHANGE
notificationObject = "safSg=AmfDemo,safApp=AmfDemo"
notifyingObject = "safApp=safAmfService"
notificationClassId = SA_NTF_VENDOR_ID_SAF.SA_SVC_AMF.103 (0x67)
additionalText = "Admin state of safSg=AmfDemo,safApp=AmfDemo changed"
sourceIndicator = SA_NTF_MANAGEMENT_OPERATION
State ID = SA_AMF_ADMIN_STATE
Old State: SA_AMF_ADMIN_LOCKED
New State: SA_AMF_ADMIN_LOCKED
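A minimal sketch of the kind of guard that would suppress the second notification: update the admin state and emit a state-change notification only when the old and new states actually differ. The enum, `struct sg` and `sg_set_admin_state()` are hypothetical stand-ins, not AMFD code.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for SaAmfAdminStateT. */
typedef enum {
    ADMIN_UNLOCKED = 1,
    ADMIN_LOCKED = 2,
    ADMIN_LOCKED_INSTANTIATION = 3,
    ADMIN_SHUTTING_DOWN = 4
} admin_state_t;

/* Hypothetical SG object holding its current admin state. */
struct sg {
    const char *name;
    admin_state_t admin_state;
};

/* Sets the admin state and returns true if a state-change notification
 * should be sent. Returns false when nothing changed, e.g. the
 * LOCKED --> LOCKED transition reported in the second notification. */
static bool sg_set_admin_state(struct sg *sg, admin_state_t new_state)
{
    admin_state_t old_state = sg->admin_state;

    if (old_state == new_state)
        return false; /* no change: no notification */

    sg->admin_state = new_state;
    printf("Admin state of %s changed: %d --> %d\n",
           sg->name, (int)old_state, (int)new_state);
    return true;
}
```

Checking at the point where the state is written keeps every caller consistent, rather than relying on each admin-operation handler to remember the comparison.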





[tickets] [opensaf:tickets] #2021 AMF : active compname is improperly populated in Standby callback (NPM)

2016-09-15 Thread Praveen
- **status**: assigned --> accepted
- **Part**: lib --> nd
- **Milestone**: 4.7.2 --> 5.1.RC2
- **Comment**:

This also applies to the other redundancy models. After setting activeCompName
in the standby descriptor, AMFND wrongly calls osaf_extended_name_clear() for
the active descriptor. Since SaAmfCSIStateDescriptorT is a union, this clears
the value that was just filled in.
Besides the reported problem, there are others as well:
- AMFND copies activeCompName into the agent message using
osaf_extended_name_alloc() only when it is an extended name. This should be
done irrespective of whether the DN is short or long.
- The agent should validate the DN based on the HA state of the CSI SET
callback, as activeCompName is not populated for all HA states.
- While fixing the above problems, I found one more: a standby component
gets its own name in the standby callback. This is a minor regression in AMFD.

I will send out a patch that fixes all these problems.
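The union pitfall described in the comment can be reproduced with simplified, hypothetical stand-ins for SaNameT and SaAmfCSIStateDescriptorT (the field order is modeled on the SAF types, but these are not the real headers, and `name_clear()` only approximates osaf_extended_name_clear()). Filling the standby member and then clearing the active member overwrites bytes inside the standby name, because both members share storage. On common ABIs with a 4-byte enum, this reproduces the start of the `sa\000\000\000mp=CO` pattern in the gdb dump below.

```c
#include <stdint.h>
#include <string.h>

#define NAME_MAX_LEN 256

/* Hypothetical stand-in for SaNameT. */
typedef struct {
    uint16_t length;
    uint8_t value[NAME_MAX_LEN];
} name_t;

/* Hypothetical stand-ins for the active and standby descriptors. */
typedef struct {
    int transition_descriptor; /* an enum in the real type */
    name_t active_comp_name;
} active_descriptor_t;

typedef struct {
    name_t active_comp_name;
    uint32_t standby_rank;
} standby_descriptor_t;

/* Both members share the same storage, as in SaAmfCSIStateDescriptorT. */
typedef union {
    active_descriptor_t active;
    standby_descriptor_t standby;
} csi_state_descriptor_t;

typedef enum { HA_ACTIVE = 1, HA_STANDBY = 2 } ha_state_t;

static void name_set(name_t *n, const char *s)
{
    size_t len = strlen(s);
    if (len >= NAME_MAX_LEN)
        len = NAME_MAX_LEN - 1; /* truncate in this sketch */
    n->length = (uint16_t)len;
    memcpy(n->value, s, len);
    n->value[len] = '\0';
}

/* Hypothetical clear: empties a name in place. */
static void name_clear(name_t *n)
{
    n->length = 0;
    n->value[0] = '\0';
}

/* Buggy pattern: fill the standby member, then clear the active member.
 * The clear writes into the middle of the standby name. */
static void fill_buggy(csi_state_descriptor_t *d, const char *active_comp)
{
    memset(d, 0, sizeof(*d));
    name_set(&d->standby.active_comp_name, active_comp);
    name_clear(&d->active.active_comp_name);
}

/* Fixed pattern: touch only the member that matches the HA state. */
static void fill_fixed(csi_state_descriptor_t *d, ha_state_t ha,
                       const char *active_comp)
{
    memset(d, 0, sizeof(*d));
    if (ha == HA_STANDBY)
        name_set(&d->standby.active_comp_name, active_comp);
    else
        name_set(&d->active.active_comp_name, active_comp);
}
```

Because the standby name's length field (offsets 0-1) is not touched by the clear, the corrupted name still reports length 68, which is also what gdb shows.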



---

** [tickets:#2021] AMF :  active compname is improperly populated in Standby 
callback (NPM)**

**Status:** accepted
**Milestone:** 5.1.RC2
**Created:** Sat Sep 10, 2016 06:52 AM UTC by Srikanth R
**Last Updated:** Wed Sep 14, 2016 08:51 AM UTC
**Owner:** Praveen


For an application with the NPM redundancy model, activeCompName in the standby
descriptor has a corrupted value in the standby callback.


Breakpoint 1, pycbk_SaAmfCSISetCallbackT (invocation=4287627278, 
compName=0x941a28, haState=SA_AMF_HA_STANDBY, csiDescriptor=...) at 
saAmf_wrap.c:2914
2914saAmf_wrap.c: No such file or directory.
(gdb) p csiDescriptor 
$1 = {csiFlags = 1, csiName = {length = 48, value = 
"safCsi=CSI1,safSi=TestApp_SI4,safApp=TestApp_Npm", '\000' }, csiStateDescriptor = {activeDescriptor = {transitionDescriptor = 
1634926660, 
  activeCompName = {length = 0, value = 
"\000mp=CO\000\000\000\000\000\000\000\000u=TestApp_SU2,safSg=TestApp_SG1,safApp=TestApp_Npm",
 '\000' }}, standbyDescriptor = {activeCompName = {
length = 68, value = 
"**sa\000\000\000mp=CO\000\000\000\000\000\000\000\000u=TestApp_SU2,safSg=TestApp_SG1,safApp=TestApp_Npm**",
 '\000' }, standbyRank = 0}}, csiAttr = {attr = 0x7642a0, 
number = 1}}


In the callback above (in gdb), the active component name in the standby
descriptor of the standby callback should be
safComp=COMP1,safSu=TestApp_SU3,safSg=TestApp_SG1,safApp=TestApp_Npm, but it
is populated with an incorrect value:

sa\000\000\000mp=CO\000\000\000\000\000\000\000\000u=TestApp_SU2,safSg=TestApp_SG1,safApp=TestApp_Npmapo




[tickets] [opensaf:tickets] #2039 NTF: Missleading returned code of ntftest when tests are failed

2016-09-15 Thread Minh Hon Chau



---

** [tickets:#2039] NTF: Missleading returned code of ntftest when tests are 
failed**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Thu Sep 15, 2016 09:11 AM UTC by Minh Hon Chau
**Last Updated:** Thu Sep 15, 2016 09:11 AM UTC
**Owner:** nobody


When a test case in ntftest suite 37 (38 or 39) fails, the reported error is
always SA_AIS_ERR_FAILED_OPERATION, even though the actual failure code is
different.
The actual error code should be saved and reported in the test result.
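A minimal sketch of what saving the actual error could look like in a test helper: remember the first non-OK return code from the calls under test and report that, instead of collapsing every failure into SA_AIS_ERR_FAILED_OPERATION. The enum and the helper names (`struct test_result`, `record()`, `report()`) are hypothetical; this does not use the real ntftest harness or SaAisErrorT.

```c
/* Hypothetical stand-in for SaAisErrorT; values are illustrative only. */
typedef enum {
    AIS_OK = 1,
    AIS_ERR_INVALID_PARAM,
    AIS_ERR_BAD_HANDLE,
    AIS_ERR_FAILED_OPERATION
} ais_error_t;

/* Hypothetical per-test-case result holder. */
struct test_result {
    ais_error_t first_error; /* first non-OK code seen, or AIS_OK */
};

/* Record one API call's return code; keep only the first failure. */
static void record(struct test_result *res, ais_error_t rc)
{
    if (res->first_error == AIS_OK && rc != AIS_OK)
        res->first_error = rc;
}

/* Report the saved code rather than a fixed FAILED_OPERATION. */
static ais_error_t report(const struct test_result *res)
{
    return res->first_error;
}
```

Keeping the first failure rather than the last is a design choice: later calls in a test case often fail only as a consequence of the first one.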




[tickets] [opensaf:tickets] Re: #2014 Rebooted controller not detected in TCP

2016-09-15 Thread A V Mahesh (AVM)
Hi Jonas,

OK, I just pushed it. Please test it once on 4.7:



branch:  opensaf-4.7.x
parent:  8043:4a8a00097561
user:A V Mahesh 
date:Thu Sep 15 10:50:31 2016 +0530
summary: dtm: TCP Improve node failFast with TCP_USER_TIMEOUT [#2014]



-AVM
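For context, the changeset summary refers to the Linux TCP_USER_TIMEOUT socket option (see tcp(7)). It bounds how long transmitted data may remain unacknowledged before the kernel forcibly closes the connection with ETIMEDOUT, which covers the case in the quoted report where keepalive probes stop because the connection is busy retransmitting. The following is a minimal sketch of setting it together with keepalive on a Linux socket; it is not the dtm patch, and `configure_tcp_liveness()` and all timeout values are assumptions for illustration.

```c
#define _GNU_SOURCE
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Enable keepalive and bound the time data may stay unacknowledged.
 * Returns 0 on success, -1 on the first failing setsockopt (errno set).
 * The values are illustrative assumptions, not OpenSAF defaults. */
static int configure_tcp_liveness(int fd, unsigned int user_timeout_ms)
{
    int on = 1;
    int idle_s = 5;  /* idle time before the first keepalive probe */
    int intvl_s = 1; /* interval between keepalive probes */
    int cnt = 3;     /* unanswered probes before the peer is dead */

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof(idle_s)) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl_s, sizeof(intvl_s)) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof(cnt)) < 0)
        return -1;
    /* Keepalive probes are only sent on an idle connection. While data is
     * being retransmitted, this timeout is what closes the connection. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT, &user_timeout_ms,
                   sizeof(user_timeout_ms)) < 0)
        return -1;
    return 0;
}
```

Per tcp(7), when used together with SO_KEEPALIVE, TCP_USER_TIMEOUT also overrides keepalive in deciding when to close the connection, so the two values should be chosen consistently.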

On 9/15/2016 12:08 AM, Jonas Arndt wrote:
>
> Mahesh,
>
> Can we get this back-ported to 4.7.x as well?
>
> Cheers,
>
> // Jonas
>
> 
>
> *[tickets:#2014]  
> Rebooted controller not detected in TCP*
>
> *Status:* review
> *Milestone:* 5.0.1
> *Created:* Thu Sep 08, 2016 06:20 PM UTC by Jonas Arndt
> *Last Updated:* Wed Sep 14, 2016 04:51 AM UTC
> *Owner:* A V Mahesh (AVM)
> *Attachments:*
>
>   * logs.tgz (84.1 kB; application/x-compressed-tar)
>   * tcp_user_timeout_2014.patch (5.5 kB; application/octet-stream)
>
> OS environment:
>
> Debian Jessie (OpenSAF is running on bare metal, no containers or VMs)
> 4.4.7 kernel
> Network: eth0, bonded, OVS (I have tried all of them and the problem occurs
> in all configurations)
>
> In 20% of the cases, a "reboot -f" on controller2 is not detected and
> acted on. The mds.log shows:
>
> Sep 7 6:44:23.918566 osafamfd[41365] ERR |MDS_SND_RCV: 
> Adest=<0x,1>
> Sep 7 6:44:23.918595 osafamfd[41365] ERR |MDS_SND_RCV: 
> Anchor=<0x0002020f,1790>
> Sep 7 6:44:34.018662 osafamfd[41365] ERR |MDS_SND_RCV: Timeout or 
> Error occured
> Sep 7 6:44:34.018751 osafamfd[41365] ERR |MDS_SND_RCV: Timeout occured 
> on red sndrsp message from svc_id = MBCSV(19), to svc_id = MBCSV(19)
> Sep 7 6:44:34.018789 osafamfd[41365] ERR |MDS_SND_RCV: 
> Adest=<0x,1>
> Sep 7 6:44:34.018818 osafamfd[41365] ERR |MDS_SND_RCV: 
> Anchor=<0x0002020f,1790>
> Sep 7 6:44:44.118832 osafamfd[41365] ERR |MDS_SND_RCV: Timeout or 
> Error occured
> Sep 7 6:44:44.118919 osafamfd[41365] ERR |MDS_SND_RCV: Timeout occured 
> on red sndrsp message from svc_id = MBCSV(19), to svc_id = MBCSV(19)
> Sep 7 6:44:44.118955 osafamfd[41365] ERR |MDS_SND_RCV: 
> Adest=<0x,1>
> Sep 7 6:44:44.118984 osafamfd[41365] ERR |MDS_SND_RCV: 
> Anchor=<0x0002020f,1790>
> Sep 7 6:44:54.218987 osafamfd[41365] ERR |MDS_SND_RCV: Timeout or 
> Error occured
> Sep 7 6:44:54.219085 osafamfd[41365] ERR |MDS_SND_RCV: Timeout occured 
> on red sndrsp message from svc_id = MBCSV(19), to svc_id = MBCSV(19)
> Sep 7 6:44:54.219139 osafamfd[41365] ERR |MDS_SND_RCV: 
> Adest=<0x,1>
> Sep 7 6:44:54.219168 osafamfd[41365] ERR |MDS_SND_RCV: 
> Anchor=<0x0002020f,1790>
>
> Still, there is nothing in the syslog indicating that controller2 has
> left the cluster. This is with TCP.
> When the node comes back online (without OpenSAF being started),
> controller1 finally notices and fails over the apps.
>
> When the reboot is not detected, the TCP keepalives stop and the
> connection goes into retransmissions instead. I have attached 2 tshark
> sessions captured on controller1, covering traffic between controller1
> and controller2. The failed reboot detection is captured in
> "ctrl2_failed_detection.trc", and a working detection is in
> "ctrl2_working.trc". I have also attached all logs in
> /var/log/opensaf and the syslog (all from controller1).
>
> It appears to me that we are hitting something similar to
> http://stackoverflow.com/questions/33553410/tcp-retranmission-timer-overrides-kills-tcp-keepalive-timer-delaying-disconnect
>
> // Jonas
>




---

** [tickets:#2014] Rebooted controller not detected in TCP**

**Status:** fixed
**Milestone:** 4.7.2
**Created:** Thu Sep 08, 2016 06:20 PM UTC by Jonas Arndt
**Last Updated:** Thu Sep 15, 2016 05:59 AM UTC
**Owner:** A V Mahesh (AVM)
**Attachments:**

- 
[logs.tgz](https://sourceforge.net/p/opensaf/tickets/2014/attachment/logs.tgz) 
(84.1 kB; application/x-compressed-tar)
- 

[tickets] [opensaf:tickets] #2038 NTF: Using sizeof(string) for lengthAdditionalText results in SA_AIS_ERR_INVALID_PARAM

2016-09-15 Thread Minh Hon Chau



---

** [tickets:#2038] NTF: Using sizeof(string) for lengthAdditionalText results 
in SA_AIS_ERR_INVALID_PARAM**

**Status:** unassigned
**Milestone:** 5.1.RC2
**Created:** Thu Sep 15, 2016 08:47 AM UTC by Minh Hon Chau
**Last Updated:** Thu Sep 15, 2016 08:47 AM UTC
**Owner:** nobody


After the patch for #2006, if an NTF client uses sizeof(string) to calculate
the string length for lengthAdditionalText, sending the notification fails
with SA_AIS_ERR_INVALID_PARAM.

Some existing ntftests now fail because they still use sizeof; that can be
fixed in the ntftests themselves. However, it could also lead to *complaints*
from real applications that have been running without problems and whose code
uses sizeof().
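The underlying difference is easy to show in plain C (no NTF API involved): for a string stored in a char array, sizeof counts the terminating NUL and is therefore one larger than strlen, and for a char pointer sizeof yields the pointer size, not the text length. Presumably the stricter check added by #2006 rejects the extra byte, but the exact validation rule is not stated in the ticket. The helper names below are hypothetical.

```c
#include <string.h>

static const char additional_text[] = "Admin state changed";

/* sizeof sees the whole array, including the terminating '\0'. */
static size_t length_by_sizeof(void) { return sizeof(additional_text); }

/* strlen counts only the characters before the '\0'. */
static size_t length_by_strlen(void) { return strlen(additional_text); }

/* Through a pointer, sizeof gives the pointer size, not the text length. */
static size_t length_by_pointer_sizeof(void)
{
    const char *p = additional_text;
    return sizeof(p);
}
```

For "Admin state changed", strlen gives 19 while sizeof on the array gives 20.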




[tickets] [opensaf:tickets] #2037 IMM: Immd asserted on active controller in backward compatability

2016-09-15 Thread Neelakanta Reddy
- **status**: unassigned --> assigned
- **assigned_to**: Neelakanta Reddy
- **Part**: - --> d



---

** [tickets:#2037] IMM: Immd asserted on active controller in backward 
compatability**

**Status:** assigned
**Milestone:** 4.7.2
**Created:** Thu Sep 15, 2016 06:51 AM UTC by Madhurika Koppula
**Last Updated:** Thu Sep 15, 2016 06:52 AM UTC
**Owner:** Neelakanta Reddy
**Attachments:**

- 
[immnd_immd_cores.rtf](https://sourceforge.net/p/opensaf/tickets/2037/attachment/immnd_immd_cores.rtf)
 (8.5 kB; application/rtf)


**Environment Details:**

OS : Suse 64bit
Setup: 4 nodes (2 controllers and 2 payloads), with the headless feature
disabled and 1PBE enabled.

Backward compatibility:
OpenSAF versions on the nodes:
SC-1 (5.0), SC-2 (5.1 FC), PL-3 (5.0), PL-4 (5.1 FC).

**Summary:** IMMD asserted on the active controller after an IMMND crash.

**Steps followed & Observed behaviour:**

1) SC-1 has the standby role and SC-2 has the active role.
2) The following steps are performed:
a) saImmOiInitialize()
b) saImmOiImplementerSet()
c) kill -9 `pidof osafimmnd`
d) saImmOiRtObjectDelete()
e) saImmOiFinalize()

Observations:

1) First, IMMND asserted on the active controller in
immnd_evt_proc_fevs_rcv().
2) Second, the active controller rebooted after an IMMD assertion failure.

Below is a log snippet from the active controller SC-2:

Sep 20 20:52:47 SCALE_SLOT-42 osafntfimcnd[15091]: NO saImmOiDispatch() Fail 
SA_AIS_ERR_BAD_HANDLE (9)
Sep 20 20:52:47 SCALE_SLOT-42 osafimmnd[15114]: Started
Sep 20 20:52:47 SCALE_SLOT-42 osafimmnd[15114]: NO Persistent Back-End 
capability configured, Pbe file:imm.db (suffix may get added)
Sep 20 20:52:47 SCALE_SLOT-42 osafimmd[12159]: NO MDS event from svc_id 25 
(change:3, dest:565216648273948)
Sep 20 20:52:47 SCALE_SLOT-42 osafimmnd[15114]: NO IMMD service is UP ... 
ScAbsenseAllowed?:0 introduced?:0
Sep 20 20:52:47 SCALE_SLOT-42 osafimmnd[15114]: NO SERVER STATE: 
IMM_SERVER_ANONYMOUS --> IMM_SERVER_CLUSTER_WAITING
Sep 20 20:52:47 SCALE_SLOT-42 osafimmnd[15114]: immnd_evt.c:9146: 
immnd_evt_proc_fevs_rcv: Assertion '!reply_dest || (reply_dest == 
cb->immnd_mdest_id) || isObjSync' failed.
Sep 20 20:52:47 SCALE_SLOT-42 python2.5: WA imma_mds_svc_evt: 
mds_auth_server_connect failed
Sep 20 20:52:47 SCALE_SLOT-42 osafimmd[12159]: NO MDS event from svc_id 25 
(change:4, dest:565216648273948)
Sep 20 20:52:47 SCALE_SLOT-42 osafamfnd[12239]: NO Restarting a component of 
'safSu=SC-2,safSg=NoRed,safApp=OpenSAF' (comp restart count: 6)
Sep 20 20:52:47 SCALE_SLOT-42 osafamfnd[12239]: NO 
'safComp=IMMND,safSu=SC-2,safSg=NoRed,safApp=OpenSAF' faulted due to 'avaDown' 
: Recovery

Sep 20 20:52:47 SCALE_SLOT-42 osafimmnd[15136]: WA Error code 2 returned for 
message type 82 - ignoring
Sep 20 20:52:47 SCALE_SLOT-42 osafimmd[12159]: NO Extended intro from node 2020f
Sep 20 20:52:47 SCALE_SLOT-42 osafimmd[12159]: immd_evt.c:816: 
immd_accept_node: Assertion 'node_info->immnd_key != cb->node_id' failed.
Sep 20 20:52:47 SCALE_SLOT-42 osafamfnd[12239]: NO 
'safComp=IMMD,safSu=SC-2,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : 
Recovery is 'nodeFailfast'

Sep 20 20:52:47 SCALE_SLOT-42 osafamfnd[12239]: ER 
safComp=IMMD,safSu=SC-2,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery 
is:nodeFailfast

Sep 20 20:52:47 SCALE_SLOT-42 osafamfnd[12239]: Rebooting OpenSAF NodeId = 
131599 EE Name = , Reason: Component faulted: recovery is node failfast, 
OwnNodeId = 131599, SupervisionTime = 60

The timestamps after the reboot are as follows:

Sep 20 20:52:47 SCALE_SLOT-42 opensaf_reboot: Rebooting local node; timeout=60
Sep 20 20:52:47 SCALE_SLOT-42 osafimmnd[15136]: WA DISCARD DUPLICATE FEVS 
message:12996
Sep 20 20:52:47 SCALE_SLOT-42 osafimmnd[15136]: WA Error code 2 returned for 
message type 82 - ignoring
Sep 20 20:52:48 SCALE_SLOT-42 osafimmnd[15136]: NO SERVER STATE: 
IMM_SERVER_CLUSTER_WAITING --> IMM_SERVER_LOADING_PENDING
Sep 20 20:52:48 SCALE_SLOT-42 osafimmnd[15136]: NO SERVER STATE: 
IMM_SERVER_LOADING_PENDING --> IMM_SERVER_SYNC_PENDING
Sep 22 02:13:05 SCALE_SLOT-42 syslog-ng[1133]: syslog-ng starting up; 
version='2.0.9'
Sep 22 02:13:06 SCALE_SLOT-42 sm-notify[1650]: Version 1.2.3 starting
Sep 22 02:13:06 SCALE_SLOT-42 sm-notify[1650]: Backgrounding to notify hosts...
Sep 22 02:13:06 SCALE_SLOT-42 sm-notify[1651]: Running as root.  chown 
/var/lib/nfs to choose different user
Sep 22 02:13:06 SCALE_SLOT-42 sm-notify[1651]: DNS resolution of CONN-PC 
failed; retrying later
Sep 22 02:13:06 SCALE_SLOT-42 rpc.statd[1662]: Version 1.2.3 starting
Sep 22 02:13:06 SCALE_SLOT-42 rpc.statd[1662]: Flags: TI-RPC
Sep 22 02:13:06 SCALE_SLOT-42 rpc.statd[1662]: Running as root.  chown 
/var/lib/nfs to choose different user
Sep 22 02:13:07 SCALE_SLOT-42 opensafd: Starting OpenSAF Services(5.1.FC - ) 
(Using TIPC)
Sep 22 02:13:07 SCALE_SLOT-42 osafclmna[1745]: Started
Sep 22 02:13:07 SCALE_SLOT-42 osafclmna[1745]: NO 
safNode=SC-2,safCluster=myClmCluster Joined cluster, nodeid=2020f
Sep 22 02:13:07 

[tickets] [opensaf:tickets] #1925 imm: 2PBE preload information is not sent from standby

2016-09-15 Thread Neelakanta Reddy
changeset:   8079:0576f5ca831e
branch:  opensaf-5.0.x
parent:  8074:8af3073a5dad
user:Neelakanta Reddy
date:Thu Sep 15 12:42:19 2016 +0530
summary: imm : corrected the cppcheck error [#1925]

changeset:   8080:119b72d81883
branch:  opensaf-5.1.x
parent:  8077:7d84f0a138ad
user:Neelakanta Reddy
date:Thu Sep 15 12:42:19 2016 +0530
summary: imm : corrected the cppcheck error [#1925]

changeset:   8081:e8ddfcd67a2f
tag: tip
parent:  8078:2d6f7704ad53
user:Neelakanta Reddy
date:Thu Sep 15 12:42:19 2016 +0530
summary: imm : corrected the cppcheck error [#1925]



---

** [tickets:#1925] imm: 2PBE preload information is not sent from standby**

**Status:** fixed
**Milestone:** 5.0.1
**Created:** Thu Jul 21, 2016 04:47 AM UTC by Neelakanta Reddy
**Last Updated:** Tue Aug 16, 2016 09:47 AM UTC
**Owner:** Neelakanta Reddy


In a 2PBE-enabled cluster, if both controllers are started simultaneously,
the IMMNDs on both controllers send pre-load information to the
active IMMD. Based on the pre-load information, the active IMMD decides from
which IMMND loading should start. With #79, the assignment of the standby role
is delayed until AMFD initialization. As a result, the standby IMMND is not
considered a controller IMMND, and the preload information is not sent to the
standby IMMND.

Solution:
Send the preload information to all IMMNDs. If the IMMND is on a controller,
the pre-load information is then sent.




[tickets] [opensaf:tickets] #2034 imm: IMMsv README changes fro 5.1

2016-09-15 Thread Neelakanta Reddy
- **status**: review --> fixed
- **Comment**:

changeset:   8077:7d84f0a138ad
branch:  opensaf-5.1.x
parent:  8075:b93e039d2cb2
user:Neelakanta Reddy
date:Thu Sep 15 12:29:22 2016 +0530
summary: imm:updated README for 5.1 [#2034]

changeset:   8078:2d6f7704ad53
tag: tip
parent:  8076:03aa9b77c634
user:Neelakanta Reddy
date:Thu Sep 15 12:29:22 2016 +0530
summary: imm:updated README for 5.1 [#2034]




---

** [tickets:#2034] imm: IMMsv README changes fro 5.1**

**Status:** fixed
**Milestone:** 5.1.RC2
**Created:** Wed Sep 14, 2016 08:35 AM UTC by Neelakanta Reddy
**Last Updated:** Wed Sep 14, 2016 08:46 AM UTC
**Owner:** Neelakanta Reddy


This ticket is to update the IMM README for the 5.1 IMM enhancements.




[tickets] [opensaf:tickets] #2037 IMM: Immd asserted on active controller in backward compatability

2016-09-15 Thread Madhurika Koppula
Logs are attached.


Attachments:

- 
[immd_assert.tgz](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/98733abe/7f1c/attachment/immd_assert.tgz)
 (18.7 MB; application/octet-stream)


---

** [tickets:#2037] IMM: Immd asserted on active controller in backward 
compatability**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Thu Sep 15, 2016 06:51 AM UTC by Madhurika Koppula
**Last Updated:** Thu Sep 15, 2016 06:51 AM UTC
**Owner:** nobody
**Attachments:**

- 
[immnd_immd_cores.rtf](https://sourceforge.net/p/opensaf/tickets/2037/attachment/immnd_immd_cores.rtf)
 (8.5 kB; application/rtf)


**Environment Details:**

OS : Suse 64bit
Setup: 4 nodes (2 controllers and 2 payloads), with the headless feature
disabled and 1PBE enabled.

Backward compatibility:
OpenSAF versions on the nodes:
SC-1 (5.0), SC-2 (5.1 FC), PL-3 (5.0), PL-4 (5.1 FC).

**Summary:** IMMD asserted on the active controller after an IMMND crash.

**Steps followed & Observed behaviour:**

1) SC-1 has the standby role and SC-2 has the active role.
2) The following steps are performed:
a) saImmOiInitialize()
b) saImmOiImplementerSet()
c) kill -9 `pidof osafimmnd`
d) saImmOiRtObjectDelete()
e) saImmOiFinalize()

Observations:

1) First, IMMND asserted on the active controller in
immnd_evt_proc_fevs_rcv().
2) Second, the active controller rebooted after an IMMD assertion failure.

Below is a log snippet from the active controller SC-2:

Sep 20 20:52:47 SCALE_SLOT-42 osafntfimcnd[15091]: NO saImmOiDispatch() Fail 
SA_AIS_ERR_BAD_HANDLE (9)
Sep 20 20:52:47 SCALE_SLOT-42 osafimmnd[15114]: Started
Sep 20 20:52:47 SCALE_SLOT-42 osafimmnd[15114]: NO Persistent Back-End 
capability configured, Pbe file:imm.db (suffix may get added)
Sep 20 20:52:47 SCALE_SLOT-42 osafimmd[12159]: NO MDS event from svc_id 25 
(change:3, dest:565216648273948)
Sep 20 20:52:47 SCALE_SLOT-42 osafimmnd[15114]: NO IMMD service is UP ... 
ScAbsenseAllowed?:0 introduced?:0
Sep 20 20:52:47 SCALE_SLOT-42 osafimmnd[15114]: NO SERVER STATE: 
IMM_SERVER_ANONYMOUS --> IMM_SERVER_CLUSTER_WAITING
Sep 20 20:52:47 SCALE_SLOT-42 osafimmnd[15114]: immnd_evt.c:9146: 
immnd_evt_proc_fevs_rcv: Assertion '!reply_dest || (reply_dest == 
cb->immnd_mdest_id) || isObjSync' failed.
Sep 20 20:52:47 SCALE_SLOT-42 python2.5: WA imma_mds_svc_evt: 
mds_auth_server_connect failed
Sep 20 20:52:47 SCALE_SLOT-42 osafimmd[12159]: NO MDS event from svc_id 25 
(change:4, dest:565216648273948)
Sep 20 20:52:47 SCALE_SLOT-42 osafamfnd[12239]: NO Restarting a component of 
'safSu=SC-2,safSg=NoRed,safApp=OpenSAF' (comp restart count: 6)
Sep 20 20:52:47 SCALE_SLOT-42 osafamfnd[12239]: NO 
'safComp=IMMND,safSu=SC-2,safSg=NoRed,safApp=OpenSAF' faulted due to 'avaDown' 
: Recovery

Sep 20 20:52:47 SCALE_SLOT-42 osafimmnd[15136]: WA Error code 2 returned for 
message type 82 - ignoring
Sep 20 20:52:47 SCALE_SLOT-42 osafimmd[12159]: NO Extended intro from node 2020f
Sep 20 20:52:47 SCALE_SLOT-42 osafimmd[12159]: immd_evt.c:816: 
immd_accept_node: Assertion 'node_info->immnd_key != cb->node_id' failed.
Sep 20 20:52:47 SCALE_SLOT-42 osafamfnd[12239]: NO 
'safComp=IMMD,safSu=SC-2,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : 
Recovery is 'nodeFailfast'

Sep 20 20:52:47 SCALE_SLOT-42 osafamfnd[12239]: ER 
safComp=IMMD,safSu=SC-2,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery 
is:nodeFailfast

Sep 20 20:52:47 SCALE_SLOT-42 osafamfnd[12239]: Rebooting OpenSAF NodeId = 
131599 EE Name = , Reason: Component faulted: recovery is node failfast, 
OwnNodeId = 131599, SupervisionTime = 60

After reboot timestamp is as below:

Sep 20 20:52:47 SCALE_SLOT-42 opensaf_reboot: Rebooting local node; timeout=60
Sep 20 20:52:47 SCALE_SLOT-42 osafimmnd[15136]: WA DISCARD DUPLICATE FEVS 
message:12996
Sep 20 20:52:47 SCALE_SLOT-42 osafimmnd[15136]: WA Error code 2 returned for 
message type 82 - ignoring
Sep 20 20:52:48 SCALE_SLOT-42 osafimmnd[15136]: NO SERVER STATE: 
IMM_SERVER_CLUSTER_WAITING --> IMM_SERVER_LOADING_PENDING
Sep 20 20:52:48 SCALE_SLOT-42 osafimmnd[15136]: NO SERVER STATE: 
IMM_SERVER_LOADING_PENDING --> IMM_SERVER_SYNC_PENDING
Sep 22 02:13:05 SCALE_SLOT-42 syslog-ng[1133]: syslog-ng starting up; 
version='2.0.9'
Sep 22 02:13:06 SCALE_SLOT-42 sm-notify[1650]: Version 1.2.3 starting
Sep 22 02:13:06 SCALE_SLOT-42 sm-notify[1650]: Backgrounding to notify hosts...
Sep 22 02:13:06 SCALE_SLOT-42 sm-notify[1651]: Running as root.  chown 
/var/lib/nfs to choose different user
Sep 22 02:13:06 SCALE_SLOT-42 sm-notify[1651]: DNS resolution of CONN-PC 
failed; retrying later
Sep 22 02:13:06 SCALE_SLOT-42 rpc.statd[1662]: Version 1.2.3 starting
Sep 22 02:13:06 SCALE_SLOT-42 rpc.statd[1662]: Flags: TI-RPC
Sep 22 02:13:06 SCALE_SLOT-42 rpc.statd[1662]: Running as root.  chown 
/var/lib/nfs to choose different user
Sep 22 02:13:07 SCALE_SLOT-42 opensafd: Starting OpenSAF Services(5.1.FC - ) 
(Using TIPC)
Sep 22 02:13:07 SCALE_SLOT-42 osafclmna[1745]: Started
Sep 22 02:13:07 SCALE_SLOT-42 

[tickets] [opensaf:tickets] #2037 IMM: Immd asserted on active controller in backward compatability

2016-09-15 Thread Madhurika Koppula



---

** [tickets:#2037] IMM: Immd asserted on active controller in backward 
compatability**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Thu Sep 15, 2016 06:51 AM UTC by Madhurika Koppula
**Last Updated:** Thu Sep 15, 2016 06:51 AM UTC
**Owner:** nobody
**Attachments:**

- [immnd_immd_cores.rtf](https://sourceforge.net/p/opensaf/tickets/2037/attachment/immnd_immd_cores.rtf) (8.5 kB; application/rtf)


**Environment Details:**

OS: SUSE 64-bit
Setup: 4 nodes (2 controllers and 2 payloads), with the headless feature disabled and 1PBE enabled.

Backward compatibility:
OpenSAF versions on the nodes:
SC-1 (5.0), SC-2 (5.1 FC), PL-3 (5.0), PL-4 (5.1 FC).

**Summary:** IMMD asserted on the active controller after an IMMND crash.

**Steps followed & Observed behaviour:**

1) SC-1 has the standby role and SC-2 has the active role.
2) The following sequence of steps was performed:
a) saImmOiInitialize()
b) saImmOiImplementerSet()
c) kill -9 `pidof osafimmnd`
d) saImmOiRtObjectDelete()
e) saImmOiFinalize()

Observations:

1) First, IMMND asserted on the active controller in immnd_evt_proc_fevs_rcv.
2) Then the active controller rebooted after an IMMD assertion failure.

Below is a log snippet from the active controller, SC-2:

Sep 20 20:52:47 SCALE_SLOT-42 osafntfimcnd[15091]: NO saImmOiDispatch() Fail 
SA_AIS_ERR_BAD_HANDLE (9)
Sep 20 20:52:47 SCALE_SLOT-42 osafimmnd[15114]: Started
Sep 20 20:52:47 SCALE_SLOT-42 osafimmnd[15114]: NO Persistent Back-End 
capability configured, Pbe file:imm.db (suffix may get added)
Sep 20 20:52:47 SCALE_SLOT-42 osafimmd[12159]: NO MDS event from svc_id 25 
(change:3, dest:565216648273948)
Sep 20 20:52:47 SCALE_SLOT-42 osafimmnd[15114]: NO IMMD service is UP ... 
ScAbsenseAllowed?:0 introduced?:0
Sep 20 20:52:47 SCALE_SLOT-42 osafimmnd[15114]: NO SERVER STATE: 
IMM_SERVER_ANONYMOUS --> IMM_SERVER_CLUSTER_WAITING
Sep 20 20:52:47 SCALE_SLOT-42 osafimmnd[15114]: immnd_evt.c:9146: 
immnd_evt_proc_fevs_rcv: Assertion '!reply_dest || (reply_dest == 
cb->immnd_mdest_id) || isObjSync' failed.
Sep 20 20:52:47 SCALE_SLOT-42 python2.5: WA imma_mds_svc_evt: 
mds_auth_server_connect failed
Sep 20 20:52:47 SCALE_SLOT-42 osafimmd[12159]: NO MDS event from svc_id 25 
(change:4, dest:565216648273948)
Sep 20 20:52:47 SCALE_SLOT-42 osafamfnd[12239]: NO Restarting a component of 
'safSu=SC-2,safSg=NoRed,safApp=OpenSAF' (comp restart count: 6)
Sep 20 20:52:47 SCALE_SLOT-42 osafamfnd[12239]: NO 
'safComp=IMMND,safSu=SC-2,safSg=NoRed,safApp=OpenSAF' faulted due to 'avaDown' 
: Recovery

Sep 20 20:52:47 SCALE_SLOT-42 osafimmnd[15136]: WA Error code 2 returned for 
message type 82 - ignoring
Sep 20 20:52:47 SCALE_SLOT-42 osafimmd[12159]: NO Extended intro from node 2020f
Sep 20 20:52:47 SCALE_SLOT-42 osafimmd[12159]: immd_evt.c:816: 
immd_accept_node: Assertion 'node_info->immnd_key != cb->node_id' failed.
Sep 20 20:52:47 SCALE_SLOT-42 osafamfnd[12239]: NO 
'safComp=IMMD,safSu=SC-2,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : 
Recovery is 'nodeFailfast'

Sep 20 20:52:47 SCALE_SLOT-42 osafamfnd[12239]: ER 
safComp=IMMD,safSu=SC-2,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery 
is:nodeFailfast

Sep 20 20:52:47 SCALE_SLOT-42 osafamfnd[12239]: Rebooting OpenSAF NodeId = 
131599 EE Name = , Reason: Component faulted: recovery is node failfast, 
OwnNodeId = 131599, SupervisionTime = 60

The log entries around the reboot, with their timestamps, are below:

Sep 20 20:52:47 SCALE_SLOT-42 opensaf_reboot: Rebooting local node; timeout=60
Sep 20 20:52:47 SCALE_SLOT-42 osafimmnd[15136]: WA DISCARD DUPLICATE FEVS 
message:12996
Sep 20 20:52:47 SCALE_SLOT-42 osafimmnd[15136]: WA Error code 2 returned for 
message type 82 - ignoring
Sep 20 20:52:48 SCALE_SLOT-42 osafimmnd[15136]: NO SERVER STATE: 
IMM_SERVER_CLUSTER_WAITING --> IMM_SERVER_LOADING_PENDING
Sep 20 20:52:48 SCALE_SLOT-42 osafimmnd[15136]: NO SERVER STATE: 
IMM_SERVER_LOADING_PENDING --> IMM_SERVER_SYNC_PENDING
Sep 22 02:13:05 SCALE_SLOT-42 syslog-ng[1133]: syslog-ng starting up; 
version='2.0.9'
Sep 22 02:13:06 SCALE_SLOT-42 sm-notify[1650]: Version 1.2.3 starting
Sep 22 02:13:06 SCALE_SLOT-42 sm-notify[1650]: Backgrounding to notify hosts...
Sep 22 02:13:06 SCALE_SLOT-42 sm-notify[1651]: Running as root.  chown 
/var/lib/nfs to choose different user
Sep 22 02:13:06 SCALE_SLOT-42 sm-notify[1651]: DNS resolution of CONN-PC 
failed; retrying later
Sep 22 02:13:06 SCALE_SLOT-42 rpc.statd[1662]: Version 1.2.3 starting
Sep 22 02:13:06 SCALE_SLOT-42 rpc.statd[1662]: Flags: TI-RPC
Sep 22 02:13:06 SCALE_SLOT-42 rpc.statd[1662]: Running as root.  chown 
/var/lib/nfs to choose different user
Sep 22 02:13:07 SCALE_SLOT-42 opensafd: Starting OpenSAF Services(5.1.FC - ) 
(Using TIPC)
Sep 22 02:13:07 SCALE_SLOT-42 osafclmna[1745]: Started
Sep 22 02:13:07 SCALE_SLOT-42 osafclmna[1745]: NO 
safNode=SC-2,safCluster=myClmCluster Joined cluster, nodeid=2020f
Sep 22 02:13:07 SCALE_SLOT-42 osafrded[1754]: Started
Sep 22 02:13:08 SCALE_SLOT-42 osaffmd[1763]: Started


Below is the 

[tickets] [opensaf:tickets] #2031 imm:README files are missing when opensaf is downloaded

2016-09-15 Thread Neelakanta Reddy
- **status**: review --> fixed
- **Comment**:

changeset:   8074:8af3073a5dad
branch:  opensaf-5.0.x
parent:  8071:ef0846e8e5c9
user:Neelakanta Reddy
date:Thu Sep 15 11:57:48 2016 +0530
summary: imm : updated Makefile to reflect all IMM README files [#2031]

changeset:   8075:b93e039d2cb2
branch:  opensaf-5.1.x
parent:  8072:c472ada0394c
user:Neelakanta Reddy
date:Thu Sep 15 11:57:48 2016 +0530
summary: imm : updated Makefile to reflect all IMM README files [#2031]

changeset:   8076:03aa9b77c634
tag: tip
parent:  8073:b7ba90304dce
user:Neelakanta Reddy
date:Thu Sep 15 11:57:48 2016 +0530
summary: imm : updated Makefile to reflect all IMM README files [#2031]




---

** [tickets:#2031] imm:README files are missing when opensaf is downloaded**

**Status:** fixed
**Milestone:** 5.0.1
**Created:** Wed Sep 14, 2016 02:40 AM UTC by Neelakanta Reddy
**Last Updated:** Wed Sep 14, 2016 08:59 AM UTC
**Owner:** Neelakanta Reddy


Update Makefile.am to include all README files.




[tickets] [opensaf:tickets] #1816 IMM: saImmOiAugmentCcbInitialize returned ERR_TRY_AGAIN when ERR_LIBRARY was expected

2016-09-15 Thread Neelakanta Reddy
- **status**: review --> fixed
- **Comment**:

changeset:   8073:b7ba90304dce
tag: tip
parent:  8069:b30d5e33e50c
user:Neelakanta Reddy
date:Thu Sep 15 11:45:40 2016 +0530
summary: imm: return the correct error code for ERR_LIBRARY in 
saImmOiAugmentCcbInitialize [#1816]

changeset:   8072:c472ada0394c
branch:  opensaf-5.1.x
parent:  8068:87a09d9164d3
user:Neelakanta Reddy
date:Thu Sep 15 11:45:40 2016 +0530
summary: imm: return the correct error code for ERR_LIBRARY in 
saImmOiAugmentCcbInitialize [#1816]

changeset:   8071:ef0846e8e5c9
branch:  opensaf-5.0.x
parent:  8067:efeaffca9483
user:Neelakanta Reddy
date:Thu Sep 15 11:45:40 2016 +0530
summary: imm: return the correct error code for ERR_LIBRARY in 
saImmOiAugmentCcbInitialize [#1816]

changeset:   8070:af5ecf3d1a72
branch:  opensaf-4.7.x
parent:  8066:afddc603adcb
user:Neelakanta Reddy
date:Thu Sep 15 11:45:40 2016 +0530
summary: imm: return the correct error code for ERR_LIBRARY in 
saImmOiAugmentCcbInitialize [#1816]




---

** [tickets:#1816] IMM: saImmOiAugmentCcbInitialize returned ERR_TRY_AGAIN when 
ERR_LIBRARY was expected**

**Status:** fixed
**Milestone:** 4.7.2
**Created:** Mon May 09, 2016 07:27 AM UTC by Chani Srivastava
**Last Updated:** Tue Sep 13, 2016 11:47 AM UTC
**Owner:** Neelakanta Reddy


This was found as part of validating ticket #1808

Code snippet: imma_oi_api.c:3749

~~~
if (immsv_om_handle_initialize) { /* This is always the first immsv_om_ call */
    rc = immsv_om_handle_initialize(, );
} else {
    TRACE("ERR_LIBRARY: Error in library linkage. libSaImmOm.so is not linked");
    rc = SA_AIS_ERR_LIBRARY;
}

if (rc != SA_AIS_OK) {
    TRACE("ERR_TRY_AGAIN: failed to obtain internal om handle rc:%u", rc);
    rc = SA_AIS_ERR_TRY_AGAIN;
    goto lock_fail; /* We are not locked and nothing to de-allocate. */
}
~~~

When rc is set to SA_AIS_ERR_LIBRARY, there is no goto, so the next if
condition is also executed, and it overwrites rc with SA_AIS_ERR_TRY_AGAIN.
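One way to keep the library-linkage error from being overwritten is to leave the function (or jump to the cleanup label) directly from the `else` branch. The following is a minimal, self-contained C sketch of that control flow, not the code from changeset 8073: `SaAisErrorT` is redefined locally with only the codes the sketch needs, and `get_internal_om_handle`, `om_init_fn`, `om_init_ok`, `om_init_fail` and `check_error_mapping` are hypothetical names standing in for the weakly linked `immsv_om_handle_initialize` call.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-in for SaAisErrorT from saAis.h; only the codes used here. */
typedef enum {
    SA_AIS_OK = 1,
    SA_AIS_ERR_LIBRARY = 2,
    SA_AIS_ERR_TRY_AGAIN = 6,
    SA_AIS_ERR_BAD_HANDLE = 9
} SaAisErrorT;

/* Hypothetical type for the OM entry point, which is NULL when
 * libSaImmOm.so is not linked. */
typedef SaAisErrorT (*om_init_fn)(void);

/* Returns SA_AIS_ERR_LIBRARY for a linkage problem and SA_AIS_ERR_TRY_AGAIN
 * only when the OM handle initialization was attempted and failed. */
SaAisErrorT get_internal_om_handle(om_init_fn om_init)
{
    SaAisErrorT rc;

    if (om_init == NULL) {
        /* Leave immediately so the TRY_AGAIN mapping below cannot run. */
        return SA_AIS_ERR_LIBRARY;
    }

    rc = om_init();
    if (rc != SA_AIS_OK) {
        return SA_AIS_ERR_TRY_AGAIN;
    }
    return SA_AIS_OK;
}

/* Hypothetical initializers used to exercise the success and failure paths. */
SaAisErrorT om_init_ok(void) { return SA_AIS_OK; }
SaAisErrorT om_init_fail(void) { return SA_AIS_ERR_BAD_HANDLE; }

/* Usage: checks all three outcomes. */
void check_error_mapping(void)
{
    assert(get_internal_om_handle(NULL) == SA_AIS_ERR_LIBRARY);
    assert(get_internal_om_handle(om_init_ok) == SA_AIS_OK);
    assert(get_internal_om_handle(om_init_fail) == SA_AIS_ERR_TRY_AGAIN);
}
```

In the real function an early `goto lock_fail` from the `else` branch would have the same effect as the early `return` used here; the point is only that the TRY_AGAIN mapping should apply to a failed initialization, not to a missing library.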





[tickets] [opensaf:tickets] #2036 build : make rpm fails, if installation directories are specified

2016-09-15 Thread Srikanth R



---

** [tickets:#2036] build : make rpm fails, if installation directories are 
specified**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Thu Sep 15, 2016 06:03 AM UTC by Srikanth R
**Last Updated:** Thu Sep 15, 2016 06:03 AM UTC
**Owner:** nobody


Environment:
Setup: SLES 64-bit, gcc 6.1

Steps performed :

Ran the following commands after downloading OpenSAF from hg:
-> ./bootstrap.sh
-> ./configure CFLAGS="-g " CXXFLAGS="-g " --enable-tipc --enable-imm-pbe --enable-ntf-imcn --sysconfdir=/opt/etc --localstatedir=/opt/var --libdir=/opt/usr/lib
-> make rpm

The last step fails with the following error:
 
 
Checking for unpackaged file(s): /usr/lib/rpm/check-files /home/pinv/Srikanth/SAF_7997/rpms/tmp/opensaf-5.1.FC-1-root-root
error: Installed (but unpackaged) file(s) found:
   /opt/etc/opensaf/amfd.conf
   /opt/etc/opensaf/amfnd.conf
   /opt/etc/opensaf/amfwdog.conf
   /opt/etc/opensaf/chassis_id
   /opt/etc/opensaf/ckptd.conf
   /opt/etc/opensaf/ckptnd.conf
   /opt/etc/opensaf/clmd.conf
   /opt/etc/opensaf/clmna.conf
   /opt/etc/opensaf/dtmd.conf


RPM build errors:
File not found: /home/pinv/Srikanth/SAF_7997/rpms/tmp/opensaf-5.1.FC-1-root-root/usr/lib64/opensaf
File not found: /home/pinv/Srikanth/SAF_7997/rpms/tmp/opensaf-5.1.FC-1-root-root/etc/opensaf
File not found: /home/pinv/Srikanth/SAF_7997/rpms/tmp/opensaf-5.1.FC-1-root-root/var/lib/opensaf
File not found: /home/pinv/Srikanth/SAF_7997/rpms/tmp/opensaf-5.1.FC-1-root-root/var/log/opensaf
File not found: /home/pinv/Srikanth/SAF_7997/rpms/tmp/opensaf-5.1.FC-1-root-root/var/run/opensaf
File not found: /home/pinv/Srikanth/SAF_7997/rpms/tmp/opensaf-5.1.FC-1-root-root/etc/opensaf/chassis_id
File not found: /home/pinv/Srikanth/SAF_7997/rpms/tmp/opensaf-5.1.FC-1-root-root/etc/opensaf/slot_id
.
File not found by glob: /home/pinv/Srikanth/SAF_7997/rpms/tmp/opensaf-5.1.FC-1-root-root/usr/lib64/libSa*.a
Installed (but unpackaged) file(s) found:
   /opt/etc/opensaf/amfd.conf
   /opt/etc/opensaf/amfnd.conf
   /opt/etc/opensaf/amfwdog.conf
   /opt/etc/opensaf/chassis_id
   /opt/etc/opensaf/ckptd.conf
   /opt/etc/opensaf/ckptnd.conf
   /opt/etc/opensaf/clmd.conf
   /opt/etc/opensaf/clmna.conf
   /opt/etc/opensaf/dtmd.conf
   ...
   /opt/usr/lib/pkgconfig/opensaf-smf.pc
   /opt/usr/lib/pkgconfig/opensaf.pc
make: *** [rpm] Error 1








Re: [tickets] [opensaf:tickets] #2014 Rebooted controller not detected in TCP

2016-09-15 Thread A V Mahesh

Hi Jonas,

OK, I just pushed it; please test it once on 4.7:



branch:  opensaf-4.7.x
parent:  8043:4a8a00097561
user:A V Mahesh 
date:Thu Sep 15 10:50:31 2016 +0530
summary: dtm: TCP Improve node failFast with TCP_USER_TIMEOUT [#2014]



-AVM

On 9/15/2016 12:08 AM, Jonas Arndt wrote:


Mahesh,

Can we get this back-ported to 4.7.x as well?

Cheers,

// Jonas



*[tickets:#2014] Rebooted controller not detected in TCP*


*Status:* review
*Milestone:* 5.0.1
*Created:* Thu Sep 08, 2016 06:20 PM UTC by Jonas Arndt
*Last Updated:* Wed Sep 14, 2016 04:51 AM UTC
*Owner:* A V Mahesh (AVM)
*Attachments:*

  * logs.tgz (84.1 kB; application/x-compressed-tar)
  * tcp_user_timeout_2014.patch (5.5 kB; application/octet-stream)

OS environment:

Debian Jessie (OpenSAF is running on bare metal, no containers or VMs)
4.4.7 kernel
Network: eth0, bonded, OVS (I have tried all of them, and the problem occurs in all configurations)

In 20% of the cases, a "reboot -f" on controller2 is not detected and
acted on. This is what appears in mds.log:


Sep 7 6:44:23.918566 osafamfd[41365] ERR |MDS_SND_RCV: 
Adest=<0x,1>
Sep 7 6:44:23.918595 osafamfd[41365] ERR |MDS_SND_RCV: 
Anchor=<0x0002020f,1790>
Sep 7 6:44:34.018662 osafamfd[41365] ERR |MDS_SND_RCV: Timeout or 
Error occured
Sep 7 6:44:34.018751 osafamfd[41365] ERR |MDS_SND_RCV: Timeout occured 
on red sndrsp message from svc_id = MBCSV(19), to svc_id = MBCSV(19)
Sep 7 6:44:34.018789 osafamfd[41365] ERR |MDS_SND_RCV: 
Adest=<0x,1>
Sep 7 6:44:34.018818 osafamfd[41365] ERR |MDS_SND_RCV: 
Anchor=<0x0002020f,1790>
Sep 7 6:44:44.118832 osafamfd[41365] ERR |MDS_SND_RCV: Timeout or 
Error occured
Sep 7 6:44:44.118919 osafamfd[41365] ERR |MDS_SND_RCV: Timeout occured 
on red sndrsp message from svc_id = MBCSV(19), to svc_id = MBCSV(19)
Sep 7 6:44:44.118955 osafamfd[41365] ERR |MDS_SND_RCV: 
Adest=<0x,1>
Sep 7 6:44:44.118984 osafamfd[41365] ERR |MDS_SND_RCV: 
Anchor=<0x0002020f,1790>
Sep 7 6:44:54.218987 osafamfd[41365] ERR |MDS_SND_RCV: Timeout or 
Error occured
Sep 7 6:44:54.219085 osafamfd[41365] ERR |MDS_SND_RCV: Timeout occured 
on red sndrsp message from svc_id = MBCSV(19), to svc_id = MBCSV(19)
Sep 7 6:44:54.219139 osafamfd[41365] ERR |MDS_SND_RCV: 
Adest=<0x,1>
Sep 7 6:44:54.219168 osafamfd[41365] ERR |MDS_SND_RCV: 
Anchor=<0x0002020f,1790>


Still, there is nothing in the syslog indicating that controller2 has
left the cluster. This is for TCP.
When the node comes back online (without OpenSAF being started),
controller1 finally notices it and fails over the applications.


When the reboot is not detected, the TCP keepalives stop and the
connection goes into retransmissions instead. I have attached 2 tshark
sessions captured on controller1, capturing traffic between controller1
and controller2. The failed reboot detection is captured in
"ctrl2_failed_detection.trc", and a working detection is captured in
"ctrl2_working.trc". I have also attached all logs in /var/log/opensaf
and the syslog (all from controller1).


It appears to me that we are hitting something similar to
"http://stackoverflow.com/questions/33553410/tcp-retranmission-timer-overrides-kills-tcp-keepalive-timer-delaying-disconnect".


// Jonas





[tickets] [opensaf:tickets] #2014 Rebooted controller not detected in TCP

2016-09-15 Thread A V Mahesh (AVM)
- **status**: review --> fixed
- **Milestone**: 5.0.1 --> 4.7.2
- **Comment**:

changeset:   8066:afddc603adcb
branch:  opensaf-4.7.x
parent:  8043:4a8a00097561
user:A V Mahesh 
date:Thu Sep 15 10:50:31 2016 +0530
summary: dtm: TCP Improve node failFast with TCP_USER_TIMEOUT [#2014]
 
changeset:   8067:efeaffca9483
branch:  opensaf-5.0.x
parent:  8049:28129451fd38
user:A V Mahesh 
date:Thu Sep 15 10:52:03 2016 +0530
summary: dtm: TCP Improve node failFast with TCP_USER_TIMEOUT [#2014]
 
changeset:   8068:87a09d9164d3
branch:  opensaf-5.1.x
parent:  8065:019e617955ef
user:A V Mahesh 
date:Thu Sep 15 10:52:32 2016 +0530
summary: dtm: TCP Improve node failFast with TCP_USER_TIMEOUT [#2014]
 
changeset:   8069:b30d5e33e50c
tag: tip
parent:  8064:99410ba8cc21
user:A V Mahesh 
date:Thu Sep 15 10:52:49 2016 +0530
summary: dtm: TCP Improve node failFast with TCP_USER_TIMEOUT [#2014]



---

** [tickets:#2014] Rebooted controller not detected in TCP**

**Status:** fixed
**Milestone:** 4.7.2
**Created:** Thu Sep 08, 2016 06:20 PM UTC by Jonas Arndt
**Last Updated:** Wed Sep 14, 2016 06:38 PM UTC
**Owner:** A V Mahesh (AVM)
**Attachments:**

- [logs.tgz](https://sourceforge.net/p/opensaf/tickets/2014/attachment/logs.tgz) (84.1 kB; application/x-compressed-tar)
- [tcp_user_timeout_2014.patch](https://sourceforge.net/p/opensaf/tickets/2014/attachment/tcp_user_timeout_2014.patch) (5.5 kB; application/octet-stream)


OS environment:

Debian Jessie (OpenSAF is running on bare metal, no containers or VMs)
4.4.7 kernel
Network: eth0, bonded, OVS (I have tried all of them, and the problem occurs
in all configurations)


In 20% of the cases, a "reboot -f" on controller2 is not detected and acted on.
This is what appears in mds.log:

Sep  7  6:44:23.918566 osafamfd[41365] ERR  |MDS_SND_RCV: Adest=<0x,1>
Sep  7  6:44:23.918595 osafamfd[41365] ERR  |MDS_SND_RCV: 
Anchor=<0x0002020f,1790>
Sep  7  6:44:34.018662 osafamfd[41365] ERR  |MDS_SND_RCV: Timeout or Error 
occured
Sep  7  6:44:34.018751 osafamfd[41365] ERR  |MDS_SND_RCV: Timeout occured on 
red sndrsp message from svc_id = MBCSV(19), to svc_id = MBCSV(19)
Sep  7  6:44:34.018789 osafamfd[41365] ERR  |MDS_SND_RCV: Adest=<0x,1>
Sep  7  6:44:34.018818 osafamfd[41365] ERR  |MDS_SND_RCV: 
Anchor=<0x0002020f,1790>
Sep  7  6:44:44.118832 osafamfd[41365] ERR  |MDS_SND_RCV: Timeout or Error 
occured
Sep  7  6:44:44.118919 osafamfd[41365] ERR  |MDS_SND_RCV: Timeout occured on 
red sndrsp message from svc_id = MBCSV(19), to svc_id = MBCSV(19)
Sep  7  6:44:44.118955 osafamfd[41365] ERR  |MDS_SND_RCV: Adest=<0x,1>
Sep  7  6:44:44.118984 osafamfd[41365] ERR  |MDS_SND_RCV: 
Anchor=<0x0002020f,1790>
Sep  7  6:44:54.218987 osafamfd[41365] ERR  |MDS_SND_RCV: Timeout or Error 
occured
Sep  7  6:44:54.219085 osafamfd[41365] ERR  |MDS_SND_RCV: Timeout occured on 
red sndrsp message from svc_id = MBCSV(19), to svc_id = MBCSV(19)
Sep  7  6:44:54.219139 osafamfd[41365] ERR  |MDS_SND_RCV: Adest=<0x,1>
Sep  7  6:44:54.219168 osafamfd[41365] ERR  |MDS_SND_RCV: 
Anchor=<0x0002020f,1790>

Still, there is nothing in the syslog indicating that controller2 has left the
cluster. This is for TCP.
When the node comes back online (without OpenSAF being started), controller1
finally notices it and fails over the applications.

When the reboot is not detected, the TCP keepalives stop and the connection goes
into retransmissions instead. I have attached 2 tshark sessions captured on
controller1, capturing traffic between controller1 and controller2. The failed
reboot detection is captured in "ctrl2_failed_detection.trc", and a working
detection is captured in "ctrl2_working.trc". I have also attached all logs in
/var/log/opensaf and the syslog (all from controller1).

It appears to me that we are hitting something similar to
"http://stackoverflow.com/questions/33553410/tcp-retranmission-timer-overrides-kills-tcp-keepalive-timer-delaying-disconnect".

// Jonas
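The fix for this ticket is summarized as "dtm: TCP Improve node failFast with TCP_USER_TIMEOUT". As background, on Linux a keepalive probe is only sent while the connection is idle, so once data is queued for retransmission to a peer that has vanished, the keepalive settings no longer help and the connection lingers until the retransmission limit (net.ipv4.tcp_retries2) is reached. The sketch below shows the standard Linux socket options involved. It is not the dtm patch: the helper names `configure_peer_liveness` and `demo_user_timeout` are hypothetical, and the timeout values are illustrative rather than OpenSAF defaults.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper (Linux only): enables keepalive probing on fd and
 * bounds how long transmitted data may stay unacknowledged.
 * Returns 0 on success, -1 on the first failing setsockopt (errno is set). */
int configure_peer_liveness(int fd, int idle_s, int intvl_s, int cnt,
                            unsigned int user_timeout_ms)
{
    int on = 1;

    assert(fd >= 0);
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0 ||
        setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof(idle_s)) < 0 ||
        setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl_s, sizeof(intvl_s)) < 0 ||
        setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof(cnt)) < 0)
        return -1;

    /* Keepalive probes are not sent while data awaits retransmission.
     * TCP_USER_TIMEOUT makes the kernel abort the connection (ETIMEDOUT)
     * once transmitted data has stayed unacknowledged this long (in ms). */
    if (setsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT, &user_timeout_ms,
                   sizeof(user_timeout_ms)) < 0)
        return -1;
    return 0;
}

/* Usage sketch: configure a fresh TCP socket with illustrative values and
 * read the user timeout back; returns 0 on any failure. */
unsigned int demo_user_timeout(void)
{
    unsigned int v = 0;
    socklen_t len = sizeof(v);
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return 0;
    if (configure_peer_liveness(fd, 5, 1, 3, 10000u) != 0 ||
        getsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT, &v, &len) != 0)
        v = 0;
    close(fd);
    return v;
}
```

With a user timeout in place, a peer that disappears in the middle of a retransmission is reported as a connection error within roughly that bound, instead of only after the retransmission limit expires.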

