[tickets] [opensaf:tickets] #2474 mds : clear mds lib valgrind warning

2017-06-05 Thread A V Mahesh (AVM) via Opensaf-tickets
- **status**: review --> fixed
- **Comment**:

To ssh://avmah...@git.code.sf.net/p/opensaf/code
   c0b25cd..1592f85  develop -> develop
   

To ssh://avmah...@git.code.sf.net/p/opensaf/code
   cbfaa7c..9e5e7af  release -> release



---

** [tickets:#2474] mds : clear mds lib valgrind  warning **

**Status:** fixed
**Milestone:** 5.17.08
**Created:** Thu Jun 01, 2017 04:59 AM UTC by A V Mahesh (AVM)
**Last Updated:** Thu Jun 01, 2017 09:05 AM UTC
**Owner:** A V Mahesh (AVM)


Valgrind 
---

==8184== Thread 3:
==8184== Conditional jump or move depends on uninitialised value(s)
==8184==    at 0x58DC459: get_subtn_adest_details (mds_c_db.c:155)
==8184==    by 0x58D9F10: mds_mcm_svc_up (mds_c_api.c:1967)
==8184==    by 0x58F17B7: mdtm_process_discovery_events (mds_dt_tipc.c:1297)
==8184==    by 0x58F27A8: mdtm_process_recv_events (mds_dt_tipc.c:806)
==8184==    by 0x62C27B5: start_thread (in /lib64/libpthread-2.11.3.so)
==8184==    by 0x65AF9CC: clone (in /lib64/libc-2.11.3.so)
==8184==
==8184== Use of uninitialised value of size 8
==8184==    at 0x651AC73: _itoa_word (in /lib64/libc-2.11.3.so)
==8184==    by 0x651DD36: vfprintf (in /lib64/libc-2.11.3.so)
==8184==    by 0x65C3608: __vsnprintf_chk (in /lib64/libc-2.11.3.so)
==8184==    by 0x65C354A: __snprintf_chk (in /lib64/libc-2.11.3.so)
==8184==    by 0x58DC499: snprintf (stdio2.h:65)
==8184==    by 0x58DC499: get_subtn_adest_details (mds_c_db.c:199)
==8184==    by 0x58D9F10: mds_mcm_svc_up (mds_c_api.c:1967)
==8184==    by 0x58F17B7: mdtm_process_discovery_events (mds_dt_tipc.c:1297)
==8184==    by 0x58F27A8: mdtm_process_recv_events (mds_dt_tipc.c:806)
==8184==    by 0x62C27B5: start_thread (in /lib64/libpthread-2.11.3.so)
==8184==    by 0x65AF9CC: clone (in /lib64/libc-2.11.3.so)
---
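
Both warnings in the trace above point at a value that is branched on and then formatted with snprintf() before anything was stored in it. The sketch below shows one common shape of that bug and a defensive fix, under the assumption that a lookup fills its output only on success. It is not the MDS code: `struct peer_info`, `lookup_peer()`, `describe_peer()` and the values in them are hypothetical names and data made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record, not the real MDS structure. */
struct peer_info {
    uint32_t node_id;
    uint32_t pid;
};

/* Hypothetical lookup: fills *out only when the key is known. */
static bool lookup_peer(uint64_t key, struct peer_info *out)
{
    if (key == 42) {            /* illustrative key and values */
        out->node_id = 0x2010f;
        out->pid = 259;
        return true;
    }
    return false;               /* *out is left untouched on a miss */
}

/* Formats a short description of a peer into buf.
 * Zero-initialising `info` up front means the branch and the
 * snprintf() below never read indeterminate memory, even when
 * the lookup misses. */
static int describe_peer(uint64_t key, char *buf, size_t len)
{
    struct peer_info info;
    memset(&info, 0, sizeof(info));

    bool found = lookup_peer(key, &info);
    if (!found || info.pid == 0)
        return snprintf(buf, len, "<unknown peer>");

    return snprintf(buf, len, "<node[0x%x]:pid[%u]>",
                    (unsigned)info.node_id, (unsigned)info.pid);
}
```

Without the memset(), a miss in `lookup_peer()` would leave `info.pid` indeterminate, and valgrind would flag the `info.pid == 0` test in much the same way as the first warning above.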


---

Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is 
subscribed to https://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change settings at 
https://sourceforge.net/p/opensaf/admin/tickets/options.  Or, if this is a 
mailing list, you can unsubscribe from the mailing list.

--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot

___
Opensaf-tickets mailing list
Opensaf-tickets@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets


[tickets] [opensaf:tickets] #2484 imm: Testsuit 7 of immoitest fails with ERR_TRY_AGAIN

2017-06-05 Thread Hung Nguyen via Opensaf-tickets



---

** [tickets:#2484] imm: Testsuit 7 of immoitest fails with ERR_TRY_AGAIN**

**Status:** accepted
**Milestone:** 5.17.06
**Created:** Tue Jun 06, 2017 04:25 AM UTC by Hung Nguyen
**Last Updated:** Tue Jun 06, 2017 04:25 AM UTC
**Owner:** Hung Nguyen


~~~
# immoitest --longDn 7

Suite 7: Long DN
1  PASSED   SA_AIS_OK - Object create callback;
2  PASSED   SA_AIS_OK - Object modify callback;
3  PASSED   SA_AIS_OK - Object delete callback;
4  PASSED   SA_AIS_OK - Rt Object create and delete;
error: in src/imm/apitest/implementer/test_saImmOiLongDn.c at 500: 
SA_AIS_ERR_TRY_AGAIN (6), expected SA_AIS_OK (1) - exiting
~~~

All test cases in test suite 7 use the same implementer name.
At the end of each test case, the implementer name is not explicitly cleared 
with saImmOiImplementerClear(); saImmOiFinalize() is called instead.
The difference between saImmOiImplementerClear() and saImmOiFinalize() is:
\- saImmOiImplementerClear() returns after the implementer is fully discarded 
on all nodes
\- saImmOiFinalize() returns after the implementer is locally discarded on the 
originating node (with no guarantee that it has been fully discarded on all 
nodes)

So if a test case sets the same implementer name right after 
saImmOiFinalize(), it may get SA_AIS_ERR_TRY_AGAIN.
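
One way a test could tolerate this window is to retry the call for a bounded number of attempts while it keeps returning TRY_AGAIN. The following is a minimal sketch of that idea, not the fix applied to immoitest. `ais_error_t`, `retry_on_try_again()` and `fake_implementer_set()` are hypothetical stand-ins so the example compiles without the IMM headers; only the numeric values 1 and 6 are taken from the test output above.

```c
#include <stddef.h>

/* Stand-in for SaAisErrorT, using the values printed by immoitest. */
typedef enum {
    AIS_OK = 1,
    AIS_ERR_TRY_AGAIN = 6
} ais_error_t;

/* Any call that may answer TRY_AGAIN, e.g. a wrapper around an
 * implementer-set call (hypothetical signature). */
typedef ais_error_t (*ais_call_t)(void *ctx);

/* Calls `call` up to max_attempts times and stops at the first
 * result other than TRY_AGAIN. `backoff` may be NULL; otherwise it
 * runs after each TRY_AGAIN (a real caller would sleep in it).
 * With max_attempts <= 0 nothing is called and TRY_AGAIN is returned. */
static ais_error_t retry_on_try_again(ais_call_t call, void *ctx,
                                      int max_attempts,
                                      void (*backoff)(void))
{
    ais_error_t rc = AIS_ERR_TRY_AGAIN;

    for (int attempt = 0; attempt < max_attempts; attempt++) {
        rc = call(ctx);
        if (rc != AIS_ERR_TRY_AGAIN)
            break;
        if (backoff != NULL)
            backoff();
    }
    return rc;
}

/* Fake call for demonstration: answers TRY_AGAIN until the counter
 * behind ctx reaches zero, as if the old implementer were still
 * being discarded on other nodes. */
static ais_error_t fake_implementer_set(void *ctx)
{
    int *remaining = ctx;

    if (*remaining > 0) {
        (*remaining)--;
        return AIS_ERR_TRY_AGAIN;
    }
    return AIS_OK;
}
```

The other option described above, calling saImmOiImplementerClear() before finalizing, avoids the retry altogether because it returns only once the name is free on all nodes.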


---



[tickets] [opensaf:tickets] #2432 dtm: Node reboot because transportd reads invalid pid of dtmd

2017-06-05 Thread Minh Hon Chau via Opensaf-tickets
- **status**: review --> fixed
- **assigned_to**: Minh Hon Chau -->  nobody 
- **Blocker**:  --> False
- **Comment**:

release: [cbfaa7cb69aa32c05abd9bb6177410d1af1cd45c]
develop: [c0b25cd7a1a94d386d813735c72d22abded8583b]



---

** [tickets:#2432] dtm: Node reboot because transportd reads invalid pid of 
dtmd**

**Status:** fixed
**Milestone:** 5.17.06
**Created:** Wed Apr 19, 2017 12:02 AM UTC by Minh Hon Chau
**Last Updated:** Thu Apr 20, 2017 06:25 AM UTC
**Owner:** nobody


There's an unexpected node reboot during OpenSAF node startup:

2017-01-24 18:19:53 SC-1 opensafd: Starting OpenSAF Services(5.2.M0 - 
8532:b6df9e2a2b8b:default) (Using TCP)
2017-01-24 18:19:53 SC-1 osaftransportd[398]: Started
2017-01-24 18:19:53 SC-1 osafdtmd[393]: mkfifo already exists: 
/var/lib/opensaf/osafdtmd.fifo File exists
2017-01-24 18:19:53 SC-1 osaftransportd[398]: Rebooting OpenSAF NodeId = 0 EE 
Name = No EE Mapped, Reason: osafdtmd failed to start, OwnNodeId = 0, 
SupervisionTime = 60
2017-01-24 18:19:53 SC-1 osafdtmd[393]: Started

Another attempt to reproduce the problem, with more debug logging added:

2017-04-18 18:01:14 SC-1 opensafd: Starting OpenSAF Services(5.2.0 - 
0:) (Using TCP)
2017-04-18 18:01:14 SC-1 osaftransportd[380]: fifo_file 
/var/lib/opensaf/osaftransportd.fifo
2017-04-18 18:01:14 SC-1 osaftransportd[380]: mkfifo already exists: 
/var/lib/opensaf/osaftransportd.fifo File exists
2017-04-18 18:01:14 SC-1 osafdtmd[386]: fifo_file /var/lib/opensaf/osafdtmd.fifo
2017-04-18 18:01:14 SC-1 osafdtmd[386]: mkfifo already exists: 
/var/lib/opensaf/osafdtmd.fifo File exists
2017-04-18 18:01:15 SC-1 osaftransportd[380]: __pidfile 
/var/run/opensaf/osaftransportd.pid
2017-04-18 18:01:15 SC-1 osaftransportd[380]: Started
2017-04-18 18:01:15 SC-1 osaftransportd[380]: WA file_path_:/var/run/opensaf, 
file_name_:osafdtmd.pid
2017-04-18 18:01:15 SC-1 osafdtmd[386]: __pidfile /var/run/opensaf/osafdtmd.pid
2017-04-18 18:01:15 SC-1 osaftransportd[380]: WA file name: osafdtmd.pid created
2017-04-18 18:01:15 SC-1 osaftransportd[380]: WA rdstate 6, pid: 4294967295
2017-04-18 18:01:15 SC-1 osaftransportd[380]: Rebooting OpenSAF NodeId = 0 EE 
Name = No EE Mapped, Reason: osafdtmd failed to start, OwnNodeId = 0, 
SupervisionTime = 60
2017-04-18 18:01:15 SC-1 osafdtmd[386]: Started

This could be because osaftransportd fails to read the pid from osafdtmd.pid.
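
In the second log, osaftransportd reports the pid file as created and then prints pid 4294967295, which is what -1 looks like when printed as an unsigned 32-bit number. One possible reading, stated here as an assumption rather than a confirmed root cause, is that the file existed but its content had not been written yet when it was read. The sketch below shows a parser that tells "nothing written yet" apart from "garbage", so a caller could retry in the first case instead of giving up. `parse_pid_text()` and its result codes are hypothetical and are not taken from the osaftransportd sources.

```c
#include <errno.h>
#include <stdlib.h>

enum pid_parse_result {
    PID_OK,       /* a positive pid was parsed */
    PID_EMPTY,    /* no content yet: the writer may not be done */
    PID_INVALID   /* content present but not a valid pid */
};

static int is_blank(char c)
{
    return c == ' ' || c == '\t' || c == '\n' || c == '\r';
}

/* Parses the text read from a pid file. *pid_out is written only
 * when PID_OK is returned. */
static enum pid_parse_result parse_pid_text(const char *text, long *pid_out)
{
    while (is_blank(*text))
        text++;
    if (*text == '\0')
        return PID_EMPTY;

    char *end = NULL;
    errno = 0;
    long value = strtol(text, &end, 10);
    if (errno != 0 || end == text || value <= 0)
        return PID_INVALID;

    /* Only trailing whitespace may follow the number. */
    while (is_blank(*end))
        end++;
    if (*end != '\0')
        return PID_INVALID;

    *pid_out = value;
    return PID_OK;
}
```

A caller could treat PID_EMPTY as "try again shortly" within its supervision time and reserve the reboot path for PID_INVALID or for a timeout.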



---



[tickets] [opensaf:tickets] #2483 plm: allow dynamic creation of EE when EE is parent

2017-06-05 Thread Alex Jones



---

** [tickets:#2483] plm: allow dynamic creation of EE when EE is parent**

**Status:** assigned
**Milestone:** 5.17.08
**Created:** Mon Jun 05, 2017 08:41 PM UTC by Alex Jones
**Last Updated:** Mon Jun 05, 2017 08:41 PM UTC
**Owner:** Alex Jones


PLM currently disallows the dynamic creation of an EE when its parent is also 
an EE. This restriction breaks scaling out EEs that have an EE as a parent 
(virtual machine with the host EE as the parent).


---



[tickets] [opensaf:tickets] #2482 plm: don't reset virtual machine if it is ourself

2017-06-05 Thread Alex Jones



---

** [tickets:#2482] plm: don't reset virtual machine if it is ourself**

**Status:** assigned
**Milestone:** 5.17.08
**Created:** Mon Jun 05, 2017 06:39 PM UTC by Alex Jones
**Last Updated:** Mon Jun 05, 2017 06:39 PM UTC
**Owner:** Alex Jones


When instantiating an EE, don't reset the virtual machine if it is ourself 
since we are clearly already running. This can happen if the controller is 
running in a virtual machine, instead of on the host.


---



[tickets] [opensaf:tickets] #2476 amfnd: SURestart recovery leaves SU DISABLED/OUT_OF_SERVICE

2017-06-05 Thread Minh Hon Chau
- **status**: review --> fixed
- **assigned_to**: Minh Hon Chau -->  nobody 



---

** [tickets:#2476] amfnd: SURestart recovery leaves SU DISABLED/OUT_OF_SERVICE**

**Status:** fixed
**Milestone:** 5.17.06
**Created:** Fri Jun 02, 2017 12:05 AM UTC by Minh Hon Chau
**Last Updated:** Fri Jun 02, 2017 12:08 PM UTC
**Owner:** nobody


- Start the 2N amf_demo application configured with suRestart recovery.
- Kill amf_demo to escalate to suRestart recovery.
- Recovery completes successfully.
- Check the SU states: the SU is still DISABLED and OUT-OF-SERVICE, even though 
it has assignments.

syslog:
2017-06-01 23:34:12 PL-4 osafamfnd[186]: NO 
'safComp=AmfDemo2,safSu=SU4,safSg=AmfDemoTwon,safApp=AmfDemoTwon' faulted due 
to 'avaDown' : Recovery is 'suRestart'
2017-06-01 23:34:12 PL-4 osafamfnd[186]: NO 
'safSu=SU4,safSg=AmfDemoTwon,safApp=AmfDemoTwon' Presence State INSTANTIATED => 
TERMINATING
2017-06-01 23:34:12 PL-4 amf_demo_script: sleep 5s at stop cmd
2017-06-01 23:34:12 PL-4 amf_demo_script: sleep 5s at stop cmd
2017-06-01 23:34:12 PL-4 osafamfnd[186]: NO 
'safSu=SU4,safSg=AmfDemoTwon,safApp=AmfDemoTwon' Presence State TERMINATING => 
UNINSTANTIATED
2017-06-01 23:34:12 PL-4 osafamfnd[186]: NO 
'safSu=SU4,safSg=AmfDemoTwon,safApp=AmfDemoTwon' Presence State UNINSTANTIATED 
=> INSTANTIATING
2017-06-01 23:34:12 PL-4 amf_demo_script: 
safComp=AmfDemo2,safSu=SU4,safSg=AmfDemoTwon,safApp=AmfDemoTwon
2017-06-01 23:34:12 PL-4 amf_demo[366]: 
'safComp=AmfDemo2,safSu=SU4,safSg=AmfDemoTwon,safApp=AmfDemoTwon' started
2017-06-01 23:34:12 PL-4 amf_demo_script: sleep 5s at start cmd
2017-06-01 23:34:12 PL-4 amf_demo[366]: before saAmfComponentRegister 
[safComp=AmfDemo2,safSu=SU4,safSg=AmfDemoTwon,safApp=AmfDemoTwon]
2017-06-01 23:34:12 PL-4 amf_demo[366]: after saAmfComponentRegister
2017-06-01 23:34:12 PL-4 amf_demo[366]: Registered with AMF and HC started
2017-06-01 23:34:12 PL-4 amf_demo[366]: Health check 1
2017-06-01 23:34:12 PL-4 amf_demo_script: 
safComp=AmfDemo,safSu=SU4,safSg=AmfDemoTwon,safApp=AmfDemoTwon
2017-06-01 23:34:12 PL-4 amf_demo[380]: 
'safComp=AmfDemo,safSu=SU4,safSg=AmfDemoTwon,safApp=AmfDemoTwon' started
2017-06-01 23:34:12 PL-4 amf_demo_script: sleep 5s at start cmd
2017-06-01 23:34:12 PL-4 amf_demo[380]: before saAmfComponentRegister 
[safComp=AmfDemo,safSu=SU4,safSg=AmfDemoTwon,safApp=AmfDemoTwon]
2017-06-01 23:34:12 PL-4 osafamfnd[186]: NO 
'safSu=SU4,safSg=AmfDemoTwon,safApp=AmfDemoTwon' Presence State INSTANTIATING 
=> INSTANTIATED
2017-06-01 23:34:12 PL-4 osafamfnd[186]: NO Assigning 
'safSi=AmfDemoTwon,safApp=AmfDemoTwon' ACTIVE to 
'safSu=SU4,safSg=AmfDemoTwon,safApp=AmfDemoTwon'
2017-06-01 23:34:12 PL-4 osafamfnd[186]: NO Assigning 
'safSi=AmfDemoTwonDep1,safApp=AmfDemoTwon' ACTIVE to 
'safSu=SU4,safSg=AmfDemoTwon,safApp=AmfDemoTwon'
2017-06-01 23:34:12 PL-4 osafamfnd[186]: NO Assigning 
'safSi=AmfDemoTwonDep2,safApp=AmfDemoTwon' ACTIVE to 
'safSu=SU4,safSg=AmfDemoTwon,safApp=AmfDemoTwon'
2017-06-01 23:34:12 PL-4 amf_demo[380]: after saAmfComponentRegister
2017-06-01 23:34:12 PL-4 amf_demo[380]: Registered with AMF and HC started
2017-06-01 23:34:12 PL-4 amf_demo[380]: CSI Set - add 
'safCsi=AmfDemoTwon,safSi=AmfDemoTwon,safApp=AmfDemoTwon' HAState Active
2017-06-01 23:34:12 PL-4 osafamfnd[186]: NO Assigned 
'safSi=AmfDemoTwon,safApp=AmfDemoTwon' ACTIVE to 
'safSu=SU4,safSg=AmfDemoTwon,safApp=AmfDemoTwon'
2017-06-01 23:34:12 PL-4 amf_demo[380]: CSI Set - add 
'safCsi=AmfDemoTwonDep1,safSi=AmfDemoTwonDep1,safApp=AmfDemoTwon' HAState Active
2017-06-01 23:34:12 PL-4 osafamfnd[186]: NO Assigned 
'safSi=AmfDemoTwonDep1,safApp=AmfDemoTwon' ACTIVE to 
'safSu=SU4,safSg=AmfDemoTwon,safApp=AmfDemoTwon'
2017-06-01 23:34:12 PL-4 amf_demo[380]: CSI Set - add 
'safCsi=AmfDemoTwonDep2,safSi=AmfDemoTwonDep2,safApp=AmfDemoTwon' HAState Active
2017-06-01 23:34:12 PL-4 osafamfnd[186]: NO Assigned 
'safSi=AmfDemoTwonDep2,safApp=AmfDemoTwon' ACTIVE to 
'safSu=SU4,safSg=AmfDemoTwon,safApp=AmfDemoTwon'
2017-06-01 23:34:12 PL-4 amf_demo[380]: Health check 1
2017-06-01 23:34:22 PL-4 osafamfnd[186]: NO 
'safSu=SU4,safSg=AmfDemoTwon,safApp=AmfDemoTwon' Component or SU restart 
probation timer expired

amf-state su:

safSu=SU4,safSg=AmfDemoTwon,safApp=AmfDemoTwon
        saAmfSUAdminState=UNLOCKED(1)
        saAmfSUOperState=DISABLED(2)
        saAmfSUPresenceState=INSTANTIATED(3)
        saAmfSUReadinessState=OUT-OF-SERVICE(1)
safSu=SU5,safSg=AmfDemoTwon,safApp=AmfDemoTwon
        saAmfSUAdminState=UNLOCKED(1)
        saAmfSUOperState=ENABLED(1)
        saAmfSUPresenceState=INSTANTIATED(3)
        saAmfSUReadinessState=IN-SERVICE(2)
safSu=SU5B,safSg=AmfDemoTwon,safApp=AmfDemoTwon
        saAmfSUAdminState=LOCKED-INSTANTIATION(3)
        saAmfSUOperState=ENABLED(1)
        saAmfSUPresenceState=UNINSTANTIATED(1)
        saAmfSUReadinessState=OUT-OF-SERVICE(1)




---


[tickets] [opensaf:tickets] #2467 amfd: Unnesssary log warning when recreates SUSI after SC Absence stage

2017-06-05 Thread Minh Hon Chau
- **status**: fixed --> review
- **assigned_to**: Minh Hon Chau



---

** [tickets:#2467] amfd: Unnesssary log warning when recreates SUSI after SC 
Absence stage**

**Status:** review
**Milestone:** 5.17.06
**Created:** Thu May 25, 2017 02:33 AM UTC by Minh Hon Chau
**Last Updated:** Mon Jun 05, 2017 10:12 AM UTC
**Owner:** Minh Hon Chau


When AMFD starts after an SC absence period, it recreates the SUSIs that all 
PLs send as sync info. AMFD has to recreate those SUSIs in IMM because they 
might have been missing from IMM just before the SCs went down. In most cases 
the SUSIs are already present in IMM, so AMFD gets error 14 (EXISTED) when it 
recreates them.
AMFD should ignore this error and not log the warning.
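
The requested behaviour can be pictured as a small decision on the return code: an "already exists" answer during this restore phase counts as success and is not logged. This is only a sketch of that policy, not the amfd change. `judge_susi_create()`, `struct create_verdict` and the `LOG_ACTION_*` names are hypothetical, and the enum is a stand-in for SaAisErrorT that reuses the value 14 from the ticket text.

```c
#include <stdbool.h>

/* Stand-ins for SaAisErrorT values; 14 is the code named in the ticket. */
enum { AIS_OK = 1, AIS_ERR_EXIST = 14 };

enum log_action { LOG_ACTION_NONE, LOG_ACTION_WARN };

struct create_verdict {
    bool treat_as_success;   /* carry on as if the create worked */
    enum log_action log;     /* whether a warning should be written */
};

/* Decides how to react to the result of creating a SUSI runtime
 * object. While restoring after SC absence, "already exists" is the
 * expected outcome, so it is accepted silently. Everything else that
 * is not OK still produces a warning. */
static struct create_verdict judge_susi_create(int rc,
                                               bool restoring_after_sc_absence)
{
    struct create_verdict v = { false, LOG_ACTION_WARN };

    if (rc == AIS_OK) {
        v.treat_as_success = true;
        v.log = LOG_ACTION_NONE;
    } else if (rc == AIS_ERR_EXIST && restoring_after_sc_absence) {
        v.treat_as_success = true;
        v.log = LOG_ACTION_NONE;
    }
    return v;
}
```

Keeping the warning for "already exists" outside the restore phase is a design choice of this sketch: there the duplicate is unexpected and may still be worth reporting.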

2017-05-25 12:27:06 SC-1 osafamfnd[274]: NO Sending node up due to NCSMDS_UP
2017-05-25 12:27:06 SC-1 osafamfd[259]: NO Received node_up from 2010f: msg_id 1
2017-05-25 12:27:06 SC-1 osafamfd[259]: WA saImmOiRtObjectCreate_2 of 
className:'SaAmfSIAssignment', parentName:'safSi=NoRed4,safApp=OpenSAF', failed 
with 14
2017-05-25 12:27:06 SC-1 osafamfd[259]: WA saImmOiRtObjectCreate_2 of 
className:'SaAmfSIAssignment', 
parentName:'safSi=AmfDemoTwonDep2,safApp=AmfDemoTwon', failed with 14
2017-05-25 12:27:06 SC-1 osafamfd[259]: WA saImmOiRtObjectCreate_2 of 
className:'SaAmfSIAssignment', 
parentName:'safSi=AmfDemoTwonDep1,safApp=AmfDemoTwon', failed with 14
2017-05-25 12:27:06 SC-1 osafamfd[259]: WA saImmOiRtObjectCreate_2 of 
className:'SaAmfSIAssignment', 
parentName:'safSi=AmfDemoTwon,safApp=AmfDemoTwon', failed with 14
2017-05-25 12:27:06 SC-1 osafamfd[259]: WA saImmOiRtObjectCreate_2 of 
className:'SaAmfSIAssignment', parentName:'safSi=NoRed3,safApp=OpenSAF', failed 
with 14
2017-05-25 12:27:06 SC-1 osafamfd[259]: WA saImmOiRtObjectCreate_2 of 
className:'SaAmfSIAssignment', 
parentName:'safSi=AmfDemoTwonDep2,safApp=AmfDemoTwon', failed with 14
2017-05-25 12:27:06 SC-1 osafamfd[259]: WA saImmOiRtObjectCreate_2 of 
className:'SaAmfSIAssignment', 
parentName:'safSi=AmfDemoTwonDep1,safApp=AmfDemoTwon', failed with 14
2017-05-25 12:27:06 SC-1 osafamfd[259]: WA saImmOiRtObjectCreate_2 of 
className:'SaAmfSIAssignment', 
parentName:'safSi=AmfDemoTwon,safApp=AmfDemoTwon', failed with 14
2017-05-25 12:27:06 SC-1 osafamfd[259]: WA saImmOiRtObjectCreate_2 of 
className:'SaAmfSIAssignment', parentName:'safSi=NoRed5,safApp=OpenSAF', failed 
with 14
2017-05-25 12:27:07 SC-1 osafamfd[259]: NO Enter restore headless cached RTAs 
from IMM
2017-05-25 12:27:07 SC-1 osafamfd[259]: NO Leave reading headless cached RTAs 
from IMM: SUCCESS
2017-05-25 12:27:07 SC-1 osafamfd[259]: NO Node 'SC-1' joined the cluster
2017-05-25 12:27:07 SC-1 osafamfnd[274]: NO 
'safSu=SC-1,safSg=2N,safApp=OpenSAF' Presence State UNINSTANTIATED => 
INSTANTIATING



---



[tickets] [opensaf:tickets] #2477 amfd: Cyclic reboot after SC absence period (in large cluster)

2017-06-05 Thread Minh Hon Chau
Hi Praveen,

Steps:

amf-adm unlock safSu=SU1,safSg=AmfDemoTwon,safApp=AmfDemoTwon
amf-adm unlock safSu=SU4,safSg=AmfDemoTwon,safApp=AmfDemoTwon
amf-adm unlock safSu=SU2,safSg=AmfDemoTwon,safApp=AmfDemoTwon
amf-adm unlock safSu=SU3,safSg=AmfDemoTwon,safApp=AmfDemoTwon
amf-adm unlock safSu=SU5,safSg=AmfDemoTwon,safApp=AmfDemoTwon

echo 1 > /root/2477
stop SC1
stop SC2
start SC1
start SC2
echo 0 > /root/2477

Note that I stop each SC abruptly by killing its container.

Thanks,
Minh


---

** [tickets:#2477] amfd: Cyclic reboot after SC absence period (in large 
cluster)**

**Status:** review
**Milestone:** 5.17.06
**Labels:** assignment failover during stop of both SC 2416 
**Created:** Fri Jun 02, 2017 06:17 AM UTC by Minh Hon Chau
**Last Updated:** Mon Jun 05, 2017 09:36 AM UTC
**Owner:** Minh Hon Chau



[tickets] [opensaf:tickets] #2467 amfd: Unnesssary log warning when recreates SUSI after SC Absence stage

2017-06-05 Thread Minh Hon Chau
- **status**: review --> fixed
- **assigned_to**: Minh Hon Chau -->  nobody 



---

** [tickets:#2467] amfd: Unnesssary log warning when recreates SUSI after SC 
Absence stage**

**Status:** fixed
**Milestone:** 5.17.06
**Created:** Thu May 25, 2017 02:33 AM UTC by Minh Hon Chau
**Last Updated:** Thu May 25, 2017 11:21 AM UTC
**Owner:** nobody





---



[tickets] [opensaf:tickets] #2477 amfd: Cyclic reboot after SC absence period (in large cluster)

2017-06-05 Thread Praveen
Hi Minh,

What are the steps to reproduce after applying the patch 2477_rep.diff?


Thanks,
Praveen


---

** [tickets:#2477] amfd: Cyclic reboot after SC absence period (in large 
cluster)**

**Status:** review
**Milestone:** 5.17.06
**Labels:** assignment failover during stop of both SC 2416 
**Created:** Fri Jun 02, 2017 06:17 AM UTC by Minh Hon Chau
**Last Updated:** Fri Jun 02, 2017 09:25 AM UTC
**Owner:** Minh Hon Chau


The problem in this ticket occurs in the same scenario as the one reported in 
#2416.

After an SC absence period, amfd hits osafassert(), which causes a coredump, 
and the problem happens repeatedly.

One of the patches for #2416 tried to call IMM sync as early as possible, 
which works fine in a small cluster (5 nodes). In a large cluster of about 75 
nodes, however, the change to the IMM sync calls has almost no effect.

In #2416, a problem was seen, on the assumption of unreliable IMM sync calls, 
in which after the SC absence period amfd had 3 assignments for a 2N SG: 2 
STANDBY SUSIs and 1 ACTIVE SUSI. It was fixed by the commit "amfd: Add 
iteration to failover all absent assignments [#2416]" (refer to: 
https://sourceforge.net/p/opensaf/tickets/2416/#f83b)

Another variant of the problem with unreliable IMM calls before both SCs go 
down is that amfd can end up with both SUs having ACTIVE assignments, which 
leads to the assert. So far this problem has only been seen in a large cluster.
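
Both variants break the basic 2N rule that an SG has at most one SU with ACTIVE assignments and at most one with STANDBY assignments. The sketch below checks that rule over a list of restored assignments, on the assumption that each SUSI can be reduced to an SU identifier and an HA state. It is not the amfd data model: `struct susi`, `count_sus_in_state()` and `is_consistent_2n()` are hypothetical names used only to make the invariant concrete.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum { HA_ACTIVE, HA_STANDBY } ha_state_t;

/* Hypothetical, flattened view of one SU/SI assignment. */
struct susi {
    int su_id;
    ha_state_t ha;
};

/* Counts the distinct SUs that hold at least one assignment in the
 * given HA state. An SU with several SIs is counted once. */
static size_t count_sus_in_state(const struct susi *list, size_t n,
                                 ha_state_t ha)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        if (list[i].ha != ha)
            continue;

        bool seen = false;
        for (size_t j = 0; j < i; j++) {
            if (list[j].ha == ha && list[j].su_id == list[i].su_id) {
                seen = true;
                break;
            }
        }
        if (!seen)
            count++;
    }
    return count;
}

/* A 2N SG is consistent when no more than one SU is ACTIVE and no
 * more than one SU is STANDBY. */
static bool is_consistent_2n(const struct susi *list, size_t n)
{
    return count_sus_in_state(list, n, HA_ACTIVE) <= 1 &&
           count_sus_in_state(list, n, HA_STANDBY) <= 1;
}
```

A check of this kind, run on the assignments rebuilt after SC absence, would let the inconsistent state be detected and repaired before failover logic that assumes a single ACTIVE SU is entered.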


Details of coredump:
 
~~~
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/lib64/opensaf/osafamfd'.
Program terminated with signal SIGABRT, Aborted.
#0  0x7f784279b0c7 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: zypper install 
opensaf-amf-director-debuginfo-5.2.0-469.0.6128a2d.sle12.x86_64
(gdb) bt full
#0  0x7f784279b0c7 in raise () from /lib64/libc.so.6
No symbol table info available.
#1  0x7f784279c478 in abort () from /lib64/libc.so.6
No symbol table info available.
#2  0x7f78435fdf4e in __osafassert_fail (__file=<optimized out>, 
__line=<optimized out>, __func=<optimized out>, 
__assertion=<optimized out>) at ../../opensaf/src/base/sysf_def.c:286
No locals.
#3  0x7f78445671e8 in avd_sg_2n_act_susi (sg=<optimized out>, 
stby_susi=stby_susi@entry=0x7ffeef034998, cb=0x7f78447f2e80 <_control_block>)
at ../../opensaf/src/amf/amfd/sg_2n_fsm.cc:596
susi = <optimized out>
a_susi_2 = 0x7f7845e0d0c0
s_susi_1 = 0x7f7845e0d0c0
su_2 = <optimized out>
t_ = {trace_leave_called = false, file_ = 0x0, function_ = 0x0}
s_susi_2 = 0x7f7845e2a030
a_susi = 0x0
a_susi_1 = 0x7f7845e2a030
s_susi = 0x0
su_1 = 0x7f7845d69e60
#4  0x7f784456d5d6 in SG_2N::node_fail (this=0x7f7845d5f4f0, 
cb=0x7f78447f2e80 <_control_block>, su=0x7f7845d69e60)
at ../../opensaf/src/amf/amfd/sg_2n_fsm.cc:3402
a_susi = <optimized out>
s_susi = 0x7f7845d69a68
o_su = <optimized out>
flag = <optimized out>
__FUNCTION__ = "node_fail"
su_ha_state = <optimized out>
t_ = {trace_leave_called = false, file_ = 0x0, function_ = 0x0}
#5  0x7f784455de1a in AVD_SG::failover_absent_assignment 
(this=0x7f7845d5f4f0) at ../../opensaf/src/amf/amfd/sg.cc:2307
t_ = {trace_leave_called = false, file_ = 0x0, function_ = 0x0}
__FUNCTION__ = "failover_absent_assignment"
failed_su = 0x7f7845d69e60
#6  0x7f7844514125 in avd_cluster_tmr_init_evh (cb=0x7f78447f2e80 
<_control_block>, evt=<optimized out>)
at ../../opensaf/src/amf/amfd/cluster.cc:103
i_sg = 0x7f7845d5f4f0
__for_range = @0x7f7845ca2a90: {db = {_M_t = {
  _M_impl = 
{ const, AVD_SG*> > >> = 
{<__gnu_cxx::new_allocator const, AVD_SG*> > >> = {}, }, 
_M_key_compare = {, std::basic_string, bool>> = {}, 
}, _M_header = {_M_color = std::_S_red, 
  _M_parent = 0x7f7845d515e0, _M_left = 0x7f7845d03ed0, 
_M_right = 0x7f7845d81580}, _M_node_count = 28
t_ = {trace_leave_called = false, file_ = 0x0, function_ = 0x0}
__FUNCTION__ = "avd_cluster_tmr_init_evh"
su = 0x0
node = <optimized out>
#7  0x7f784453ca2c in process_event (cb_now=0x7f78447f2e80 
<_control_block>, evt=0x7f78340013d0) at ../../opensaf/src/amf/amfd/main.cc:775
t_ = {trace_leave_called = false, file_ = 0x0, function_ = 0x0}
__FUNCTION__ = "process_event"
#8  0x7f78444f6abe in main_loop () at ../../opensaf/src/amf/amfd/main.cc:691
pollretval = <optimized out>
evt = 0x7f78340013d0
polltmo = 0
term_fd = 24
cb = 0x7f78447f2e80 <_control_block>
error = <optimized out>
old_sync_state = AVD_STBY_OUT_OF_SYNC
#9  main (argc=<optimized out>, argv=<optimized out>) at 
../../opensaf/src/amf/amfd/main.cc:848
No locals.
~~~



---
