[tickets] [opensaf:tickets] #2436 amfnd: Buffered messages are unexpectedly deleted during SC Absence period

2017-04-25 Thread Minh Hon Chau
- **status**: assigned --> review



---

** [tickets:#2436] amfnd: Buffered messages are unexpectedly deleted during SC 
Absence period**

**Status:** review
**Milestone:** 5.17.06
**Created:** Mon Apr 24, 2017 10:58 AM UTC by Minh Hon Chau
**Last Updated:** Mon Apr 24, 2017 10:58 AM UTC
**Owner:** Minh Hon Chau


Stop both SCs so that the cluster goes headless. Trigger an SU failover, so that 
the su_oper message is buffered; it is supposed to be sent to the active amfd 
when an SC comes back. However, if the cluster stays headless for 3 minutes, 
which is exactly the MDS_AWAIT_ACTIVE_TMR_VAL timeout, amfnd receives another 
NCSMDS_DOWN. At that point amfnd deletes all pending messages, which makes 
headless recovery impossible.

Some outline logs:
~~~
Apr 18 16:49:09.749428 osafamfnd [10775:10775:../../opensaf/src/amf/amfnd/di.cc:0603] >> avnd_evt_mds_avd_dn_evh
Apr 18 16:49:09.750094 osafamfnd [10775:10775:../../opensaf/src/amf/amfnd/di.cc:0618] WA AMF director unexpectedly crashed
Apr 18 16:49:09.750103 osafamfnd [10775:10775:../../opensaf/src/amf/amfnd/di.cc:0662] TR Delete all pending messages to be sent to AMFD

Apr 18 16:49:09.796138 osafamfnd [10775:10775:../../opensaf/src/amf/amfnd/di.cc:0756] NO avnd_di_oper_send() deferred as AMF director is offline(1), or sync is required(1)

Apr 18 16:49:09.797440 osafamfnd [10775:10775:../../opensaf/src/amf/amfnd/di.cc:0756] NO avnd_di_oper_send() deferred as AMF director is offline(1), or sync is required(1)

Apr 18 16:52:09.825457 osafamfnd [10775:10775:../../opensaf/src/amf/amfnd/di.cc:0603] >> avnd_evt_mds_avd_dn_evh
Apr 18 16:52:09.825489 osafamfnd [10775:10775:../../opensaf/src/amf/amfnd/di.cc:0618] WA AMF director unexpectedly crashed
Apr 18 16:52:09.825495 osafamfnd [10775:10775:../../opensaf/src/amf/amfnd/di.cc:0662] TR Delete all pending messages to be sent to AMFD
Apr 18 16:52:09.825498 osafamfnd [10775:10775:../../opensaf/src/amf/amfnd/di.cc:1273] >> avnd_diq_rec_del
Apr 18 16:52:09.825505 osafamfnd [10775:10775:../../opensaf/src/amf/amfnd/di.cc:1290] << avnd_diq_rec_del
Apr 18 16:52:09.825508 osafamfnd [10775:10775:../../opensaf/src/amf/amfnd/di.cc:1273] >> avnd_diq_rec_del
Apr 18 16:52:09.825512 osafamfnd [10775:10775:../../opensaf/src/amf/amfnd/di.cc:1290] << avnd_diq_rec_del
~~~
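
The sequence in the log can be modelled with a minimal sketch. This is not the amfnd code: `PendingQueue` and its methods are hypothetical names, and the policy shown (purge only on the up-to-down transition, keep messages buffered on a repeated down while already headless) is an assumption about one way to preserve the buffered su_oper messages, not the fix chosen for this ticket.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <utility>

// Hypothetical stand-in for the queue of messages waiting for the director.
class PendingQueue {
 public:
  void on_director_up() { director_up_ = true; }

  // Buffer a message that cannot be sent right now.
  void buffer(std::string msg) { msgs_.push_back(std::move(msg)); }

  // Handle a director-down notification and return how many messages were
  // purged. Assumed policy: purge only when the director was actually up.
  // A second down during the headless period leaves the buffer untouched.
  std::size_t on_director_down() {
    std::size_t purged = 0;
    if (director_up_) {
      purged = msgs_.size();
      msgs_.clear();
    }
    director_up_ = false;
    return purged;
  }

  std::size_t size() const { return msgs_.size(); }

 private:
  bool director_up_ = true;
  std::deque<std::string> msgs_;
};
```

With this policy, a message buffered after the first down survives the second down that arrives when the await-active timer expires.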


---

Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is 
subscribed to https://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change settings at 
https://sourceforge.net/p/opensaf/admin/tickets/options.  Or, if this is a 
mailing list, you can unsubscribe from the mailing list.--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot___
Opensaf-tickets mailing list
Opensaf-tickets@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets


[tickets] [opensaf:tickets] #2430 lck: resources can deadlock after master glnd restart

2017-04-25 Thread Alex Jones
- **status**: review --> fixed
- **Blocker**:  --> False
- **Comment**:

commit e7482750832c7f002730667e0521d41cac6ae77c
Author: Alex Jones 
Date:   Tue Apr 25 11:05:07 2017 -0400




---

** [tickets:#2430] lck: resources can deadlock after master glnd restart**

**Status:** fixed
**Milestone:** 5.17.08
**Created:** Mon Apr 17, 2017 07:57 PM UTC by Alex Jones
**Last Updated:** Tue Apr 18, 2017 03:07 PM UTC
**Owner:** Alex Jones


Exclusive locks can deadlock after the master glnd restarts. In a 6-node 
cluster, with each node attempting to access one global exclusive lock and 
holding it for 1 sec, these locks can deadlock if the master glnd is killed.


---



[tickets] [opensaf:tickets] #2438 log: generate hash only if having destination name set

2017-04-25 Thread Vu Minh Nguyen



---

** [tickets:#2438] log: generate hash only if having destination name set**

**Status:** accepted
**Milestone:** 5.17.06
**Created:** Tue Apr 25, 2017 02:12 PM UTC by Vu Minh Nguyen
**Last Updated:** Tue Apr 25, 2017 02:12 PM UTC
**Owner:** Vu Minh Nguyen


`rfc5424_msgid` is only referenced when streaming, with a destination name set 
on that log stream. This means the log service sometimes does needless work: it 
calculates the hash even when no destination name is set. 

This ticket will fix that.
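
The idea can be sketched as follows. This is an illustration, not the log service code: `StreamSketch` and `MsgIdBuilder` are hypothetical names, and `std::hash` stands in for whatever hash the service actually uses.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical view of a log stream: only the destination names matter here.
struct StreamSketch {
  std::string name;
  std::vector<std::string> dest_names;  // empty means no destination is set
};

// Builds a message id from a hash of the stream name, but only on demand.
struct MsgIdBuilder {
  int hash_calls = 0;  // counts how often the hash was actually computed

  // Returns an empty string when no destination name is configured, so the
  // hash calculation is skipped entirely in that case.
  std::string build(const StreamSketch& stream) {
    if (stream.dest_names.empty()) return std::string();
    ++hash_calls;
    return std::to_string(std::hash<std::string>{}(stream.name));
  }
};
```

A stream without destinations never increments `hash_calls`, which makes the saving easy to check.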


---



[tickets] [opensaf:tickets] Re: #2419 smf: when fixing ticket #2145 a NBC problem was introduced

2017-04-25 Thread Rafael Odzakow
I consider the AMF objects as an interface and some external code outside of 
OpenSAF might be reading that campaignDN attribute.


---

** [tickets:#2419] smf: when fixing ticket #2145 a NBC problem was introduced**

**Status:** wontfix
**Milestone:** 5.2.0
**Created:** Mon Apr 10, 2017 11:11 AM UTC by elunlen
**Last Updated:** Mon Apr 24, 2017 07:53 PM UTC
**Owner:** nobody


Previous behavior:
The behavior was to ignore a failure to activate a component unless a secondary 
fault happened. This made it possible, for example, to complete a campaign even 
if a component failed to start, and to fix the problem after committing. No 
action to resume the campaign was needed.

After [#2145]:
The campaign always suspends when a component fails, and a resume must be 
requested for the campaign to continue.

NBC:
The behavior has changed in such a way that it must be seen as an NBC. Ticket 
#2145 corrects SMF behavior with respect to AIS, but the change is still an NBC 
because the previous behavior is the legacy behavior of earlier releases.

Proposal 1: fix if the setting does not need to be changed at runtime, e.g. 
during an upgrade
Add a new configuration attribute to the SMF configuration class that makes it 
possible to select whether the behavior after #2145 shall be used or not. The 
default setting must be the previous behavior.
The setting must have the following properties:
- If the attribute does not exist (old model): legacy behavior
- If the attribute value is not changed from default: legacy behavior
- If the attribute value is  or invalid: legacy behavior
- If the attribute value is a valid “ON” setting: new behavior
- A request to change the attribute at runtime shall always be rejected

Proposal 2: fix if the setting has to be changed during an upgrade
Add a new configuration attribute to the SMF configuration class that makes it 
possible to select whether the behavior after #2145 shall be used or not. The 
default setting must be the previous behavior.
The setting must have the following properties:
- If the attribute does not exist (old model): legacy behavior
- If the attribute value is not changed from default: legacy behavior
- If the attribute value is  or invalid: legacy behavior
- If the attribute value is a valid “ON” setting: new behavior
- It must be possible to change the attribute value at runtime in “idle” state 
(no campaign is executing)
- It must be possible to change the attribute value at runtime in campaign init 
state. Note that if it is changed here,
  the new setting must be used in the rest of the campaign
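
The rules of both proposals can be written down as a small decision sketch. All names here (`FailBehavior`, `SmfState`, `resolve_behavior`, `runtime_change_allowed`) are hypothetical, and the literal `"ON"` is only a placeholder for whatever the valid setting would be.

```cpp
#include <cassert>
#include <string>

enum class FailBehavior { kLegacy, kSuspendOnFail };
enum class SmfState { kIdle, kCampaignInit, kCampaignExecuting };

// `value` is nullptr when the attribute does not exist (old model).
// Anything other than the valid "ON" placeholder selects legacy behavior.
FailBehavior resolve_behavior(const std::string* value) {
  if (value != nullptr && *value == "ON") return FailBehavior::kSuspendOnFail;
  return FailBehavior::kLegacy;  // missing, default, empty, or invalid
}

// Proposal 1 rejects every runtime change. Proposal 2 accepts a change only
// while idle or in campaign init state.
bool runtime_change_allowed(int proposal, SmfState state) {
  if (proposal == 1) return false;
  return state == SmfState::kIdle || state == SmfState::kCampaignInit;
}
```

Under Proposal 2, a value changed in campaign init state would have to be read once and then used for the rest of the campaign; the sketch leaves that caching out.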



---



[tickets] [opensaf:tickets] #2412 log: refactor handling log client database in log agent

2017-04-25 Thread Vu Minh Nguyen
Second increment just sent out for review.

Here is the link of code changes/added:
https://sourceforge.net/u/winhvu/review/ci/a7a081176b9b0a6730c7769f14d53d090300dd54/




---

** [tickets:#2412] log: refactor handling log client database in log agent**

**Status:** review
**Milestone:** 5.17.08
**Created:** Tue Apr 04, 2017 12:08 PM UTC by Canh Truong
**Last Updated:** Mon Apr 24, 2017 07:37 AM UTC
**Owner:** Vu Minh Nguyen


In the log agent, there is a linked list holding all log clients of an 
application process. In addition, each log client has a linked list holding all 
log streams that belong to it.

Code that adds, modifies, or deletes elements of these linked lists, or 
sub-items of the client databases, is distributed over many places. This could 
easily cause trouble such as race conditions or deadlocks, and makes it risky 
to add code that changes the databases.

So, this ticket intends to remove that concern by:
1) Centralizing read/write access to the database in one place with its own 
private mutex
2) Using C++ containers to hold and handle the databases

The ticket will be pushed in two increments:
1) Convert the agent code to C++ without touching any existing logic (similar 
to what AMF did in [#1673])
2) Do #1 and #2 above
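
A minimal sketch of the intended shape, assuming clients and streams can be identified by integer ids. `ClientDbSketch` and its methods are hypothetical and are not taken from the patch under review.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <mutex>
#include <set>

// All reads and writes of the client database go through this one class,
// and every method takes the same private mutex.
class ClientDbSketch {
 public:
  void add_client(std::uint32_t client_id) {
    std::lock_guard<std::mutex> lock(mutex_);
    clients_[client_id];  // creates an empty stream set if absent
  }

  // Returns false if the client is unknown or the stream is already present.
  bool add_stream(std::uint32_t client_id, std::uint32_t stream_id) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = clients_.find(client_id);
    if (it == clients_.end()) return false;
    return it->second.insert(stream_id).second;
  }

  // Removing a client also drops all of its streams.
  bool remove_client(std::uint32_t client_id) {
    std::lock_guard<std::mutex> lock(mutex_);
    return clients_.erase(client_id) > 0;
  }

  std::size_t stream_count(std::uint32_t client_id) const {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = clients_.find(client_id);
    return it == clients_.end() ? 0 : it->second.size();
  }

 private:
  mutable std::mutex mutex_;
  std::map<std::uint32_t, std::set<std::uint32_t>> clients_;
};
```

Because callers never see the containers directly, new code cannot modify the database without holding the lock.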


---



[tickets] [opensaf:tickets] #2418 imm: Info of dead IMMND remains in standby IMMD

2017-04-25 Thread Hung Nguyen
- **status**: review --> fixed



---

** [tickets:#2418] imm: Info of dead IMMND remains in standby IMMD**

**Status:** fixed
**Milestone:** 5.17.06
**Created:** Mon Apr 10, 2017 10:23 AM UTC by Hung Nguyen
**Last Updated:** Tue Apr 25, 2017 06:44 AM UTC
**Owner:** Hung Nguyen
**Attachments:**

- [log.tgz](https://sourceforge.net/p/opensaf/tickets/2418/attachment/log.tgz) 
(149.4 kB; application/x-compressed)


When the Standby IMMD comes up at the same time as an IMMND is exiting, the info 
of that IMMND might not be removed from the **immnd_tree** of the Standby IMMD.

Details of the problem are explained in the sequence diagram below
[sequence 
diagram](http://sequencediagram.org/index.html?initialData=A4QwTgLglgxloDsIAICCBhAKgWgJIFl8ARAKFElnhCWQGVMAhPQ0kkAIwHsAPZTgNwCmYOo2bFkAYjCCAJgC5kRAPIB1AHLJBQmgDMwnALbIC+dUT4JkCTrMHIAGiRJdeA4aKamii3AigoxLQAOgh+Acj4DOjI1LLIAM6CgdHIBgA29hCcdBBx7ACezvReLMjYAHxoWOIW8uic6bIJBQgwaYIAjgCuggkQziQYON7lVSW1ig1NLW0dCcCcCEmhEAAW9qbmyOlQ-chQbenddgnI65uE20vWtvZOzhw8fEIiw7VSMgpKapragnoDMYthYbjY7I5nK4Xh53t4ADQTbyKTAbExXCx7DqGdzxfRGaojMrsbooGSGECHM6HTy1IA)

SC-5 was Active, SC-2 was Standby, IMMND on SC-1 was exiting

~~~
18:35:03 SC-1 osafimmnd[441]: exiting for shutdown

18:35:03 SC-2 osafrded[413]: NO RDE role set to STANDBY
18:35:03 SC-2 osafimmd[430]: NO MDS event from svc_id 25 (change:3, 
dest:568511936070075)
18:35:03 SC-2 osafimmd[430]: NO MDS event from svc_id 25 (change:3, 
dest:567412424442298)
18:35:03 SC-2 osafimmd[430]: NO MDS event from svc_id 25 (change:3, 
dest:566312912814523)
18:35:03 SC-2 osafimmd[430]: NO MDS event from svc_id 25 (change:3, 
dest:565213401186744)

18:35:03 SC-5 osafimmd[433]: NO MDS event from svc_id 25 (change:4, 
dest:564113889558969)
~~~

Down event for IMMND@SC-1 was received on SC-5 but not on SC-2.


**The symptoms:**

1. If the down IMMND is the coordinator, then when that Standby IMMD becomes 
Active, it fails to elect a new coordinator because there is already a 
coordinator in the **immnd_tree**.
~~~
18:35:11 SC-2 osafimmd[430]: WA IMMND coordinator at 2050f apparently crashed 
=> electing new coord
~~~
No more logs about newly elected coordinator were printed out.


2. When IMMND@SC-1 is up again, it fails to introduce itself to the IMMD because 
the IMMD already has IMMND@SC-1 in **immnd_tree** with a wrong epoch.

~~~
18:35:29 SC-1 osafimmnd[441]: NO SERVER STATE: IMM_SERVER_ANONYMOUS --> 
IMM_SERVER_CLUSTER_WAITING
18:35:29 SC-1 osafimmnd[441]: NO This IMMND is now the NEW Coord
18:35:29 SC-1 osafimmnd[441]: ER 3 > 0, exiting
~~~
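
One way to keep such a stale entry out of the standby's tree is to accept replicated node info only for IMMNDs the standby has itself seen come up. The sketch below illustrates that idea with hypothetical names (`StandbyTreeSketch`, `on_synced_node`); it is a toy model, not the IMMD implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>

// Toy model of the standby's view: which IMMNDs it has seen come up, and
// which entries it keeps in its node tree (destination -> epoch).
class StandbyTreeSketch {
 public:
  void on_mds_up(std::uint64_t dest) { up_.insert(dest); }

  void on_mds_down(std::uint64_t dest) {
    up_.erase(dest);
    tree_.erase(dest);
  }

  // Node info replicated from the active side. Entries for nodes that this
  // standby has not seen as up are ignored. Returns false when ignored.
  bool on_synced_node(std::uint64_t dest, std::uint32_t epoch) {
    if (up_.count(dest) == 0) return false;
    tree_[dest] = epoch;
    return true;
  }

  bool contains(std::uint64_t dest) const { return tree_.count(dest) != 0; }

 private:
  std::set<std::uint64_t> up_;
  std::map<std::uint64_t, std::uint32_t> tree_;
};
```

In the scenario above, the exiting IMMND on SC-1 never produces an up event on SC-2, so its replicated entry would be dropped instead of lingering in the tree.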




---



[tickets] [opensaf:tickets] #2418 imm: Info of dead IMMND remains in standby IMMD

2017-04-25 Thread Hung Nguyen
- **Blocker**:  --> False
- **Milestone**: 5.0.2 --> 5.17.06
- **Comment**:

5.17.08 (develop) [code:85c90b]
~~~
commit 85c90b4abead8bd66e1f20be3f84255645880597
Author: Hung Nguyen 
Date:   Tue Apr 25 13:24:29 2017 +0700

imm: Ignore the sync'ed IMMND nodes that are not up [#2418]
~~~

5.17.06 (release) [code:c1a37f]
~~~
commit c1a37fb5032c0e63165bc36e79d5a79be3fd19dd
Author: Hung Nguyen 
Date:   Tue Apr 25 13:24:29 2017 +0700

imm: Ignore the sync'ed IMMND nodes that are not up [#2418]
~~~

default (mercurial) [staging:dc6067]
~~~
changeset:   8777:dc60670bfd3b
user:        Hung Nguyen
date:        Tue Apr 25 13:40:04 2017 +0700
summary:     imm: Ignore the sync'ed IMMND nodes that are not up [#2418]

~~~



---

** [tickets:#2418] imm: Info of dead IMMND remains in standby IMMD**

**Status:** review
**Milestone:** 5.17.06
**Created:** Mon Apr 10, 2017 10:23 AM UTC by Hung Nguyen
**Last Updated:** Thu Apr 13, 2017 10:08 AM UTC
**Owner:** Hung Nguyen
**Attachments:**

- [log.tgz](https://sourceforge.net/p/opensaf/tickets/2418/attachment/log.tgz) 
(149.4 kB; application/x-compressed)



---
