[tickets] [opensaf:tickets] #2436 amfnd: Buffered messages are unexpectedly deleted during SC Absence period
- **status**: assigned --> review

--- ** [tickets:#2436] amfnd: Buffered messages are unexpectedly deleted during SC Absence period** **Status:** review **Milestone:** 5.17.06 **Created:** Mon Apr 24, 2017 10:58 AM UTC by Minh Hon Chau **Last Updated:** Mon Apr 24, 2017 10:58 AM UTC **Owner:** Minh Hon Chau

Stop both SCs so that the cluster goes headless, then trigger an SU failover. The su_oper message is buffered and is supposed to be sent to the active AMFD when an SC comes back. However, if the cluster stays headless for more than 3 minutes, which is exactly the MDS_AWAIT_ACTIVE_TMR_VAL timeout, amfnd receives another NCSMDS_DOWN. At that point amfnd deletes all pending messages, which makes headless recovery impossible. Some outline logs:

~~~
Apr 18 16:49:09.749428 osafamfnd [10775:10775:../../opensaf/src/amf/amfnd/di.cc:0603] >> avnd_evt_mds_avd_dn_evh
Apr 18 16:49:09.750094 osafamfnd [10775:10775:../../opensaf/src/amf/amfnd/di.cc:0618] WA AMF director unexpectedly crashed
Apr 18 16:49:09.750103 osafamfnd [10775:10775:../../opensaf/src/amf/amfnd/di.cc:0662] TR Delete all pending messages to be sent to AMFD
Apr 18 16:49:09.796138 osafamfnd [10775:10775:../../opensaf/src/amf/amfnd/di.cc:0756] NO avnd_di_oper_send() deferred as AMF director is offline(1), or sync is required(1)
Apr 18 16:49:09.797440 osafamfnd [10775:10775:../../opensaf/src/amf/amfnd/di.cc:0756] NO avnd_di_oper_send() deferred as AMF director is offline(1), or sync is required(1)
Apr 18 16:52:09.825457 osafamfnd [10775:10775:../../opensaf/src/amf/amfnd/di.cc:0603] >> avnd_evt_mds_avd_dn_evh
Apr 18 16:52:09.825489 osafamfnd [10775:10775:../../opensaf/src/amf/amfnd/di.cc:0618] WA AMF director unexpectedly crashed
Apr 18 16:52:09.825495 osafamfnd [10775:10775:../../opensaf/src/amf/amfnd/di.cc:0662] TR Delete all pending messages to be sent to AMFD
Apr 18 16:52:09.825498 osafamfnd [10775:10775:../../opensaf/src/amf/amfnd/di.cc:1273] >> avnd_diq_rec_del
Apr 18 16:52:09.825505 osafamfnd [10775:10775:../../opensaf/src/amf/amfnd/di.cc:1290] << avnd_diq_rec_del
Apr 18 16:52:09.825508 osafamfnd [10775:10775:../../opensaf/src/amf/amfnd/di.cc:1273] >> avnd_diq_rec_del
Apr 18 16:52:09.825512 osafamfnd [10775:10775:../../opensaf/src/amf/amfnd/di.cc:1290] << avnd_diq_rec_del
~~~

--- Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is subscribed to https://sourceforge.net/p/opensaf/tickets/ To unsubscribe from further messages, a project admin can change settings at https://sourceforge.net/p/opensaf/admin/tickets/options. Or, if this is a mailing list, you can unsubscribe from the mailing list.
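The buffering logic in ticket #2436 above can be illustrated with a minimal Python sketch. This is a simplified model, not the real `di.cc` code (which is C++); the class and method names are hypothetical. It shows the intended fix direction: pending messages destined for the active AMFD should survive the repeated NCSMDS_DOWN events that the MDS await-active timeout produces during SC absence, and should only be flushed when the director genuinely crashes while an SC is present.

```python
from collections import deque

# Hypothetical, simplified model of the amfnd pending-message buffer.
# Messages for the active AMFD are buffered while the cluster is
# headless; the sketch keeps them across the repeated NCSMDS_DOWN
# events caused by the MDS await-active timeout (~3 minutes).
class PendingQueue:
    def __init__(self):
        self._queue = deque()

    def buffer(self, msg):
        """Queue a message to be sent when an active AMFD reappears."""
        self._queue.append(msg)

    def on_director_down(self, headless):
        """Handle an AMFD down event.

        Only flush when the down event is a real director crash with
        system controllers present, not a headless-period artifact.
        """
        if not headless:
            self._queue.clear()

    def __len__(self):
        return len(self._queue)
```

Under this model, the second NCSMDS_DOWN seen at 16:52 in the logs above would leave the buffered su_oper message intact, keeping headless recovery possible.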
[tickets] [opensaf:tickets] #2430 lck: resources can deadlock after master glnd restart
- **status**: review --> fixed - **Blocker**: --> False - **Comment**: commit e7482750832c7f002730667e0521d41cac6ae77c Author: Alex Jones Date: Tue Apr 25 11:05:07 2017 -0400

--- ** [tickets:#2430] lck: resources can deadlock after master glnd restart** **Status:** fixed **Milestone:** 5.17.08 **Created:** Mon Apr 17, 2017 07:57 PM UTC by Alex Jones **Last Updated:** Tue Apr 18, 2017 03:07 PM UTC **Owner:** Alex Jones

Exclusive locks can deadlock after the master glnd restarts. In a 6-node cluster, with each node repeatedly attempting to acquire one global exclusive lock and holding it for 1 second, killing the master glnd can leave these locks deadlocked.
[tickets] [opensaf:tickets] #2438 log: generate hash only if having destination name set
--- ** [tickets:#2438] log: generate hash only if having destination name set** **Status:** accepted **Milestone:** 5.17.06 **Created:** Tue Apr 25, 2017 02:12 PM UTC by Vu Minh Nguyen **Last Updated:** Tue Apr 25, 2017 02:12 PM UTC **Owner:** Vu Minh Nguyen

`rfc5424_msgid` is only referenced when streaming a record whose log stream has a destination name set. As it stands, the code sometimes does useless work: it calculates the hash even when no destination name is set on the stream. This ticket will fix that.
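The optimization described in #2438 amounts to guarding the hash calculation behind a destination check. A minimal Python sketch follows; the function name mirrors the `rfc5424_msgid` identifier from the ticket, but the signature, the use of MD5, and the arguments are illustrative assumptions, not the OpenSAF log service implementation.

```python
import hashlib

def rfc5424_msgid(stream_dn, destinations):
    """Return an RFC 5424 MSGID-style hash for a log stream, or None.

    The hash is only computed when the stream actually has destination
    names configured; streams without destinations skip the work.
    (Sketch only: the real code and hash choice may differ.)
    """
    if not destinations:
        return None  # no destination name set: do no hashing at all
    digest = hashlib.md5(stream_dn.encode()).hexdigest()
    # RFC 5424 limits MSGID to at most 32 printable ASCII characters.
    return digest[:32]
```

With this guard in place, streams that never set a destination name pay no hashing cost on the logging fast path.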
[tickets] [opensaf:tickets] Re: #2419 smf: when fixing ticket #2145 a NBC problem was introduced
I consider the AMF objects an interface, and some external code outside of OpenSAF might be reading that campaignDN attribute.

--- ** [tickets:#2419] smf: when fixing ticket #2145 a NBC problem was introduced** **Status:** wontfix **Milestone:** 5.2.0 **Created:** Mon Apr 10, 2017 11:11 AM UTC by elunlen **Last Updated:** Mon Apr 24, 2017 07:53 PM UTC **Owner:** nobody

Previous behavior: a failure to activate a component was ignored unless a secondary fault happened. It was therefore possible, for example, to complete a campaign even if a component failed to start, and to fix the problem after committing; no action to resume the campaign was needed.

After [#2145]: the campaign always suspends on a component failure, and a resume must be requested for the campaign to continue.

NBC: the behavior has changed in such a way that it must be seen as an NBC. Ticket #2145 corrects SMF behavior with respect to the AIS, but it is still an NBC since the previous behavior is the legacy behavior of earlier releases.

Proposal 1 (fix if the setting does not need to change at runtime, e.g. during an upgrade): Add a new configuration attribute to the SMF configuration class that makes it possible to select whether the post-#2145 behavior shall be used or not. The default setting must be the previous behavior. The setting must have the following properties:

- If the attribute does not exist (old model): legacy behavior
- If the attribute value is not changed from the default: legacy behavior
- If the attribute value is empty or invalid: legacy behavior
- If the attribute value is a valid "ON" setting: new behavior
- A request to change the attribute at runtime shall always be rejected

Proposal 2 (fix if the change has to be made during an upgrade): Add a new configuration attribute to the SMF configuration class that makes it possible to select whether the post-#2145 behavior shall be used or not. The default setting must be the previous behavior. The setting must have the following properties:

- If the attribute does not exist (old model): legacy behavior
- If the attribute value is not changed from the default: legacy behavior
- If the attribute value is empty or invalid: legacy behavior
- If the attribute value is a valid "ON" setting: new behavior
- The attribute value must be changeable at runtime in the "idle" state (no campaign is executing)
- The attribute value must be changeable at runtime in the campaign init state. Note that if changed here, the new setting must be used for the rest of the campaign
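The attribute-evaluation rules shared by both proposals can be sketched in a few lines of Python. This is an illustration of the decision table above, not an actual SMF configuration schema; the attribute name, the `"ON"` literal, and the function name are assumptions for the sketch.

```python
# Sketch of the behavior-selection rules from proposals 1 and 2:
# anything other than a valid "ON" value falls back to legacy behavior.

LEGACY, NEW = "legacy", "new"

def select_behavior(attr_value):
    """Map a (hypothetical) SMF config attribute value to a behavior.

    attr_value is None when the attribute does not exist (old model)
    or was never changed from its default.
    """
    if attr_value is None:
        return LEGACY            # attribute absent or left at default
    if attr_value == "ON":
        return NEW               # only a valid "ON" enables new behavior
    return LEGACY                # empty or invalid values -> legacy
```

The key property, emphasized in both proposals, is that every failure mode of the attribute (absent, default, empty, invalid) degrades safely to the legacy behavior.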
[tickets] [opensaf:tickets] #2412 log: refactor handling log client database in log agent
Second increment just sent out for review. Here is the link to the code changes/additions: https://sourceforge.net/u/winhvu/review/ci/a7a081176b9b0a6730c7769f14d53d090300dd54/

--- ** [tickets:#2412] log: refactor handling log client database in log agent** **Status:** review **Milestone:** 5.17.08 **Created:** Tue Apr 04, 2017 12:08 PM UTC by Canh Truong **Last Updated:** Mon Apr 24, 2017 07:37 AM UTC **Owner:** Vu Minh Nguyen

In the log agent there is a linked list holding all log clients of an application process, and each log client in turn holds a linked list of the log streams belonging to it. Adding, modifying, or deleting elements of these lists, or sub-items of the client databases, is distributed across many places. This can easily cause trouble with race conditions and deadlocks, and adds risk whenever new code that changes the databases is introduced. This ticket intends to remove that concern by: 1) Centralizing read/write access to the database in one place with its own private mutex 2) Using C++ containers to hold and handle the databases

The ticket will be pushed in 02 increments: 1) Convert the agent code to C++ without touching any existing logic (similar to what AMF did in [#1673]) 2) Do #1 and #2 above
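The refactoring direction in #2412 (one access point, a private mutex, standard containers instead of hand-rolled linked lists) can be sketched as follows. This is a Python model of the design, not the C++ agent code; the class and method names are hypothetical.

```python
import threading

# Sketch of a centralized client database: every read and write goes
# through this object, which serializes access with its own private
# lock, and standard containers replace the intrusive linked lists.
class ClientDb:
    def __init__(self):
        self._lock = threading.Lock()
        self._clients = {}  # client_id -> set of stream ids

    def add_client(self, client_id):
        with self._lock:
            self._clients[client_id] = set()

    def add_stream(self, client_id, stream_id):
        with self._lock:
            self._clients[client_id].add(stream_id)

    def remove_client(self, client_id):
        # Removing the client also drops its streams, in one atomic step.
        with self._lock:
            self._clients.pop(client_id, None)

    def streams(self, client_id):
        with self._lock:
            # Return a copy so callers never hold a reference into
            # the protected structure.
            return set(self._clients.get(client_id, ()))
```

Because no caller ever touches the containers directly, the race-condition and deadlock concerns described in the ticket are confined to this one class.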
[tickets] [opensaf:tickets] #2418 imm: Info of dead IMMND remains in standby IMMD
- **status**: review --> fixed

--- ** [tickets:#2418] imm: Info of dead IMMND remains in standby IMMD** **Status:** fixed **Milestone:** 5.17.06 **Created:** Mon Apr 10, 2017 10:23 AM UTC by Hung Nguyen **Last Updated:** Tue Apr 25, 2017 06:44 AM UTC **Owner:** Hung Nguyen **Attachments:** - [log.tgz](https://sourceforge.net/p/opensaf/tickets/2418/attachment/log.tgz) (149.4 kB; application/x-compressed)

When the Standby IMMD comes up at the same time as an IMMND is exiting, the info of that IMMND might not be removed from the **immnd_tree** of the Standby IMMD. Details of the problem are explained in the sequence diagram below: [sequence diagram](http://sequencediagram.org/index.html?initialData=A4QwTgLglgxloDsIAICCBhAKgWgJIFl8ARAKFElnhCWQGVMAhPQ0kkAIwHsAPZTgNwCmYOo2bFkAYjCCAJgC5kRAPIB1AHLJBQmgDMwnALbIC+dUT4JkCTrMHIAGiRJdeA4aKamii3AigoxLQAOgh+Acj4DOjI1LLIAM6CgdHIBgA29hCcdBBx7ACezvReLMjYAHxoWOIW8uic6bIJBQgwaYIAjgCuggkQziQYON7lVSW1ig1NLW0dCcCcCEmhEAAW9qbmyOlQ-chQbenddgnI65uE20vWtvZOzhw8fEIiw7VSMgpKapragnoDMYthYbjY7I5nK4Xh53t4ADQTbyKTAbExXCx7DqGdzxfRGaojMrsbooGSGECHM6HTy1IA)

SC-5 was Active, SC-2 was Standby, and the IMMND on SC-1 was exiting:

~~~
18:35:03 SC-1 osafimmnd[441]: exiting for shutdown
18:35:03 SC-2 osafrded[413]: NO RDE role set to STANDBY
18:35:03 SC-2 osafimmd[430]: NO MDS event from svc_id 25 (change:3, dest:568511936070075)
18:35:03 SC-2 osafimmd[430]: NO MDS event from svc_id 25 (change:3, dest:567412424442298)
18:35:03 SC-2 osafimmd[430]: NO MDS event from svc_id 25 (change:3, dest:566312912814523)
18:35:03 SC-2 osafimmd[430]: NO MDS event from svc_id 25 (change:3, dest:565213401186744)
18:35:03 SC-5 osafimmd[433]: NO MDS event from svc_id 25 (change:4, dest:564113889558969)
~~~

The down event for IMMND@SC-1 was received on SC-5 but not on SC-2.

**The symptoms:**

1. If the down IMMND is the coordinator, then when the Standby IMMD becomes Active it fails to elect a new coordinator, since there is already a coordinator in the **immnd_tree**.

~~~
18:35:11 SC-2 osafimmd[430]: WA IMMND coordinator at 2050f apparently crashed => electing new coord
~~~

No further logs about a newly elected coordinator were printed.

2. When IMMND@SC-1 comes up again, it fails to introduce itself to the IMMD because the IMMD already has IMMND@SC-1 in **immnd_tree** with a wrong epoch.

~~~
18:35:29 SC-1 osafimmnd[441]: NO SERVER STATE: IMM_SERVER_ANONYMOUS --> IMM_SERVER_CLUSTER_WAITING
18:35:29 SC-1 osafimmnd[441]: NO This IMMND is now the NEW Coord
18:35:29 SC-1 osafimmnd[441]: ER 3 > 0, exiting
~~~
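The eventual fix for #2418 (commit message: "Ignore the sync'ed IMMND nodes that are not up") can be modeled with a small Python sketch of the coordinator election over **immnd_tree**. The field names and election order here are illustrative assumptions, not the real IMMD code; the point is only that a stale entry left behind by a missed down event can no longer block the election once not-up nodes are skipped.

```python
# Simplified model of coordinator election over immnd_tree entries.
# Each entry records whether the node claims coordinator role, whether
# it is currently up, and whether it has been sync'ed. Entries that are
# not up are ignored, so a stale coordinator record (from a missed down
# event) cannot prevent electing a live node.
def elect_coord(immnd_tree):
    # A live coordinator already exists: keep it.
    for node_id, node in immnd_tree.items():
        if node.get("is_coord") and node["is_up"]:
            return node_id
    # Otherwise pick any sync'ed node that is actually up.
    for node_id, node in immnd_tree.items():
        if node["synced"] and node["is_up"]:
            return node_id
    return None  # no eligible node
```

In the failure scenario above, the stale entry for the crashed coordinator at 2050f would be skipped because it is not up, and a live sync'ed IMMND would be elected instead of the election silently stalling.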
[tickets] [opensaf:tickets] #2418 imm: Info of dead IMMND remains in standby IMMD
- **Blocker**: --> False - **Milestone**: 5.0.2 --> 5.17.06 - **Comment**:

5.17.08 (develop) [code:85c90b]
~~~
commit 85c90b4abead8bd66e1f20be3f84255645880597
Author: Hung Nguyen
Date:   Tue Apr 25 13:24:29 2017 +0700

    imm: Ignore the sync'ed IMMND nodes that are not up [#2418]
~~~
5.17.06 (release) [code:c1a37f]
~~~
commit c1a37fb5032c0e63165bc36e79d5a79be3fd19dd
Author: Hung Nguyen
Date:   Tue Apr 25 13:24:29 2017 +0700

    imm: Ignore the sync'ed IMMND nodes that are not up [#2418]
~~~
default (mercurial) [staging:dc6067]
~~~
changeset: 8777:dc60670bfd3b
user:      Hung Nguyen
date:      Tue Apr 25 13:40:04 2017 +0700
summary:   imm: Ignore the sync'ed IMMND nodes that are not up [#2418]
~~~

--- ** [tickets:#2418] imm: Info of dead IMMND remains in standby IMMD** **Status:** review **Milestone:** 5.17.06 **Created:** Mon Apr 10, 2017 10:23 AM UTC by Hung Nguyen **Last Updated:** Thu Apr 13, 2017 10:08 AM UTC **Owner:** Hung Nguyen **Attachments:** - [log.tgz](https://sourceforge.net/p/opensaf/tickets/2418/attachment/log.tgz) (149.4 kB; application/x-compressed)

(Ticket description as above.)