[tickets] [opensaf:tickets] #1950 Amf: cleanup command is called during termination after health check failure
- **status**: unassigned --> review
- **assigned_to**: Gary Lee

---

** [tickets:#1950] Amf: cleanup command is called during termination after health check failure**

**Status:** review
**Milestone:** 5.0.2
**Created:** Fri Aug 12, 2016 09:38 AM UTC by Nagendra Kumar
**Last Updated:** Tue Sep 20, 2016 05:58 PM UTC
**Owner:** Gary Lee

Steps to reproduce
--
When an NPI component is in the terminating state and a health check is running concurrently, there is a rare chance that the health check returns failure because the component is unable to respond during termination. In that case, after the health check (HC) reports failure, Amf sends the cleanup command to the component.

Observed behaviour
--
During the cleanup command there may be contention for resources, and there is a fair chance that the cleanup command forcefully terminates the component, generating a core dump.

Expected behaviour
--
The expected behaviour can be either of:
1. Amf detects that the HC failure during component termination is a false alarm, ignores the error, lets the termination command succeed, and then lets the rest follow.
2. Amf runs the cleanup command as explained in the description above. This is because Amf is unable to detect whether the health check failure is caused by the termination command or by a genuine problem that the component was already experiencing and reported as an HC failure just after the terminate command was issued. If Amf takes no action, the termination command is likely to time out, since an erroneous component cannot be trusted. This would delay repair and recovery by the configured timeout period, which is clearly unwanted.

Please note that PI components are also cleaned up in a similar way.
So we need to converge on a common understanding and evaluate which solution is better from a use case point of view.
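The two options for #1950 can be compared with a small decision function. This is only an illustrative model, not OpenSAF code: `CompState`, `HcPolicy` and `on_healthcheck_failure` are hypothetical names, and the sketch assumes that an HC failure outside termination also ends in cleanup as part of recovery.

```python
from enum import Enum, auto


class CompState(Enum):
    """Hypothetical, simplified component states."""
    INSTANTIATED = auto()
    TERMINATING = auto()


class HcPolicy(Enum):
    """The two behaviours discussed in the ticket."""
    IGNORE_DURING_TERMINATION = auto()  # option 1
    ALWAYS_CLEANUP = auto()             # option 2


def on_healthcheck_failure(state: CompState, policy: HcPolicy) -> str:
    """Return the action taken when a health check reports failure."""
    if (state is CompState.TERMINATING
            and policy is HcPolicy.IGNORE_DURING_TERMINATION):
        # Option 1: treat the failure as a false alarm and let the
        # terminate command finish (or hit its own timeout).
        return "wait_for_terminate"
    # Option 2, and (by assumption) any failure outside termination.
    return "cleanup"
```

With option 1 the worst case is bounded by the terminate timeout, while option 2 reacts at once but risks the forceful cleanup described under "Observed behaviour".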
--- Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is subscribed to https://sourceforge.net/p/opensaf/tickets/ To unsubscribe from further messages, a project admin can change settings at https://sourceforge.net/p/opensaf/admin/tickets/options. Or, if this is a mailing list, you can unsubscribe from the mailing list.-- The Command Line: Reinvented for Modern Developers Did the resurgence of CLI tooling catch you by surprise? Reconnect with the command line and become more productive. Learn the new .NET and ASP.NET CLI. Get your free copy! http://sdm.link/telerik___ Opensaf-tickets mailing list Opensaf-tickets@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/opensaf-tickets
[tickets] [opensaf:tickets] #1765 ckpt : saCkptCheckpointOpen api call failed and returing SA_AIS_ERR_LIBRARY after couple of failover
- **status**: accepted --> review
- **assigned_to**: Pham Hoang Nhat --> Vo Minh Hoang

---

** [tickets:#1765] ckpt : saCkptCheckpointOpen api call failed and returing SA_AIS_ERR_LIBRARY after couple of failover**

**Status:** review
**Milestone:** 5.0.2
**Created:** Fri Apr 15, 2016 06:26 AM UTC by Ritu Raj
**Last Updated:** Thu Oct 13, 2016 01:20 AM UTC
**Owner:** Vo Minh Hoang
**Attachments:**

- [ckpt_trace.tar.bz2](https://sourceforge.net/p/opensaf/tickets/1765/attachment/ckpt_trace.tar.bz2) (3.2 MB; application/x-bzip)

setup:
Changeset- 7436
Version - opensaf 5.0 FC
4 nodes configured with single PBE and a load of 30K objects

* Issue observed :
The saCkptCheckpointOpen API call failed and returned SA_AIS_ERR_LIBRARY after a couple of failovers.

* Steps to reproduce:
> Ran a couple of failovers and observed that saCkptCheckpointOpen failed.
> Below is the snippet of the agent trace:

Apr 15 8:08:50.275115 cpa [28883:cpa_mds.c:0776] << cpa_mds_msg_sync_send: retval = 1
Apr 15 8:08:50.275128 cpa [28883:cpa_api.c:1043] T4 Cpa CkptOpen failed with return value:2,ckptHandle:63
Apr 15 8:08:50.275141 cpa [28883:cpa_api.c:1146] << **saCkptCheckpointOpen: API return code = 2**

> Traces of both controllers and the agent trace of the payload are attached.
[tickets] [opensaf:tickets] #1725 AMF: Recover transient SUSIs left over from headless
Hi Praveen, Nagu,

Thanks for reminding me of this, I will add these points to the PR doc and README.
In the #1725 part 2 series, there is a patch that tries to detect inappropriate RTAs read from IMM after headless. This could also happen for AdminState, since the IMM update is queued at AMFD. The decision is still open, as I am not quite sure how to handle it.
@Nagu: By the way, are the pending patches OK with testing at your side?

Thanks,
Minh

---

** [tickets:#1725] AMF: Recover transient SUSIs left over from headless**

**Status:** fixed
**Milestone:** 5.1.FC
**Created:** Wed Apr 06, 2016 07:16 AM UTC by Minh Hon Chau
**Last Updated:** Mon Oct 24, 2016 11:54 AM UTC
**Owner:** nobody

This ticket is more of an enhancement that targets how AMFD detects and recovers the transient SUSIs left over from headless. There are three major situations:

(1) - The cluster goes headless; su/node failover can happen on any payload, or any payload can be hard rebooted/powered off by the operator; then the cluster recovers.
(2) - An admin op is issued on any AMF entity and the cluster goes headless. During headless, the intermediate HA assignments of the whole admin op sequence between AMFND and the components could be:
(2.1) The assignment completes, the component returns OK to the csi callback, then the cluster recovers.
(2.2) The assignment is ongoing when the cluster recovers. The assignment could complete afterward, or the csi callback could return FAILED_OPERATION, or an error could occur.

By the time the cluster recovers, amfd has collected all assignments from all amfnd(s). These assignments can be in assigned or assigning state while their HA states do not conform to the SG redundancy model.
Any of (1), (2.1), (2.2) can happen in combination, which means that while an admin op is being issued (2), the cluster goes headless and any kind of failover (1) can happen during headless.
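As a rough illustration of the detection step described in #1725, the check below flags SIs whose collected assignments are still assigning or do not match a stable 2N layout. It is a simplified sketch and not the AMFD implementation: the `Susi` record, the state strings and `find_transient_sis_2n` are hypothetical, and "one ACTIVE, optionally one STANDBY" is assumed to be the only conforming outcome for 2N.

```python
from collections import defaultdict
from typing import List, NamedTuple, Set


class Susi(NamedTuple):
    """Hypothetical record of one SU-SI assignment reported by an amfnd."""
    su: str
    si: str
    ha_state: str   # "ACTIVE", "STANDBY", "QUIESCED" or "QUIESCING"
    fsm_state: str  # "ASSIGNED" or "ASSIGNING"


def find_transient_sis_2n(susis: List[Susi]) -> Set[str]:
    """Return the SIs whose collected assignments need recovery in a 2N SG."""
    by_si = defaultdict(list)
    for susi in susis:
        by_si[susi.si].append(susi)

    transient = set()
    for si, group in by_si.items():
        states = sorted(s.ha_state for s in group)
        still_assigning = any(s.fsm_state == "ASSIGNING" for s in group)
        # Assumed stable 2N outcomes: one ACTIVE, optionally one STANDBY.
        conforms = states in (["ACTIVE"], ["ACTIVE", "STANDBY"])
        if still_assigning or not conforms:
            transient.add(si)
    return transient
```

For example, an SI left with QUIESCED and STANDBY assignments after a headless si-swap would be reported, as would an SI whose ACTIVE assignment never left the assigning state.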
[tickets] [opensaf:tickets] #2133 AMF: Rollback admin shutdown SI operation if node failover
Hi Praveen,

I agree on both 1) and 2) that lock and shutdown SI operations should not be reverted in case of a fault. The reason (I think) is that when a lock/shutdown SI operation is issued, it likely means the application stops providing service, which could involve releasing resources, closing connections, and so on. Reverting to UNLOCKED with an active assignment would force the application to continue providing service, which could end up in many unhandled cases in applications.
At page 83, the migration from quiesced to active is, I think, for failover during si-swap, where an error happens at the current STANDBY SU after the ACTIVE SU has been quiesced.
I would also like to hear from the other maintainers.

Thanks,
Minh

---

** [tickets:#2133] AMF: Rollback admin shutdown SI operation if node failover**

**Status:** unassigned
**Milestone:** 5.2.FC
**Created:** Thu Oct 20, 2016 06:49 PM UTC by Minh Hon Chau
**Last Updated:** Fri Oct 21, 2016 06:11 AM UTC
**Owner:** nobody

In the scenario where an SI is shut down, the QUIESCING csi callback is delayed, and the node hosting the SU with this pending csi callback is then rebooted, the result of the operation differs between SGs:
- For 2N: the SI admin state is rolled back to UNLOCKED
- For Nway: the SI admin state moves to LOCKED
- In NpM: not tested; just from browsing SG_NPM::node_fail_si_oper, it looks like the SI admin state rolls back to UNLOCKED

My question is whether the result of these scenarios should be consistent, and what the expected outcome is.
Also, the handling of node_fail_si_oper for admin lock is not consistent: for 2N the admin state remains LOCKED, while NpM rolls back to UNLOCKED.
[tickets] [opensaf:tickets] #2134 AMF: Update RTA saAmfSISUHAState to IMM
Hi Praveen,

As I see it, the code inside si_rt_attr_cb() updates saAmfSINumCurrActiveAssignments and saAmfSINumCurrStandbyAssignments to IMM without any check of the fsm state, or maybe I am wrong?

If we update the new saAmfHAState only after AMFD receives the assignment response from AMFND, then in the case of shutting down an SI, after the component returns via saAmfCSIQuiescingComplete(), the HAState carried in the assignment response to AMFD will be QUIESCED. In that case, would QUIESCING never appear in IMM?

I have also looked up the user list email "Re: [users] CSI and SI assignments are updated in runtime attributes early". I think the right attribute to consider is saAmfSIAssignmentState instead, since the enquiry seemed to be about whether an assignment has actually happened (AssignmentState), not about the assignment role (HaState).

My suggestion for #1354 is:
- AMFD updates saAmfHAState at the time it sends the assignment request to AMFND, and does not update saAmfSIAssignmentState (which for a fresh assignment would initially be UNASSIGNED).
- When AMFND responds OK to the assignment request, a real assignment has actually completed, and AMFD updates the counter and saAmfSIAssignmentState (to PARTIALLY_ASSIGNED or FULLY_ASSIGNED, depending on the preferred configuration). Errors such as node failover can also force an update of saAmfSIAssignmentState to reflect the current assignment state.
- A notification update is needed after the assignment response.

By doing this, the user can know whether an SI assignment has been completed, and this stays close to the definition in 3.2.3.2 Assignment State, page 88.
What do you think?
Thanks,
Minh

---

** [tickets:#2134] AMF: Update RTA saAmfSISUHAState to IMM**

**Status:** unassigned
**Milestone:** 5.2.FC
**Created:** Thu Oct 20, 2016 07:58 PM UTC by Minh Hon Chau
**Last Updated:** Mon Oct 24, 2016 07:15 AM UTC
**Owner:** nobody

In the 2N si-swap scenario, when AMFD sends a su_si assignment msg (QUIESCED, for example) to AMFND that changes the HA state of a SUSI assignment, AMFD updates its local state AVD_SU_SI_REL::state and checkpoints this change to the standby AMFD. However, AMFD does not update saAmfSISUHAState until it receives the su_si assignment response.

Question:
(1). Should AMFD update the runtime attribute saAmfSISUHAState in IMM as soon as the local @state is updated in the implementer, so that IMM, active AMFD and standby AMFD are all in sync?
(2). Or should AMFD update saAmfSISUHAState in IMM only when it receives the su_si assignment response from AMFND, as is currently implemented for some reason (to avoid exposing the change of saAmfSISUHAState to the user too early?)

A grep for "avd_susi_update", which updates saAmfSISUHAState to IMM, also shows an inconsistency in usage: avd_susi_mod_send() sends the su_si msg and updates saAmfSISUHAState immediately, while avd_sg_su_si_mod_snd does otherwise.

Headless recovery relies on IMM to restore the state. If saAmfSISUHAState is not updated promptly and the node is rebooted during the headless stage, then after headless the saAmfSISUHAState read from IMM does not fit with many other states (SG fsm, SUSI fsm, saAmfSISUHAState of the other SUSIs).
My question is whether doing (1) will cause any problem for a normal cluster.
The pending patches for #1725 part 2 currently implement (1).
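The suggestion for #1354 in the comment above (publish the HA state on request, publish the assignment state only on a positive response) can be modelled in a few lines. This is a hedged sketch, not AMFD code: `SiRuntimeView` and its methods are hypothetical, and the assignment state is simplified to a count of confirmed SUs against a preferred number.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class SiRuntimeView:
    """Hypothetical stand-in for the IMM-visible runtime state of one SI."""
    preferred_assignments: int
    ha_state: Dict[str, str] = field(default_factory=dict)  # SU -> HA state
    confirmed_sus: Set[str] = field(default_factory=set)    # SUs that answered OK
    assignment_state: str = "UNASSIGNED"
    notifications: List[str] = field(default_factory=list)

    def send_assignment_request(self, su: str, ha_state: str) -> None:
        # Step 1: expose the HA state when the request is sent,
        # but leave the assignment state untouched.
        self.ha_state[su] = ha_state

    def on_assignment_response_ok(self, su: str) -> None:
        # Step 2: only a positive response counts as a real assignment.
        self.confirmed_sus.add(su)
        self._refresh()
        # Step 3: notify after the response.
        self.notifications.append(f"{su} assigned, SI is {self.assignment_state}")

    def on_su_lost(self, su: str) -> None:
        # Errors such as node failover also force a refresh.
        self.ha_state.pop(su, None)
        self.confirmed_sus.discard(su)
        self._refresh()

    def _refresh(self) -> None:
        done = len(self.confirmed_sus)
        if done == 0:
            self.assignment_state = "UNASSIGNED"
        elif done < self.preferred_assignments:
            self.assignment_state = "PARTIALLY_ASSIGNED"
        else:
            self.assignment_state = "FULLY_ASSIGNED"
```

In this model a reader of the runtime attributes sees the intended role immediately, while the assignment state only moves once a response has actually arrived.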
[tickets] [opensaf:tickets] #2009 AMF: App Si is moving to UNASSIGNED state after middleware failover
Patch for 5.0.

Attachments:

- [2009_5.0.patch](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/98b72c10/dda5/attachment/2009_5.0.patch) (19.3 kB; application/octet-stream)

---

** [tickets:#2009] AMF: App Si is moving to UNASSIGNED state after middleware failover**

**Status:** review
**Milestone:** 5.0.2
**Created:** Thu Sep 08, 2016 06:07 AM UTC by Srikanth R
**Last Updated:** Mon Oct 24, 2016 06:23 AM UTC
**Owner:** Praveen
**Attachments:**

- [add_app.xml](https://sourceforge.net/p/opensaf/tickets/2009/attachment/add_app.xml) (4.1 kB; text/xml)
- [create_app.xml](https://sourceforge.net/p/opensaf/tickets/2009/attachment/create_app.xml) (8.0 kB; text/xml)

Environment details
--
OS : Suse 64bit
Changeset : 7997 ( 5.1.FC)
Setup : 5 nodes ( 2 controllers and 3 payloads with headless feature enabled & no PBE )
AMF Application : 2N model with SUs mapped on PL-3,PL-4 ( si-si deps enabled)

Summary :
--
Application SIs are moving to UNASSIGNED state after middleware failover.

Steps followed & Observed behaviour
--
-> Initially brought up the AMF application (2N model) on two payloads.
-> All the SIs are in fully assigned state and the SUs are in INSERVICE state.
-> Performed middleware failover.
-> After the standby became the active controller, the SIs moved to unassigned state, but 'amf-state siass' shows proper output.
-> The application received CSI remove callbacks after locking the SUs.

Expected behaviour
--
-> As no fault happened on the application, the SIs should not move to UNASSIGNED state on middleware failover.
[tickets] [opensaf:tickets] #1725 AMF: Recover transient SUSIs left over from headless
Yes, these points need to be mentioned in the Amf PR doc.
Minh, you can add more scenarios if you want, and please provide some details when writing up these points.

Thanks
-Nagu

---

** [tickets:#1725] AMF: Recover transient SUSIs left over from headless**

**Status:** fixed
**Milestone:** 5.1.FC
**Created:** Wed Apr 06, 2016 07:16 AM UTC by Minh Hon Chau
**Last Updated:** Mon Oct 24, 2016 11:50 AM UTC
**Owner:** nobody
[tickets] [opensaf:tickets] #1725 AMF: Recover transient SUSIs left over from headless
Hi Minh,

Since this ticket is marked fixed, we need to document the conclusions (limitations) discussed here, that is, the cases where the application will not be recovered. These were based on the following cases discussed in this ticket:
1) AMFD sends an assignment message because of an admin operation or recovery from a fault, but the message does not reach AMFND because the system becomes headless.
2) Similarly, AMFND sends an assignment response, but it does not reach AMFD because the system becomes headless.
These were the cases where AMFD may need to self-trigger the FSM, which is not possible today.
Also, there were cases where AMFD could not update IMM for attributes such as SG FSM state and SUSI FSM state before the system became headless. In these cases, too, recovery is not possible after headless.

---

** [tickets:#1725] AMF: Recover transient SUSIs left over from headless**

**Status:** fixed
**Milestone:** 5.1.FC
**Created:** Wed Apr 06, 2016 07:16 AM UTC by Minh Hon Chau
**Last Updated:** Mon Sep 19, 2016 04:08 AM UTC
**Owner:** nobody
[tickets] [opensaf:tickets] #2136 imm: Syslog filled with "Delete of PERSISTENT runtime object..."
- **status**: accepted --> review

---

** [tickets:#2136] imm: Syslog filled with "Delete of PERSISTENT runtime object..."**

**Status:** review
**Milestone:** 5.0.2
**Created:** Mon Oct 24, 2016 10:58 AM UTC by Anders Widell
**Last Updated:** Mon Oct 24, 2016 11:02 AM UTC
**Owner:** Anders Widell

After a commit of an SMF campaign, the syslog was filled with thousands of log messages with the text "Delete of PERSISTENT runtime object ...". According to SMF maintainers, the log messages do not indicate a problem in SMF, though SMF could be optimised to reduce the number of objects it creates (a separate ticket will be created for this).

Since the syslog is a global log for all applications running on a node, OpenSAF ought to be careful to not produce an excessive amount of syslog messages. Therefore, IMM should reduce the priority of these log messages to INFO, or even convert them to tracelog messages.
[tickets] [opensaf:tickets] #2136 imm: Syslog filled with "Delete of PERSISTENT runtime object..."
Created ticket [#2137] on SMF, to optimise the usage of persistent runtime objects.

---

** [tickets:#2136] imm: Syslog filled with "Delete of PERSISTENT runtime object..."**

**Status:** accepted
**Milestone:** 5.0.2
**Created:** Mon Oct 24, 2016 10:58 AM UTC by Anders Widell
**Last Updated:** Mon Oct 24, 2016 10:58 AM UTC
**Owner:** Anders Widell
[tickets] [opensaf:tickets] #2137 smf: Reduce the number of persistent runtime objects
---

** [tickets:#2137] smf: Reduce the number of persistent runtime objects**

**Status:** unassigned
**Milestone:** 5.2.FC
**Created:** Mon Oct 24, 2016 11:00 AM UTC by Anders Widell
**Last Updated:** Mon Oct 24, 2016 11:00 AM UTC
**Owner:** nobody

As a follow-up to ticket [#2136], SMF should investigate the possibility to reduce the number of persistent runtime objects it creates - some of them might not be needed.
[tickets] [opensaf:tickets] #2134 AMF: Update RTA saAmfSISUHAState to IMM
Hi Minh,

Ticket #1870 was related to the counters in SI and SU. Before #1870, the counters were updated for all HA states in all red models without any definite rule. Because of this, a good number of issues related to AMFD crashes were reported. Ticket #1870 was about updating the counters uniformly in all the red models based on HA state changes. The function avd_susi_update_assignment_counters() contains those rules.
Counters such as saAmfSINumCurrActiveAssignments and saAmfSINumCurrStandbyAssignments are updated when IMM gives a callback to AMF, i.e. for SI in si_rt_attr_cb(). So even if a counter is incremented inside avd_susi_mod_send(), it will not be visible to the user until the user makes an explicit request to AMFD via IMM. If a user query comes during an assignment transition, then si_rt_attr_cb() can dynamically count only those SUSIs which have fsm state ASSIGNED. Thus there will be consistency.

Thanks,
Praveen

---

** [tickets:#2134] AMF: Update RTA saAmfSISUHAState to IMM**

**Status:** unassigned
**Milestone:** 5.2.FC
**Created:** Thu Oct 20, 2016 07:58 PM UTC by Minh Hon Chau
**Last Updated:** Mon Oct 24, 2016 01:20 AM UTC
**Owner:** nobody
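Praveen's last point in the #2134 reply, counting only SUSIs whose fsm state is ASSIGNED when a query arrives, could look roughly like the function below. This is an assumption-laden sketch: `Susi` and `count_current_assignments` are hypothetical names, and the thread does not confirm that the real si_rt_attr_cb() is written this way.

```python
from typing import Iterable, NamedTuple


class Susi(NamedTuple):
    """Hypothetical view of one SU-SI assignment held by AMFD."""
    ha_state: str   # e.g. "ACTIVE" or "STANDBY"
    fsm_state: str  # e.g. "ASSIGNED" or "ASSIGNING"


def count_current_assignments(susis: Iterable[Susi], ha_state: str) -> int:
    """Count SUSIs in the given HA state, skipping those still in transition."""
    return sum(
        1
        for susi in susis
        if susi.ha_state == ha_state and susi.fsm_state == "ASSIGNED"
    )
```

Computing the value at query time means a request sent but not yet answered never shows up in the counter, whatever the internal bookkeeping did when the message went out.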
[tickets] [opensaf:tickets] #314 AMF looses alarms and notifications during switch-over
V2 published. Includes suggestions given by Gary.

---

** [tickets:#314] AMF looses alarms and notifications during switch-over**

**Status:** review
**Milestone:** 5.0.2
**Created:** Fri May 24, 2013 08:34 AM UTC by Nagendra Kumar
**Last Updated:** Thu Oct 13, 2016 10:10 AM UTC
**Owner:** Praveen
**Attachments:**

- [messages](https://sourceforge.net/p/opensaf/tickets/314/attachment/messages) (41.9 kB; application/octet-stream)
- [osafamfd](https://sourceforge.net/p/opensaf/tickets/314/attachment/osafamfd) (5.7 MB; application/octet-stream)

Migrated from http://devel.opensaf.org/ticket/3051

Background: http://devel.opensaf.org/ticket/3028

If another node (payload) leaves the cluster in the middle of a switch-over, amfd logs this:

Mar 8 10:18:21 SC-1 osafamfd[304]: ER sendStateChangeNotificationAvd: saNtfNotificationSend Failed (6)
Mar 8 10:18:21 SC-1 osafamfd[304]: ER sendAlarmNotificationAvd: saNtfNotificationSend Failed (6)

These logs mean that amfd failed to send an alarm and a notification because NTF (in NOACTIVE state) returned TRYAGAIN.

AMF needs to store the alarms/notifications produced in the NOACTIVE state and send them at the end of the switch-over. Alternatively, it could use a separate thread that can block forever (?) on TRYAGAIN.

The problem exists in all opensaf releases.
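The "store and send later" idea from #314 can be sketched as a small in-order queue. This is an illustration under assumptions, not the published patch: `NotificationSender` and the `send` callable are hypothetical stand-ins for amfd's wrapper around saNtfNotificationSend, and only the numeric values of SA_AIS_OK (1) and SA_AIS_ERR_TRY_AGAIN (6) are taken from the AIS definitions.

```python
from collections import deque
from typing import Callable, Deque

SA_AIS_OK = 1
SA_AIS_ERR_TRY_AGAIN = 6  # the "(6)" seen in the amfd log lines


class NotificationSender:
    """Queue notifications while NTF answers TRY_AGAIN; flush them in order."""

    def __init__(self, send: Callable[[str], int]) -> None:
        self._send = send  # hypothetical stand-in for the real NTF send call
        self._pending: Deque[str] = deque()

    def notify(self, msg: str) -> None:
        # Always enqueue first so ordering is kept behind older messages.
        self._pending.append(msg)
        self.flush()

    def flush(self) -> int:
        """Send queued messages in order; return how many are still pending."""
        while self._pending:
            rc = self._send(self._pending[0])
            if rc == SA_AIS_ERR_TRY_AGAIN:
                break  # NTF not ready yet; retry on a later flush
            # Sent, or failed with a non-retryable error (dropped here).
            self._pending.popleft()
        return len(self._pending)
```

In this sketch `flush()` would be called again at the end of the switch-over, which avoids a dedicated thread blocking on TRYAGAIN.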
[tickets] [opensaf:tickets] #2009 AMF: App Si is moving to UNASSIGNED state after middleware failover
- **status**: accepted --> review
- Attachments has changed:

Diff:

--- old
+++ new
@@ -0,0 +1,2 @@
+add_app.xml (4.1 kB; text/xml)
+create_app.xml (8.0 kB; text/xml)

---

** [tickets:#2009] AMF: App Si is moving to UNASSIGNED state after middleware failover**

**Status:** review
**Milestone:** 5.0.2
**Created:** Thu Sep 08, 2016 06:07 AM UTC by Srikanth R
**Last Updated:** Wed Oct 19, 2016 08:26 AM UTC
**Owner:** Praveen
**Attachments:**

- [add_app.xml](https://sourceforge.net/p/opensaf/tickets/2009/attachment/add_app.xml) (4.1 kB; text/xml)
- [create_app.xml](https://sourceforge.net/p/opensaf/tickets/2009/attachment/create_app.xml) (8.0 kB; text/xml)
[tickets] [opensaf:tickets] #1669 cpsv: saCkptCheckpointNumOpeners is not updated after a node restart.
- **status**: review --> fixed
- **Milestone**: 5.0.2 --> 5.1.1
- **Comment**:

changeset: 8250:df552d227b21
user: Hoang Vo
date: Mon Oct 24 11:29:50 2016 +0530
summary: cpsv: To update checkpoint user number for each node [#1669]

changeset: 8251:e3ea4954fe07
branch: opensaf-5.1.x
parent: 8248:7f6d886ebc96
user: Hoang Vo
date: Mon Oct 24 11:30:47 2016 +0530
summary: cpsv: To update checkpoint user number for each node [#1669]

changeset: 8252:d11165efb9b8
branch: opensaf-5.0.x
tag: tip
parent: 8247:663c0f16ab83
user: Hoang Vo
date: Mon Oct 24 11:31:14 2016 +0530
summary: cpsv: To update checkpoint user number for each node [#1669]

---

** [tickets:#1669] cpsv: saCkptCheckpointNumOpeners is not updated after a node restart.**

**Status:** fixed
**Milestone:** 5.1.1
**Created:** Fri Jan 22, 2016 04:05 AM UTC by Pham Hoang Nhat
**Last Updated:** Tue Sep 20, 2016 06:04 PM UTC
**Owner:** Pham Hoang Nhat

Problem description:
saCkptCheckpointNumOpeners is not updated when a node restarts.

Steps to reproduce the problem:
1. Create a checkpoint on PL3 (creation flag SA_CKPT_WR_ALL_REPLICAS)
2. Open this checkpoint on PL4
3. Restart PL3

After step 3, saCkptCheckpointNumOpeners is not changed.
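The changeset summary for #1669 ("update checkpoint user number for each node") suggests per-node bookkeeping of openers. The sketch below shows that idea under stated assumptions; `CheckpointOpeners` and its methods are hypothetical and do not reflect the actual cpsv data structures.

```python
from collections import Counter


class CheckpointOpeners:
    """Track how many openers a checkpoint has on each node."""

    def __init__(self) -> None:
        self._per_node: Counter = Counter()

    def open(self, node: str) -> None:
        self._per_node[node] += 1

    def close(self, node: str) -> None:
        if self._per_node[node] > 0:
            self._per_node[node] -= 1

    def node_down(self, node: str) -> None:
        # A restarted node loses all of its openers at once.
        self._per_node.pop(node, None)

    @property
    def num_openers(self) -> int:
        """Value that would be reported as saCkptCheckpointNumOpeners."""
        return sum(self._per_node.values())
```

Replaying the steps from the ticket (open on PL3, open on PL4, restart PL3) leaves one opener in this model, which is the change the reporter expected to see.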