[tickets] [opensaf:tickets] #1950 Amf: cleanup command is called during termination after health check failure

2016-10-24 Thread Gary Lee
- **status**: unassigned --> review
- **assigned_to**: Gary Lee



---

** [tickets:#1950] Amf: cleanup command is called during termination after 
health check failure**

**Status:** review
**Milestone:** 5.0.2
**Created:** Fri Aug 12, 2016 09:38 AM UTC by Nagendra Kumar
**Last Updated:** Tue Sep 20, 2016 05:58 PM UTC
**Owner:** Gary Lee


Steps to reproduce
--
When an NPI component is terminating and a health check is running 
concurrently, there is a rare chance that the health check returns a failure 
because the component is unable to respond during termination.
In that case, after the health check (HC) reports the failure, Amf sends the 
cleanup command to the component.

Observed behaviour
--
During the cleanup command there may be contention for resources, and there is 
a fair chance that the cleanup command forcefully terminates the component, 
which generates a core dump.

Expected behaviour
--
The expected behaviour can be either of:
1. Amf detects that the HC failure during component termination is a false 
alarm, ignores the error, lets the termination command succeed, and then lets 
the rest follow.
2. Amf runs the cleanup command as explained in the description above. This is 
because Amf cannot tell whether the health check failure is caused by the 
termination command or by a genuine issue that the component was already 
experiencing, with the HC failure reported just after the terminate command was 
issued. If Amf takes no action, the termination command is likely to time out, 
since an erroneous component can't be trusted. This would delay repair and 
recovery by the configured timeout period, which is clearly unwanted. Please 
note that a PI component is also cleaned up in a similar way.

So, we need to converge on a common understanding and evaluate which one is the 
better solution from a use case point of view.
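
To make the two options easier to compare, here is a minimal Python sketch of the decision point. It is not AMF code: the state names, the `HcPolicy` enum and both functions are hypothetical and only illustrate the trade-off described above.

```python
from enum import Enum


class HcPolicy(Enum):
    """Hypothetical switch between the two behaviours discussed in the ticket."""
    IGNORE_DURING_TERMINATION = 1  # option 1: treat the HC failure as a false alarm
    ALWAYS_CLEANUP = 2             # option 2: always run the cleanup command


def on_healthcheck_failure(comp_state, policy):
    """Return the next action for a component whose health check just failed."""
    if comp_state == "TERMINATING" and policy is HcPolicy.IGNORE_DURING_TERMINATION:
        # Let the terminate command finish; a timeout still escalates (see below).
        return "WAIT_FOR_TERMINATE"
    return "CLEANUP"


def on_terminate_timeout():
    """If the terminate command never completes, cleanup is the only option left."""
    return "CLEANUP"
```

With option 1, a genuinely faulty component is only cleaned up once the terminate timeout expires, which is the repair delay mentioned in point 2. With option 2, a healthy but busy component may be killed forcefully.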



---

Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is 
subscribed to https://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change settings at 
https://sourceforge.net/p/opensaf/admin/tickets/options.  Or, if this is a 
mailing list, you can unsubscribe from the mailing list.

Opensaf-tickets mailing list
Opensaf-tickets@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets


[tickets] [opensaf:tickets] #1765 ckpt : saCkptCheckpointOpen api call failed and returing SA_AIS_ERR_LIBRARY after couple of failover

2016-10-24 Thread Vo Minh Hoang
- **status**: accepted --> review
- **assigned_to**: Pham Hoang Nhat --> Vo Minh Hoang



---

** [tickets:#1765] ckpt : saCkptCheckpointOpen api call failed and returing 
SA_AIS_ERR_LIBRARY after couple of failover**

**Status:** review
**Milestone:** 5.0.2
**Created:** Fri Apr 15, 2016 06:26 AM UTC by Ritu Raj
**Last Updated:** Thu Oct 13, 2016 01:20 AM UTC
**Owner:** Vo Minh Hoang
**Attachments:**

- 
[ckpt_trace.tar.bz2](https://sourceforge.net/p/opensaf/tickets/1765/attachment/ckpt_trace.tar.bz2)
 (3.2 MB; application/x-bzip)


Setup:
Changeset - 7436
Version - opensaf 5.0 FC
4 nodes configured with single PBE and a load of 30K objects

* Issue observed:
The saCkptCheckpointOpen API call failed, returning SA_AIS_ERR_LIBRARY, after a 
couple of failovers.

* Steps to reproduce:
> Ran a couple of failovers and observed that saCkptCheckpointOpen failed.
> Below is a snippet of the agent trace:

Apr 15  8:08:50.275115 cpa [28883:cpa_mds.c:0776] << cpa_mds_msg_sync_send: 
retval = 1
Apr 15  8:08:50.275128 cpa [28883:cpa_api.c:1043] T4 Cpa CkptOpen failed with 
return value:2,ckptHandle:63
Apr 15  8:08:50.275141 cpa [28883:cpa_api.c:1146] << **saCkptCheckpointOpen: 
API return code = 2**

> Traces of both controllers and the agent trace of the payload are attached.
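
The return code 2 in the trace above corresponds to SA_AIS_ERR_LIBRARY in the SaAisErrorT enumeration. The following Python sketch pulls API return codes out of such a trace and names them. The regular expression and the `decode_api_returns` helper are assumptions based only on the three trace lines shown here, and only a subset of the error codes is listed.

```python
import re

# Subset of SaAisErrorT values from the SA Forum AIS specification.
SA_AIS_ERRORS = {
    1: "SA_AIS_OK",
    2: "SA_AIS_ERR_LIBRARY",
    3: "SA_AIS_ERR_VERSION",
    4: "SA_AIS_ERR_INIT",
    5: "SA_AIS_ERR_TIMEOUT",
    6: "SA_AIS_ERR_TRY_AGAIN",
}

# Matches e.g. "<< **saCkptCheckpointOpen: API return code = 2**";
# the "**" markers come from the ticket formatting and are optional.
RETURN_CODE = re.compile(r"<< \**(\w+):\s*API return code = (\d+)")


def decode_api_returns(trace_text):
    """Return (api_name, code, symbolic_name) for every API return in the trace."""
    # The mail wraps long trace lines, so collapse all whitespace first.
    flat = " ".join(trace_text.split())
    return [
        (api, int(code), SA_AIS_ERRORS.get(int(code), "UNKNOWN"))
        for api, code in RETURN_CODE.findall(flat)
    ]
```

Running it over the attached agent trace would list every failing API call by name instead of by number.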



---



[tickets] [opensaf:tickets] #1725 AMF: Recover transient SUSIs left over from headless

2016-10-24 Thread Minh Hon Chau
Hi Praveen, Nagu,

Thanks for reminding me of this; I will add these points to the PR doc and 
README. In the #1725 part 2 series, there's a patch that tries to detect 
inappropriate RTAs read from IMM after headless. This could also happen for 
AdminState, since the IMM update is queued at AMFD. The decision is still open 
since I'm not quite sure how to handle it. 

@Nagu: By the way, are the pending patches OK with testing on your side?

Thanks,
Minh


---

** [tickets:#1725] AMF: Recover transient SUSIs left over from headless**

**Status:** fixed
**Milestone:** 5.1.FC
**Created:** Wed Apr 06, 2016 07:16 AM UTC by Minh Hon Chau
**Last Updated:** Mon Oct 24, 2016 11:54 AM UTC
**Owner:** nobody


This ticket is more of an enhancement that targets how AMFD detects and 
recovers the transient SUSIs left over from headless. There are three major 
situations:
(1) - The cluster goes headless, su/node failover can happen on any payload, or 
any payload can be hard rebooted/powered off by the operator, then the cluster 
recovers
(2) - An admin op is issued on any AMF entity, then the cluster goes headless. 
During headless, the intermediate HA assignments of the whole admin op sequence 
between AMFND and components could be:
(2.1) The assignment completes, the component returns OK with the csi callback, 
then the cluster recovers
(2.2) The assignment is ongoing, then the cluster recovers. The assignment 
could complete afterward, or the csi callback returns FAILED_OPERATION, or an 
error can also happen

At the time the cluster recovers, amfd has collected all assignments from all 
amfnd(s). These assignments can be in assigned or assigning states whilst their 
HA states do not conform to their SG redundancy model. Any of (1), (2.1), (2.2) 
can happen in combination, which means that while an admin op is being issued 
(2), the cluster goes headless and any kind of failover (1) can happen during 
headless.
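
As an illustration of what "HA states do not conform to the SG redundancy" can mean, here is a small Python sketch that checks collected assignments against a deliberately simplified 2N rule: exactly one ACTIVE assignment per SI, and every assignment in fsm state ASSIGNED. The record layout, the rule and the function name are assumptions for illustration, not the detection logic in AMFD.

```python
from collections import Counter, defaultdict


def find_nonconforming_2n(assignments):
    """Return {si: reason} for SIs that break the simplified 2N rule.

    `assignments` is an iterable of hypothetical (su, si, ha_state, fsm_state)
    tuples, standing in for what amfd collects from all amfnd(s).
    """
    by_si = defaultdict(list)
    for su, si, ha_state, fsm_state in assignments:
        by_si[si].append((ha_state, fsm_state))

    problems = {}
    for si, rows in by_si.items():
        ha_counts = Counter(ha for ha, _ in rows)
        if ha_counts["ACTIVE"] > 1:
            problems[si] = "more than one ACTIVE assignment"
        elif ha_counts["ACTIVE"] == 0:
            problems[si] = "no ACTIVE assignment"
        elif any(fsm != "ASSIGNED" for _, fsm in rows):
            problems[si] = "assignment still in progress"
    return problems
```

An SI left with QUIESCED plus STANDBY after an interrupted si-swap, for example, is reported as having no ACTIVE assignment.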



---



[tickets] [opensaf:tickets] #2133 AMF: Rollback admin shutdown SI operation if node failover

2016-10-24 Thread Minh Hon Chau
Hi Praveen,

I agree on both 1) and 2) that lock and shutdown SI operations should not be 
reverted in case of a fault. The reason (I think) is that when a lock/shutdown 
SI operation is issued, it likely means the application stops providing 
service, which could involve releasing resources, closing connections, and so 
on. Reverting back to UNLOCKED with an active assignment would force the 
application to continue providing service, which could end up in many 
unhandled cases in applications.

At page 83, the migration from quiesced to active is, I think, for failover 
during si-swap, where an error happens at the current STANDBY SU after the 
ACTIVE SU has been quiesced.

I would also like to hear from other maintainers.

Thanks,
Minh


---

** [tickets:#2133] AMF: Rollback admin shutdown SI operation if node failover**

**Status:** unassigned
**Milestone:** 5.2.FC
**Created:** Thu Oct 20, 2016 06:49 PM UTC by Minh Hon Chau
**Last Updated:** Fri Oct 21, 2016 06:11 AM UTC
**Owner:** nobody


In the scenario of shutting down an SI, delay the QUIESCING csi callback, then 
reboot the node hosting the SU that has this csi callback pending. The result 
of this operation differs between SGs:
- For 2N: the SI admin state is rolled back to UNLOCKED 
- For Nway: the SI admin state moves to LOCKED
- For NpM: not tested; just from browsing SG_NPM::node_fail_si_oper, it looks 
like the SI admin state rolls back to UNLOCKED

My question is whether the result of this scenario should be consistent, and 
what the expected outcome is.
Also, the handling of node_fail_si_oper for admin lock is not consistent: for 
2N, the admin state remains LOCKED, while NpM rolls back to UNLOCKED.


---



[tickets] [opensaf:tickets] #2134 AMF: Update RTA saAmfSISUHAState to IMM

2016-10-24 Thread Minh Hon Chau
Hi Praveen,

As I see it, the code inside si_rt_attr_cb() updates 
saAmfSINumCurrActiveAssignments and saAmfSINumCurrStandbyAssignments to IMM 
without any check of the fsm state, or maybe I am wrong?

If we do update the new saAmfHAState after AMFD receives the assignment 
response from AMFND, then in the case of shutting down an SI, after the 
component returns via saAmfCSIQuiescingComplete(), the HAState carried in the 
assignment response to AMFD will be QUIESCED. So in this case, QUIESCING will 
never appear in IMM?

I have also looked up the users list email "Re: [users] CSI and SI assignments 
are updated in runtime attributes early". I think the right attribute to 
consider is saAmfSIAssignmentState instead, since the enquiry seemed to be 
about whether an assignment has actually happened (AssignmentState), not about 
the assignment role (HaState).

My suggestion for #1354 is:
- AMFD updates saAmfHAState at the time of sending the assignment request to 
AMFND, and does not update saAmfSIAssignmentState (which for a fresh assignment 
would initially be UNASSIGNED) 
- When AMFND responds OK to the assignment request, a real assignment has 
actually completed, and AMFD updates the counters and saAmfSIAssignmentState 
(to PARTIALLY_ASSIGNED or FULLY_ASSIGNED depending on the preferred 
configuration). Errors such as node failover can also force an update of 
saAmfSIAssignmentState to reflect the current assignment state.
- A notification update is needed after the assignment response.

By doing this, the user can know whether an SI assignment has been completed, 
and this stays close to the definition in 3.2.3.2 Assignment State, page 88.
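
The suggested ordering can be sketched roughly as follows in Python. `SiView` and its methods are hypothetical stand-ins for the SI runtime attributes visible through IMM; the sketch only shows which attribute changes at which point, under the assumption that "preferred configuration" means a preferred number of assignments.

```python
class SiView:
    """Hypothetical view of one SI's runtime attributes as seen through IMM."""

    def __init__(self, preferred_assignments):
        self.preferred_assignments = preferred_assignments
        self.ha_state = {}              # su -> HA state (saAmfHAState-like)
        self.completed = set()          # SUs whose assignment response was OK
        self.assignment_state = "UNASSIGNED"
        self.notifications = []

    def on_request_sent(self, su, ha_state):
        # Step 1: the HA state is exposed as soon as the request goes to AMFND;
        # the assignment state is deliberately left untouched.
        self.ha_state[su] = ha_state

    def on_response_ok(self, su):
        # Step 2: only now has a real assignment completed.
        self.completed.add(su)
        self._refresh_assignment_state()
        # Step 3: the notification follows the assignment response.
        self.notifications.append((su, self.ha_state[su]))

    def on_node_failover(self, su):
        # Errors also force the assignment state to reflect reality.
        self.ha_state.pop(su, None)
        self.completed.discard(su)
        self._refresh_assignment_state()

    def _refresh_assignment_state(self):
        done = len(self.completed)
        if done == 0:
            self.assignment_state = "UNASSIGNED"
        elif done < self.preferred_assignments:
            self.assignment_state = "PARTIALLY_ASSIGNED"
        else:
            self.assignment_state = "FULLY_ASSIGNED"
```

Between `on_request_sent` and `on_response_ok`, a reader sees the new HA state together with an unchanged assignment state, which is how the user could tell that the assignment is still in flight.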

What do you think?

Thanks,
Minh


---

** [tickets:#2134] AMF: Update RTA saAmfSISUHAState to IMM**

**Status:** unassigned
**Milestone:** 5.2.FC
**Created:** Thu Oct 20, 2016 07:58 PM UTC by Minh Hon Chau
**Last Updated:** Mon Oct 24, 2016 07:15 AM UTC
**Owner:** nobody


In the scenario of a 2N si-swap, when AMFD sends a QUIESCED su_si assignment 
msg (for example) to AMFND that changes the HA state of a SUSI assignment, AMFD 
updates its local state AVD_SU_SI_REL::state and checkpoints this change to the 
standby AMFD. However, AMFD does not update saAmfSISUHAState until it receives 
the su_si assignment response. Questions:
(1). Should AMFD update the runtime attribute saAmfSISUHAState in IMM as soon 
as the local @state gets updated in the implementer, so that IMM, active AMFD 
and standby AMFD are all in sync?
(2). Or should AMFD update saAmfSISUHAState in IMM only when AMFD receives the 
su_si assignment response from AMFND, as it is implemented currently for some 
reason (to not expose the change of saAmfSISUHAState to the user too early?)

Grepping for "avd_susi_update", which updates saAmfSISUHAState to IMM, also 
shows an inconsistency in usage: avd_susi_mod_send() sends the su_si msg and 
also updates saAmfSISUHAState immediately, while avd_sg_su_si_mod_snd does 
otherwise. 

Headless recovery relies on IMM to restore the state. If saAmfSISUHAState is 
not updated punctually and the node is rebooted during the headless stage, then 
after headless the saAmfSISUHAState read from IMM does not fit with many other 
states (SG fsm, SUSI fsm, saAmfSISUHAState of the other SUSIs).

My question is whether doing (1) will cause any problem for a normal cluster. 
The pending #1725 part 2 patches currently implement (1).



---



[tickets] [opensaf:tickets] #2009 AMF: App Si is moving to UNASSIGNED state after middleware failover

2016-10-24 Thread Praveen
Patch for 5.0.


Attachments:

- 
[2009_5.0.patch](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/98b72c10/dda5/attachment/2009_5.0.patch)
 (19.3 kB; application/octet-stream)


---

** [tickets:#2009] AMF: App Si is moving to UNASSIGNED state after middleware 
failover**

**Status:** review
**Milestone:** 5.0.2
**Created:** Thu Sep 08, 2016 06:07 AM UTC by Srikanth R
**Last Updated:** Mon Oct 24, 2016 06:23 AM UTC
**Owner:** Praveen
**Attachments:**

- 
[add_app.xml](https://sourceforge.net/p/opensaf/tickets/2009/attachment/add_app.xml)
 (4.1 kB; text/xml)
- 
[create_app.xml](https://sourceforge.net/p/opensaf/tickets/2009/attachment/create_app.xml)
 (8.0 kB; text/xml)


Environment details
--
OS : Suse 64bit 
Changeset : 7997  ( 5.1.FC)
Setup : 5 nodes ( 2 controllers and 3 payloads with headless feature enabled & 
no PBE )
AMF Application : 2N model with SUs mapped on PL-3,PL-4  ( si-si deps enabled)


Summary :
--
Application SIs are moving to UNASSIGNED state after middleware failover.


Steps followed & Observed behaviour
--
 -> Initially brought up the AMF application (2N model) on two payloads.
 -> All the SIs are in fully assigned state and the SUs are in INSERVICE state.
 -> Performed a middleware failover.
 -> After the standby became the active controller, the SIs moved to unassigned 
state, but 'amf-state siass' shows proper output.
 -> The application received CSI remove callbacks after locking the SUs.


Expected behaviour
--
-> As no fault happened on the application, SIs should not move to UNASSIGNED 
state for middleware failover.


---



[tickets] [opensaf:tickets] #1725 AMF: Recover transient SUSIs left over from headless

2016-10-24 Thread Nagendra Kumar
Yes, these points need to be mentioned in the Amf PR doc.
Minh, you can add some more scenarios if you want, and please provide some 
details when writing up these points.
Thanks
-Nagu


---

** [tickets:#1725] AMF: Recover transient SUSIs left over from headless**

**Status:** fixed
**Milestone:** 5.1.FC
**Created:** Wed Apr 06, 2016 07:16 AM UTC by Minh Hon Chau
**Last Updated:** Mon Oct 24, 2016 11:50 AM UTC
**Owner:** nobody





---



[tickets] [opensaf:tickets] #1725 AMF: Recover transient SUSIs left over from headless

2016-10-24 Thread Praveen
Hi Minh,

Since this ticket is marked fixed, we need to document some of the discussed 
conclusions (limitations) where the application will not be recovered. These 
are based on the following cases discussed in this ticket:
1) The system becomes headless when AMFD sends some assignment message because 
of an admin operation or recovery from a fault, but the message does not reach 
AMFND.
2) Similarly, AMFND sends some assignment response, but it does not reach AMFD 
because the system becomes headless.
These are the cases where AMFD may need to self-trigger the FSM, which is not 
possible today. There were also cases where AMFD could not update IMM for 
attributes like SG FSM state and SUSI FSM state before the system became 
headless. In this case, too, recovery is not possible after headless.




---

** [tickets:#1725] AMF: Recover transient SUSIs left over from headless**

**Status:** fixed
**Milestone:** 5.1.FC
**Created:** Wed Apr 06, 2016 07:16 AM UTC by Minh Hon Chau
**Last Updated:** Mon Sep 19, 2016 04:08 AM UTC
**Owner:** nobody





---



[tickets] [opensaf:tickets] #2136 imm: Syslog filled with "Delete of PERSISTENT runtime object..."

2016-10-24 Thread Anders Widell
- **status**: accepted --> review



---

** [tickets:#2136] imm: Syslog filled with "Delete of PERSISTENT runtime 
object..."**

**Status:** review
**Milestone:** 5.0.2
**Created:** Mon Oct 24, 2016 10:58 AM UTC by Anders Widell
**Last Updated:** Mon Oct 24, 2016 11:02 AM UTC
**Owner:** Anders Widell


After a commit of an SMF campaign, the syslog was filled with thousands of log 
messages with the text "Delete of PERSISTENT runtime object ...". According to 
SMF maintainers, the log messages do not indicate a problem in SMF, though SMF 
could be optimised to reduce the number of objects it creates (a separate 
ticket will be created for this).

Since the syslog is a global log for all applications running on a node, 
OpenSAF ought to be careful to not produce an excessive amount of syslog 
messages. Therefore, IMM should reduce the priority of these log messages to 
INFO, or even convert them to tracelog messages.


---



[tickets] [opensaf:tickets] #2136 imm: Syslog filled with "Delete of PERSISTENT runtime object..."

2016-10-24 Thread Anders Widell
Created ticket [#2137] on SMF, to optimise the usage of persistent runtime 
objects.


---

** [tickets:#2136] imm: Syslog filled with "Delete of PERSISTENT runtime 
object..."**

**Status:** accepted
**Milestone:** 5.0.2
**Created:** Mon Oct 24, 2016 10:58 AM UTC by Anders Widell
**Last Updated:** Mon Oct 24, 2016 10:58 AM UTC
**Owner:** Anders Widell




---



[tickets] [opensaf:tickets] #2137 smf: Reduce the number of persistent runtime objects

2016-10-24 Thread Anders Widell



---

** [tickets:#2137] smf: Reduce the number of persistent runtime objects**

**Status:** unassigned
**Milestone:** 5.2.FC
**Created:** Mon Oct 24, 2016 11:00 AM UTC by Anders Widell
**Last Updated:** Mon Oct 24, 2016 11:00 AM UTC
**Owner:** nobody


As a follow-up to ticket [#2136], SMF should investigate the possibility to 
reduce the number of persistent runtime objects it creates - some of them might 
not be needed.


---



[tickets] [opensaf:tickets] #2134 AMF: Update RTA saAmfSISUHAState to IMM

2016-10-24 Thread Praveen
Hi Minh,
Ticket #1870 was related to counters in SI and SU. Before #1870, counters were 
updated for all HA states in all red models without any definite rule. Because 
of this, a good number of issues related to AMFD crashes were reported. Ticket 
#1870 was for updating counters uniformly in all the red models based on HA 
state changes. The function avd_susi_update_assignment_counters() contains 
those rules.

Counters such as saAmfSINumCurrActiveAssignments and 
saAmfSINumCurrStandbyAssignments are updated when IMM gives a callback to AMF, 
i.e. for an SI in si_rt_attr_cb(). So even if a counter is incremented inside 
avd_susi_mod_send(), it will not be visible to the user until the user makes an 
explicit request to AMFD via IMM. If a user query comes during an assignment 
transition, then si_rt_attr_cb() can dynamically count only those SUSIs which 
have fsm state ASSIGNED. Thus there will be consistency. 
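
A minimal Python sketch of the idea of counting at query time, considering only SUSIs whose fsm state is ASSIGNED. The `Susi` record and the function name are hypothetical; the real callback is si_rt_attr_cb() in AMFD and is not reproduced here.

```python
from collections import namedtuple

# Hypothetical, simplified SUSI record.
Susi = namedtuple("Susi", ["su", "si", "ha_state", "fsm_state"])


def count_current_assignments(susis, si, ha_state):
    """Count assignments of `si` in `ha_state`, ignoring those still in transition."""
    return sum(
        1
        for s in susis
        if s.si == si and s.ha_state == ha_state and s.fsm_state == "ASSIGNED"
    )
```

A query arriving in the middle of an assignment would then not count the SUSI that is still being assigned, regardless of when the internal counter was incremented.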

Thanks,
Praveen


---

** [tickets:#2134] AMF: Update RTA saAmfSISUHAState to IMM**

**Status:** unassigned
**Milestone:** 5.2.FC
**Created:** Thu Oct 20, 2016 07:58 PM UTC by Minh Hon Chau
**Last Updated:** Mon Oct 24, 2016 01:20 AM UTC
**Owner:** nobody





---



[tickets] [opensaf:tickets] #314 AMF looses alarms and notifications during switch-over

2016-10-24 Thread Praveen
V2 published. Includes suggestions given by Gary.


---

** [tickets:#314] AMF looses alarms and notifications during switch-over**

**Status:** review
**Milestone:** 5.0.2
**Created:** Fri May 24, 2013 08:34 AM UTC by Nagendra Kumar
**Last Updated:** Thu Oct 13, 2016 10:10 AM UTC
**Owner:** Praveen
**Attachments:**

- [messages](https://sourceforge.net/p/opensaf/tickets/314/attachment/messages) 
(41.9 kB; application/octet-stream)
- [osafamfd](https://sourceforge.net/p/opensaf/tickets/314/attachment/osafamfd) 
(5.7 MB; application/octet-stream)


Migrated from http://devel.opensaf.org/ticket/3051

Background: http://devel.opensaf.org/ticket/3028


If another node (payload) leaves the cluster in the middle of switch-over, amfd 
logs this:


Mar 8 10:18:21 SC-1 osafamfd[304]: ER sendStateChangeNotificationAvd: 
saNtfNotificationSend Failed (6)
Mar 8 10:18:21 SC-1 osafamfd[304]: ER sendAlarmNotificationAvd: 
saNtfNotificationSend Failed (6)


These logs mean that amfd failed to send an alarm and a notification because 
NTF (in NOACTIVE state) returned TRYAGAIN.


AMF needs to store the alarms/notifications produced in the NOACTIVE state and 
send them at the end of the switch-over. Alternatively, it could use a separate 
thread that can block forever (?) on TRYAGAIN.
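
The first approach (store, then send at the end of the switch-over) can be sketched in Python as a small ordered buffer. This is only an illustration: `NotificationBuffer` and its `send` callable are hypothetical, and the only values taken from the AIS specification are SA_AIS_OK (1) and SA_AIS_ERR_TRY_AGAIN (6), the latter matching the "(6)" in the log above.

```python
from collections import deque

SA_AIS_OK = 1
SA_AIS_ERR_TRY_AGAIN = 6


class NotificationBuffer:
    """Keeps notifications in order until the sender stops returning TRY_AGAIN."""

    def __init__(self, send):
        self._send = send          # hypothetical callable(notification) -> return code
        self._pending = deque()

    def __len__(self):
        return len(self._pending)

    def submit(self, notification):
        """Queue behind anything already pending, then try to drain the queue."""
        self._pending.append(notification)
        return self.flush()

    def flush(self):
        """Send pending notifications in order; return how many were delivered."""
        delivered = 0
        while self._pending:
            rc = self._send(self._pending[0])
            if rc == SA_AIS_ERR_TRY_AGAIN:
                break                      # keep it queued for the next flush
            self._pending.popleft()        # delivered, or failed in a non-retryable way
            if rc == SA_AIS_OK:
                delivered += 1
        return delivered
```

Calling `flush()` once the switch-over has completed would deliver everything produced in the NOACTIVE state, in the original order.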


The problem exists in all opensaf releases.





---



[tickets] [opensaf:tickets] #2009 AMF: App Si is moving to UNASSIGNED state after middleware failover

2016-10-24 Thread Praveen
- **status**: accepted --> review
- Attachments has changed:

Diff:



--- old
+++ new
@@ -0,0 +1,2 @@
+add_app.xml (4.1 kB; text/xml)
+create_app.xml (8.0 kB; text/xml)






---

** [tickets:#2009] AMF: App Si is moving to UNASSIGNED state after middleware 
failover**

**Status:** review
**Milestone:** 5.0.2
**Created:** Thu Sep 08, 2016 06:07 AM UTC by Srikanth R
**Last Updated:** Wed Oct 19, 2016 08:26 AM UTC
**Owner:** Praveen
**Attachments:**

- 
[add_app.xml](https://sourceforge.net/p/opensaf/tickets/2009/attachment/add_app.xml)
 (4.1 kB; text/xml)
- 
[create_app.xml](https://sourceforge.net/p/opensaf/tickets/2009/attachment/create_app.xml)
 (8.0 kB; text/xml)




---



[tickets] [opensaf:tickets] #1669 cpsv: saCkptCheckpointNumOpeners is not updated after a node restart.

2016-10-24 Thread A V Mahesh (AVM)
- **status**: review --> fixed
- **Milestone**: 5.0.2 --> 5.1.1
- **Comment**:

changeset:   8250:df552d227b21
user:        Hoang Vo
date:        Mon Oct 24 11:29:50 2016 +0530
summary:     cpsv: To update checkpoint user number for each node [#1669]

changeset:   8251:e3ea4954fe07
branch:      opensaf-5.1.x
parent:      8248:7f6d886ebc96
user:        Hoang Vo
date:        Mon Oct 24 11:30:47 2016 +0530
summary:     cpsv: To update checkpoint user number for each node [#1669]

changeset:   8252:d11165efb9b8
branch:      opensaf-5.0.x
tag:         tip
parent:      8247:663c0f16ab83
user:        Hoang Vo
date:        Mon Oct 24 11:31:14 2016 +0530
summary:     cpsv: To update checkpoint user number for each node [#1669]



---

** [tickets:#1669] cpsv: saCkptCheckpointNumOpeners is not updated after a node 
restart.**

**Status:** fixed
**Milestone:** 5.1.1
**Created:** Fri Jan 22, 2016 04:05 AM UTC by Pham Hoang Nhat
**Last Updated:** Tue Sep 20, 2016 06:04 PM UTC
**Owner:** Pham Hoang Nhat


 Problem description:
 
 saCkptCheckpointNumOpeners is not updated when a node restarts.
 
 Steps to reproduce the problem:
 1. Create a checkpoint on PL3 with creation flag SA_CKPT_WR_ALL_REPLICAS
 2. Open this checkpoint on PL4
 3. Restart PL3
 
 After step 3, saCkptCheckpointNumOpeners is unchanged.
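
The expected behaviour can be illustrated with a small Python sketch that tracks openers per node, in the spirit of the commit summary "To update checkpoint user number for each node". The class below is hypothetical and is not the cpsv implementation. It also assumes that creating the checkpoint on PL3 counts as one opener there.

```python
class CheckpointOpeners:
    """Hypothetical per-node bookkeeping behind a NumOpeners-style counter."""

    def __init__(self):
        self._per_node = {}   # node name -> number of opens from that node

    def open(self, node):
        self._per_node[node] = self._per_node.get(node, 0) + 1

    def close(self, node):
        if self._per_node.get(node, 0) > 0:
            self._per_node[node] -= 1

    def node_down(self, node):
        # A restarted node loses all of its openers at once.
        self._per_node.pop(node, None)

    @property
    def num_openers(self):
        return sum(self._per_node.values())
```

Replaying the three steps above gives a count of 2 after step 2 and 1 after PL3 restarts, which is the change the ticket expects to see.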
 


---
