[tickets] [opensaf:tickets] #2151 osaf: system in not in correct state during Act controller comming up

Srikanth R Tue, 01 Nov 2016 03:11:21 -0700

The ticket raises three issues.

1) As per the comments on ticket #2094, "/etc/init.d/opensafd stop" is not a 
proper way to bring down OpenSAF. To bring down a faulty node, the suggested 
procedure is to perform a CLM lock on the node and then invoke the reboot 
command manually.


2) I cannot think of any real use case for a concurrent 'opensafd stop' on one 
controller and 'opensafd start' on another controller.

    In a fault scenario, reboot -f is called, in which case none of the 
runlevel services are called during the node recovery process. So the scenario 
of a simultaneous 'opensafd stop' on SC-1 and 'opensafd start' on SC-2 is not 
possible in a production environment.
   
3) Deploying such a large number of components on a controller is not 
recommended, because a failure or fault in user components can impact 
middleware (OpenSAF) functionality across the entire cluster.


---

** [tickets:#2151] osaf: system in not in correct state during Act controller 
comming up**

**Status:** unassigned
**Milestone:** 5.2.FC
**Created:** Mon Oct 31, 2016 10:54 AM UTC by Nagendra Kumar
**Last Updated:** Tue Nov 01, 2016 06:59 AM UTC
**Owner:** nobody


Steps to reproduce:
1. Start two controllers (SC-1 Active, SC-2 Standby) and two payloads. 
Configure 50 components on SC-2 and unlock them. Add a 1-second delay to each 
component's stop script.
2. Stop SC-1 and, after that, stop SC-2.
3. While SC-2 is going down, start SC-1.

Observed behaviour:
Because the components take time to stop during 'opensafd stop' on SC-2, Amfnd 
has not exited yet, but all middleware component assignments are stopped. Only 
Amfnd and Amfd are alive, with a few more components left to stop.
Meanwhile SC-1 has come up as far as Amfd, and since two Amfds are now Active, 
the SC-2 Amfd exits with "Duplicate ACTIVE detected, exiting".
Up to this point the services, including Amfd, are in a bad state because they 
cannot tell whether this is a headless state or a failover. That is accurate, 
since the system is halfway between headless and failover.


Expected behaviour:
In my view:
FMS should stop and not proceed if the peer is going down, i.e. FMS on SC-1 
should figure out that the peer system is going down. It should allow SC-1 to 
proceed only once all services on the peer are down, i.e. once it gets node 
down (maybe cb->immd_down && cb->immnd_down && cb->amfnd_down && 
cb->amfd_down && cb->fm_down).
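
The proposed check could be sketched roughly as below. This is only an illustration of the condition suggested above: the five flag names are taken from the ticket text, while the struct `peer_state_cb` and the functions `peer_fully_down` and `peer_state_example` are hypothetical and are not OpenSAF's actual FM control block or API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical control block. Only the five flag names come from the
 * ticket; the struct itself is an assumption made for illustration. */
struct peer_state_cb {
    bool immd_down;
    bool immnd_down;
    bool amfnd_down;
    bool amfd_down;
    bool fm_down;
};

/* Returns true only when every tracked service on the peer is down. */
bool peer_fully_down(const struct peer_state_cb *cb)
{
    return cb->immd_down && cb->immnd_down && cb->amfnd_down &&
           cb->amfd_down && cb->fm_down;
}

/* Mirrors the reported scenario: Amfnd and Amfd on the peer are still
 * alive while the remaining components stop, so the check must fail
 * until both of them have gone down as well. */
void peer_state_example(void)
{
    struct peer_state_cb cb = {
        .immd_down = true,
        .immnd_down = true,
        .amfnd_down = false,
        .amfd_down = false,
        .fm_down = true,
    };

    assert(!peer_fully_down(&cb)); /* peer is still shutting down */

    cb.amfnd_down = true;
    cb.amfd_down = true;
    assert(peer_fully_down(&cb));  /* now the peer counts as down */
}
```

Under this assumption, SC-1 would hold off taking the active role for as long as `peer_fully_down` returns false, which is the half-headless, half-failover window described in the observed behaviour.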







_______________________________________________
Opensaf-tickets mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets
