The ticket raises three issues.
1) As per the comments on ticket #2094, "/etc/init.d/opensafd stop" is not the
proper way to bring down OpenSAF. To bring down a faulty node, the suggestion
is to perform a CLM lock on the node and then invoke the reboot command
manually.
2) I cannot think of any real use case for "concurrent 'opensafd stop' on one
controller and 'opensafd start' on another controller".
In a fault scenario, reboot -f is called, so none of the runlevel services are
invoked during the node recovery process. The scenario of simultaneous
'opensafd stop' on SC-1 and 'opensafd start' on SC-2 is therefore not possible
in a production environment.
3) Deploying such a large number of components on a controller is not
recommended, as a failure or fault in user components can impact middleware
(OpenSAF) functionality across the entire cluster.
---
**[tickets:#2151] osaf: system is not in correct state during Act controller
coming up**
**Status:** unassigned
**Milestone:** 5.2.FC
**Created:** Mon Oct 31, 2016 10:54 AM UTC by Nagendra Kumar
**Last Updated:** Tue Nov 01, 2016 06:59 AM UTC
**Owner:** nobody
Steps to reproduce:
1. Start two controllers (SC-1 Active, SC-2 Standby) and two payloads.
Configure 50 components on SC-2 and unlock them. Add a 1-second delay to each
component's stop script.
2. Stop SC-1 and then stop SC-2.
3. While SC-2 is going down, start SC-1.
Observed behaviour:
Because the components take time to stop during 'opensafd stop' on SC-2, Amfnd
has not yet exited, although all middleware component assignments are already
stopped. Only Amfnd and Amfd are still alive, with a few more components left
to stop. Meanwhile SC-1 has come up as far as Amfd, and since two Amfds are now
Active, the SC-2 Amfd exits with "Duplicate ACTIVE detected, exiting".
At this point the services, including Amfd, are in a bad state because they
cannot tell whether this is a headless state or a failover. That is accurate,
since the system is halfway between headless and failover.
Expected behaviour:
In my view:
FMS on SC-1 should detect that the peer is going down and should not proceed
while that is the case. It should allow SC-1 to continue only once all services
on the peer are down, i.e. once it gets node down (maybe cb->immd_down &&
cb->immnd_down && cb->amfnd_down && cb->amfd_down && cb->fm_down).
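
The proposed gate could be sketched roughly as below. This is only an
illustration of the condition above, not FM's actual code: the struct
`peer_state_t` and the function `peer_fully_down` are hypothetical names, and
only the five `*_down` flags come from the ticket text.

```c
#include <stdbool.h>

/* Hypothetical control block holding what FM on SC-1 knows about the peer.
 * The flag names follow the ticket; the struct itself is an assumption. */
typedef struct {
    bool immd_down;
    bool immnd_down;
    bool amfnd_down;
    bool amfd_down;
    bool fm_down;
} peer_state_t;

/* Returns true only when every tracked peer service has been reported down,
 * i.e. the peer can be treated as "node down" rather than "going down". */
static bool peer_fully_down(const peer_state_t *cb)
{
    return cb->immd_down && cb->immnd_down && cb->amfnd_down &&
           cb->amfd_down && cb->fm_down;
}
```

In the reproduction above, SC-2 would still have amfnd_down and amfd_down
unset while its components are stopping, so SC-1 would be held back until the
peer had exited completely.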