The headless state is also vulnerable to split-brain scenarios. That is, network partitions and joins can occur without being detected as such, and so they are not handled properly (isolated) when they occur. Basically, you cannot be sure you have a continuously coherent cluster while in the headless state.
On paper you may get a very resilient system in the sense that it "stays up" and replies to ping etc. But typically a customer wants not just availability but also reliable behavior.

/AndersBj

-----Original Message-----
From: Anders Björnerstedt [mailto:[email protected]]
Sent: 12 October 2015 16:42
To: Anders Widell; Tony Hart; [email protected]
Subject: Re: [users] Avoid rebooting payload modules after losing system controller

Note that this headless variant is a very questionable feature. This is for the reasons explained earlier, i.e. you *will* get a reduction in service availability. It was never accepted into OpenSAF for that reason.

On top of that, the unreliability will typically not be explicit/handled. That is, the operator will probably not even know what is working and what is not during the SC absence, since the alarm/notification function is gone. No OpenSAF director services are executing. It is truly a headless system, i.e. a zombie system, and thus not working at full monitoring and availability functionality. It raises the question of what OpenSAF and SAF are there for in the first place.

The SCs don’t have to run any special software and don’t have to have any special hardware. They do need file system access, at least for a cluster restart, but not necessarily to handle a single SC failure. The headless variant, when headless, is also in that not-able-to-cluster-restart state, but with even less functionality.

An SC can of course run other (non-OpenSAF-specific) software. And the two SCs don’t necessarily have to be symmetric in terms of software. Providing file system access via NFS is typically a non-issue.

They have three nodes. Ergo they should be able to assign two of them the role of SC in the OpenSAF domain.
/AndersBj

-----Original Message-----
From: Anders Widell [mailto:[email protected]]
Sent: 12 October 2015 16:08
To: Tony Hart; [email protected]
Subject: Re: [users] Avoid rebooting payload modules after losing system controller

We have actually implemented something very similar to what you are talking about. With this feature, the payloads can survive without a cluster restart even if both system controllers restart (or the single system controller, in your case). If you want to try it out, you can clone this Mercurial repository:

https://sourceforge.net/u/anders-w/opensaf-headless/

To enable the feature, set the variable IMMSV_SC_ABSENCE_ALLOWED in immd.conf to the number of seconds you wish the payloads to wait for the system controllers to come back. Note: we have only implemented this feature for the "core" OpenSAF services (plus CKPT), so you need to disable the optional services.

/ Anders Widell

On 10/11/2015 02:30 PM, Tony Hart wrote:
> We have been using opensaf in our product for a couple of years now. One of
> the issues we have is the fact that payload cards reboot when the system
> controllers are lost. Although our payload card hardware will continue to
> perform its functions whilst the software is down (which is desirable) the
> functions that the software performs are obviously not performed (which is
> not desirable).
>
> Why would we lose both controllers, surely that is a rare circumstance? Not
> if you only have one controller to begin with. Removing the second
> controller is a significant cost saving for us so we want to support a
> product that only has one controller. The most significant impediment to
> that is the loss of payload software functions when the system controller
> fails.
>
> I’m looking for suggestions from this email list as to what could be done for
> this issue.
>
> One suggestion, that would work for us, is if we could convince the payload
> card to only reboot when the controller reappears after a loss rather than
> when the loss initially occurs. Is that possible?
>
> Another possibility is if we could support more than 2 controllers, for
> example if we could support 4 (one active and 3 standbys) that would also
> provide a solution for us (our current payloads would instead become
> controllers). I know that this is not currently possible with opensaf.
>
> thanks for any suggestions,
> --
> tony
> ------------------------------------------------------------------------------
> _______________________________________________
> Opensaf-users mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/opensaf-users
