The headless state is also vulnerable to split-brain scenarios.
That is, network partitions and joins can occur without being detected as such,
and thus are not handled properly (isolated) when they occur.
Basically, you cannot be sure you have a continuously coherent cluster while 
in the headless state.

On paper you may get a very resilient system in the sense that it "stays up" 
and replies to ping etc.
But typically a customer wants not just availability but also reliable behavior.

/AndersBj


-----Original Message-----
From: Anders Björnerstedt [mailto:[email protected]] 
Sent: 12 October 2015 16:42
To: Anders Widell; Tony Hart; [email protected]
Subject: Re: [users] Avoid rebooting payload modules after losing system 
controller

Note that this headless variant is a very questionable feature, for the 
reasons explained earlier: you *will* get a reduction in service 
availability.
It was never accepted into OpenSAF for that reason.

On top of that, the unreliability will typically not be explicit or handled. 
That is, the operator will probably not even know what is working and what is 
not during the SC absence, since the alarm/notification function is gone. No 
OpenSAF director services are executing.

It is truly a headless system, i.e. a zombie system, and thus not working with 
full monitoring and availability functionality.
It raises the question of what OpenSAF and SAF are there for in the first place.

The SCs don’t have to run any special software and don’t have to have any 
special hardware.
They do need file system access, at least for a cluster restart, but not 
necessarily to handle single SC failure.
The headless variant, when headless, is likewise in that 
not-able-to-cluster-restart state, but with even less functionality.

An SC can of course run other (non-OpenSAF-specific) software. And the two SCs 
don’t necessarily have to be symmetric in terms of software.

Providing file system access via NFS is typically a non-issue. They have three 
nodes. Ergo, they should be able to assign two of them the role of SC in the 
OpenSAF domain.

/AndersBj

-----Original Message-----
From: Anders Widell [mailto:[email protected]]
Sent: 12 October 2015 16:08
To: Tony Hart; [email protected]
Subject: Re: [users] Avoid rebooting payload modules after losing system 
controller

We have actually implemented something very similar to what you are talking 
about. With this feature, the payloads can survive without a cluster restart 
even if both system controllers restart (or the single system controller, in 
your case). If you want to try it out, you can clone this Mercurial repository:

https://sourceforge.net/u/anders-w/opensaf-headless/

To enable the feature, set the variable IMMSV_SC_ABSENCE_ALLOWED in immd.conf 
to the number of seconds you wish the payloads to wait for the system 
controllers to come back. Note: we have only implemented this feature for the 
"core" OpenSAF services (plus CKPT), so you need to disable the optional 
services.
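
As a rough illustration of the setting described above, and assuming immd.conf 
follows the usual shell-style `export NAME=value` form, the line could look 
like the fragment below. The value 900 (15 minutes) is an arbitrary example, 
not a recommendation.

```shell
# immd.conf fragment (sketch; the export syntax and the value are assumptions)
# Let payloads wait up to 900 seconds for the system controllers to return.
export IMMSV_SC_ABSENCE_ALLOWED=900
```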

/ Anders Widell

On 10/11/2015 02:30 PM, Tony Hart wrote:
> We have been using opensaf in our product for a couple of years now.  One of 
> the issues we have is the fact that payload cards reboot when the system 
> controllers are lost.  Although our payload card hardware will continue to 
> perform its functions whilst the software is down (which is desirable) the 
> functions that the software performs are obviously not performed (which is 
> not desirable).
>
> Why would we lose both controllers, surely that is a rare circumstance?  Not 
> if you only have one controller to begin with.  Removing the second 
> controller is a significant cost saving for us so we want to support a 
> product that only has one controller.  The most significant impediment to 
> that is the loss of payload software functions when the system controller 
> fails.
>
> I’m looking for suggestions from this email list as to what could be done for 
> this issue.
>
> One suggestion, that would work for us, is if we could convince the payload 
> card to only reboot when the controller reappears after a loss rather than 
> when the loss initially occurs.  Is that possible?
>
> Another possibility is if we could support more than 2 controllers, for 
> example if we could support 4 (one active and 3 standbys) that would also 
> provide a solution for us (our current payloads would instead become 
> controllers).  I know that this is not currently possible with opensaf.
>
> thanks for any suggestions,
> —
> tony
> ------------------------------------------------------------------------------
> _______________________________________________
> Opensaf-users mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/opensaf-users



