Having more than two system controllers (one active plus several standby 
and/or spare controller nodes) is also something that has been 
investigated. For scalability reasons, we probably can't turn all nodes 
into standby controllers in a large cluster, but it may be feasible to 
have a system with one or several standby controllers, with the rest of 
the nodes acting as spares that are ready to take an active or standby 
assignment when needed.

However, the "headless" feature will still be needed in some systems 
where you need dedicated controller node(s).

/ Anders Widell

On 10/13/2015 12:07 PM, Tony Hart wrote:
> Understood.  The assumption is that this is temporary but we allow the 
> payloads to continue to run (with reduced osaf functionality) until a 
> replacement controller is found.  At that point they can reboot to get the 
> system back into sync.
>
> Or allow more than 2 controllers in the system so we can have one or more 
> usually-payload cards be controllers to reduce the probability of 
> no-controllers to an acceptable level.
>
>
>> On Oct 12, 2015, at 11:05 AM, Anders Björnerstedt 
>> <[email protected]> wrote:
>>
>> The headless state is also vulnerable to split-brain scenarios.
>> That is, network partitions and joins can occur and will not be detected as 
>> such, and thus will not be handled properly (isolated) when they occur.
>> Basically, you cannot be sure you have a continuously coherent cluster 
>> while in the headless state.
>>
>> On paper you may get a very resilient system in the sense that it "stays up" 
>> and replies to ping etc.
>> But typically a customer wants not just availability but also reliable 
>> behavior.
>>
>> /AndersBj
>>
>>
>> -----Original Message-----
>> From: Anders Björnerstedt [mailto:[email protected]]
>> Sent: 12 October 2015 16:42
>> To: Anders Widell; Tony Hart; [email protected]
>> Subject: Re: [users] Avoid rebooting payload modules after losing system 
>> controller
>>
>> Note that this headless variant is a very questionable feature, for 
>> the reasons explained earlier, i.e. you *will* get a reduction in service 
>> availability.
>> It was never accepted into OpenSAF for that reason.
>>
>> On top of that, the unreliability will typically not be explicit/handled. 
>> That is, the operator will probably not even know what is working and what is 
>> not during the SC absence, since the alarm/notification function is gone. No 
>> OpenSAF director services are executing.
>>
>> It is truly a headless system, i.e. a zombie system and thus not working at 
>> full monitoring and availability functionality.
>> It begs the question of what OpenSAF and SAF is there for in the first place.
>>
>> The SCs don’t have to run any special software and don’t have to have any 
>> special hardware.
>> They do need file system access, at least for a cluster restart, but not 
>> necessarily to handle single SC failure.
>> The headless variant, when headless, is also in that 
>> not-able-to-cluster-restart state, but with even less functionality.
>>
>> An SC can of course run other (non OpenSAF specific) software.  And the two 
>> SCs don’t necessarily have to be symmetric in terms of software.
>>
>> Providing file system access via NFS is typically a non-issue. They have 
>> three nodes. Ergo, they should be able to assign two of them the role of SC 
>> in the OpenSAF domain.
>>
>> /AndersBj
>>
>> -----Original Message-----
>> From: Anders Widell [mailto:[email protected]]
>> Sent: 12 October 2015 16:08
>> To: Tony Hart; [email protected]
>> Subject: Re: [users] Avoid rebooting payload modules after losing system 
>> controller
>>
>> We have actually implemented something very similar to what you are talking 
>> about. With this feature, the payloads can survive without a cluster restart 
>> even if both system controllers restart (or the single system controller, in 
>> your case). If you want to try it out, you can clone this Mercurial 
>> repository:
>>
>> https://sourceforge.net/u/anders-w/opensaf-headless/
>>
>> To enable the feature, set the variable IMMSV_SC_ABSENCE_ALLOWED in 
>> immd.conf to the number of seconds you wish the payloads to wait for the 
>> system controllers to come back. Note: we have only implemented this feature 
>> for the "core" OpenSAF services (plus CKPT), so you need to disable the 
>> optional services.
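
A minimal sketch of what that setting might look like, assuming immd.conf is 
sourced as a shell fragment (so the variable is set with "export"). The value 
900 is purely illustrative and is not taken from this thread:

```shell
# immd.conf (fragment) -- illustrative only.
# Assumption: this file is sourced by a shell, hence the "export" syntax.
# Assumption: 900 is an arbitrary example value. Per the description above,
# the value is the number of seconds the payloads wait for the system
# controllers to come back.
export IMMSV_SC_ABSENCE_ALLOWED=900
```

Per the description above, the feature is only enabled when this variable is 
set, so a deployment that should keep the reboot-on-controller-loss behaviour 
would simply leave it out.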
>>
>> / Anders Widell
>>
>> On 10/11/2015 02:30 PM, Tony Hart wrote:
>>> We have been using opensaf in our product for a couple of years now.  One 
>>> of the issues we have is the fact that payload cards reboot when the system 
>>> controllers are lost.  Although our payload card hardware will continue to 
>>> perform its functions whilst the software is down (which is desirable) the 
>>> functions that the software performs are obviously not performed (which is 
>>> not desirable).
>>>
>>> Why would we lose both controllers? Surely that is a rare circumstance?  
>>> Not if you only have one controller to begin with.  Removing the second 
>>> controller is a significant cost saving for us so we want to support a 
>>> product that only has one controller.  The most significant impediment to 
>>> that is the loss of payload software functions when the system controller 
>>> fails.
>>>
>>> I’m looking for suggestions from this email list as to what could be done 
>>> for this issue.
>>>
>>> One suggestion, that would work for us, is if we could convince the payload 
>>> card to only reboot when the controller reappears after a loss rather than 
>>> when the loss initially occurs.  Is that possible?
>>>
>>> Another possibility is if we could support more than 2 controllers, for 
>>> example if we could support 4 (one active and 3 standbys) that would also 
>>> provide a solution for us (our current payloads would instead become 
>>> controllers).  I know that this is not currently possible with opensaf.
>>>
>>> thanks for any suggestions,
>>> —
>>> tony
>>> ----------------------------------------------------------------------
>>> -------- _______________________________________________
>>> Opensaf-users mailing list
>>> [email protected]
>>> https://lists.sourceforge.net/lists/listinfo/opensaf-users
>>
>>


