Hi,

I assume you are thinking about flexibility in error escalation and the
ability to configure thresholds.

Note that, in general, a node reboot is the ultimate action AMF takes in an
attempt to automatically recover from and repair a recurrent error in an AMF
application hosted on that node.

This reboot (auto-recovery-cum-repair) is itself driven by the user's
configuration in AMF.
An AMF application can be configured to go through sufficient and gradual
'levels' of error escalation, i.e.
component-restart => serviceunit-restart => serviceunit-failover =>
nodefailover => nodefailfast,
with threshold limits applicable.
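
To make the threshold limits a bit more concrete, here is a rough sketch of
where they would sit in the IMM XML. The attribute names are as I recall them
from the AMF information model (SaAmfSG and SaAmfNode classes); the DNs and
values are made-up placeholders, so please check everything against the class
definitions shipped with your OpenSAF release. The 'Prob' values are probation
periods in nanoseconds, and each 'Max' is the count allowed within that period
before AMF escalates to the next level:

        <!-- hypothetical SG: comp restart -> SU restart -> SU failover -->
        <object class="SaAmfSG">
                <dn>safSg=MySG,safApp=MyApp</dn>
                <!-- other attributes omitted -->
                <attr>
                        <name>saAmfSGCompRestartProb</name>
                        <value>4000000000</value>
                </attr>
                <attr>
                        <name>saAmfSGCompRestartMax</name>
                        <value>3</value>
                </attr>
                <attr>
                        <name>saAmfSGSuRestartProb</name>
                        <value>4000000000</value>
                </attr>
                <attr>
                        <name>saAmfSGSuRestartMax</name>
                        <value>2</value>
                </attr>
        </object>
        <!-- hypothetical node: SU failover -> node failover -->
        <object class="SaAmfNode">
                <dn>safAmfNode=PL-3,safAmfCluster=myAmfCluster</dn>
                <!-- other attributes omitted -->
                <attr>
                        <name>saAmfNodeSuFailOverProb</name>
                        <value>1200000000000</value>
                </attr>
                <attr>
                        <name>saAmfNodeSuFailoverMax</name>
                        <value>2</value>
                </attr>
        </object>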

In your case, note that the lifecycle of an AMF component (application) is
controlled by OpenSAF running on that local node.

As I said, the reboot was triggered because of a fault in your
component (application) and because you had configured it that way in the
.xml. So ideally you would fix that fault first.

Having said that, you have two options:

(a) You can avoid a reboot by changing your component's (application's)
configuration.
I.e., if you want to fail over your workload to another node the first time
you hit an error, but avoid an OS reboot, then you can configure the
component's recommended recovery as NODE FAILOVER, i.e.

                <attr>  
                        <name>saAmfCompRecoveryOnError</name>
                        <value>5</value>
                </attr>


Also, you should disable the flag that enables node auto repair i.e.
by setting the following attribute corresponding to that node:
                <attr>
                        <name>saAmfNodeAutoRepair</name>
                        <value>0</value>
                </attr>
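
For context, the two fragments above belong to different objects: the recovery
setting goes on the component object and the auto repair flag on the node
object. A rough sketch of how that could look in the .xml, where the DNs
(MyComp, SU1, MySG, MyApp, PL-3, myAmfCluster) are placeholders I made up and
need to be replaced with the ones from your own model:

        <!-- hypothetical component: go straight to node failover on error -->
        <object class="SaAmfComp">
                <dn>safComp=MyComp,safSu=SU1,safSg=MySG,safApp=MyApp</dn>
                <!-- other attributes omitted -->
                <attr>
                        <name>saAmfCompRecoveryOnError</name>
                        <value>5</value>
                </attr>
        </object>
        <!-- hypothetical payload node: no automatic repair (reboot) -->
        <object class="SaAmfNode">
                <dn>safAmfNode=PL-3,safAmfCluster=myAmfCluster</dn>
                <!-- other attributes omitted -->
                <attr>
                        <name>saAmfNodeAutoRepair</name>
                        <value>0</value>
                </attr>
        </object>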

OR
(b) If you want gradual escalation from component restart to node failover (no 
reboot) then
your configuration would look like:
...
                <attr>  
                        <name>saAmfCompRecoveryOnError</name>
                        <value>5</value>
                </attr>

...
and the node auto repair would be disabled:

                <attr>
                        <name>saAmfNodeAutoRepair</name>
                        <value>0</value>
                </attr>
...
and component restart would be enabled:

                <attr>
                        <name>saAmfCompDisableRestart</name>
                        <value>0</value>
                </attr>

Cheers,
Mathi.

----- [email protected] wrote:

> Thanks Mathi for the suggestion.
> 
> 
> To confirm again, does the payload node reboot is something a local
> decision (node under reboot) based on its opensaf_reboot script or is
> it a decision taken and performed from controller node? I believe,
> from the active controller node that, its a controller node decision
> to reboot the payload node. Please correct me if wrong.
> 
> 
> In my case, I am planning for conditional reboot which is reboot the
> node once and when the node comes back up, then stop further node
> reboots based some condition.
> 
> 
> Can you please suggest?
> 
> 
> Best regards,
> Santosh
> 
> 
> On Mon, Feb 9, 2015 at 2:06 AM, Mathivanan Naickan Palanivelu <
> [email protected] > wrote:
> 
> 
> Hi Santosh,
> 
> Yes, the reboot can be controlled by tuning the OPENSAF_REBOOT_TIMEOUT
> configuration
> attribute in /etc/opensaf/nid.conf:
> 
> Set it to zero to disable reboot, i.e. export OPENSAF_REBOOT_TIMEOUT=0
> 
> Mathi.
> 
> 
> 
> 
> ----- [email protected] wrote:
> 
> > Hi,
> >
> > Can we control the node reboot behavior by changing the
> > opensaf_reboot
> > script at the node level perform some additional action instead of a
> > reboot?
> >
> > I tried commenting the /sbin/reboot part of the opensaf_reboot
> script,
> > but
> > eventually node rebooted? Does the controller node reboots the
> payload
> > node
> > when the node reboot is declared?
> >
> > Any help is much appreciated,
> >
> > --
> > Best Regards,
> > Santosh
> >
> ------------------------------------------------------------------------------
> > Dive into the World of Parallel Programming. The Go Parallel
> Website,
> > sponsored by Intel and developed in partnership with Slashdot Media,
> > is your
> > hub for all things parallel software development, from weekly
> thought
> > leadership blogs to news, videos, case studies, tutorials and more.
> > Take a
> > look and join the conversation now.
> > http://goparallel.sourceforge.net/
> > _______________________________________________
> > Opensaf-users mailing list
> > [email protected]
> > https://lists.sourceforge.net/lists/listinfo/opensaf-users
> 
> 
> 
> 
> --
> 
> 
> 
> 
> Best Regards,
> 
> Santosh
