I disabled the node reboot by setting the timeout to 0, as mentioned above.
I have only one node running and no failover node is configured. When a
persistent error occurs at the node level and one of the components in the
SU detects it, the reboot escalation goes all the way up to a node reboot,
which is then prevented because I have set the timeout to 0. The SU is now
in an unusable state and refuses any further admin commands. All the
components in the SU are terminated.
Is this behavior correct? Since the error is persistent, the intent was to
stop the continuous node reboot cycle so that the user can fix the problem
and restart the SU. But it is not accepting any further admin commands.
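
To rule out a configuration slip, the setting can be checked mechanically.
This is only a minimal sketch: it assumes nid.conf uses plain
"export NAME=value" lines as in the quoted reply below, and the helper names
read_setting and reboot_disabled are hypothetical, not part of OpenSAF.

```python
import re

# Matches "export NAME=value" or "NAME=value" (assumed nid.conf line format).
ASSIGNMENT = re.compile(r"^\s*(?:export\s+)?([A-Za-z_][A-Za-z0-9_]*)=(.*?)\s*$")


def read_setting(conf_text, name):
    """Return the last value assigned to `name`, or None if it is not set."""
    value = None
    for line in conf_text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # skip commented-out assignments
        match = ASSIGNMENT.match(line)
        if match and match.group(1) == name:
            value = match.group(2).strip("\"'")
    return value


def reboot_disabled(conf_text):
    """True when OPENSAF_REBOOT_TIMEOUT is explicitly set to 0."""
    return read_setting(conf_text, "OPENSAF_REBOOT_TIMEOUT") == "0"


if __name__ == "__main__":
    # Illustrative contents only, not a copy of a real nid.conf.
    sample = "# reboot supervision\nexport OPENSAF_REBOOT_TIMEOUT=0\n"
    print(reboot_disabled(sample))
```

On a real node the text would be read from /etc/opensaf/nid.conf. The sketch
does not handle trailing inline comments after the value.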
*SU state:*
[root@node1 ~]# amf-state su all
safSu=node1.SU,safSg=node1.SU,safApp=AmberApp
safSu=node1.SU,safSg=node1.SU,safApp=AmberApp
saAmfSUAdminState=LOCKED(2)
saAmfSUOperState=DISABLED(2)
saAmfSUPresenceState=UNINSTANTIATED(1)
saAmfSUReadinessState=OUT-OF-SERVICE(1)
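
For anyone who wants to compare these states across runs, the output above
can be turned into a dictionary. A minimal sketch, assuming every state line
has the "attribute=NAME(number)" shape shown above; parse_su_states is a
hypothetical helper, not an OpenSAF tool.

```python
import re

# Output copied from the amf-state run above (DN line shown once).
SAMPLE = """\
safSu=node1.SU,safSg=node1.SU,safApp=AmberApp
saAmfSUAdminState=LOCKED(2)
saAmfSUOperState=DISABLED(2)
saAmfSUPresenceState=UNINSTANTIATED(1)
saAmfSUReadinessState=OUT-OF-SERVICE(1)
"""

# Matches lines such as "saAmfSUAdminState=LOCKED(2)"; DN lines do not match.
STATE_LINE = re.compile(r"^\s*(saAmf\w+)=([A-Z-]+)\((\d+)\)\s*$")


def parse_su_states(text):
    """Return {attribute: (state name, numeric value)} for one SU."""
    states = {}
    for line in text.splitlines():
        match = STATE_LINE.match(line)
        if match:
            name, label, value = match.groups()
            states[name] = (label, int(value))
    return states


if __name__ == "__main__":
    for name, (label, value) in parse_su_states(SAMPLE).items():
        print(f"{name}: {label} ({value})")
```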
*/var/log/messages at node1:*
[root@node1 ~]# tail -f /var/log/messages
Feb 25 19:14:33 node1 osafamfnd[2232]: IN
'safComp=node1.comp1,safSu=node1.SU,safSg=node1.SU,safApp=TestApp' Presence
State TERMINATING => UNINSTANTIATED
Feb 25 19:14:33 node1 startSAScript: killproc retval 0
Feb 25 19:14:33 node1 startSAScript: killproc retval 0
Feb 25 19:14:33 node1 osafamfnd[2232]: IN
'safComp=node1.comp2,safSu=node1.SU,safSg=node1.SU,safApp=TestApp' Presence
State TERMINATING => UNINSTANTIATED
Feb 25 19:14:33 node1 osafamfnd[2232]: IN
'safComp=node1.comp3,safSu=node1.SU,safSg=node1.SU,safApp=TestApp' Presence
State TERMINATING => UNINSTANTIATED
Feb 25 19:14:33 node1 osafamfnd[2232]: NO Terminated all application
components
Feb 25 19:14:33 node1 osafamfnd[2232]: NO Informing director of node
fail-over
Feb 25 19:14:34 node1 osafamfnd[2232]: NO Received reboot order, ordering
reboot now!
Feb 25 19:14:34 node1 osafamfnd[2232]: Rebooting OpenSAF NodeId = 131855 EE
Name = , Reason: Received reboot order, OwnNodeId = 131855, SupervisionTime
= 0
Feb 25 19:14:34 node1 osafamfnd[2232]: node reboot failure: exit code 512
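
A note on "exit code 512": an ordinary process exit status cannot exceed 255,
so one possible reading is that the logged number is a raw wait status (as
returned by the C system() call) rather than the exit status itself. That is
an assumption, not something the log confirms. Under that assumption the
arithmetic is:

```python
def exit_status_from_wait_status(status):
    """Exit status held in bits 8-15 of a Linux-style wait status."""
    return (status >> 8) & 0xFF


def terminating_signal(status):
    """Signal number held in the low 7 bits (0 means a normal exit)."""
    return status & 0x7F


if __name__ == "__main__":
    logged_value = 512  # value from the "node reboot failure" line above
    print(exit_status_from_wait_status(logged_value))  # prints 2
    print(terminating_signal(logged_value))            # prints 0
```

If the assumption holds, the reboot helper exited normally with status 2
instead of being killed by a signal.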
*/var/log/messages at controller node:*
Feb 25 20:25:42 mgt-a osafamfd[2944]: WA Admin operation is already going
on (su'safSu=node1.SU,safSg=node1.SU,safApp=TestApp')
Feb 25 20:25:43 mgt-a osafamfd[2944]: WA Admin operation is already going
on (su'safSu=node1.SU,safSg=node1.SU,safApp=TestApp')
Feb 25 20:25:44 mgt-a osafamfd[2944]: WA Admin operation is already going
on (su'safSu=node1.SU,safSg=node1.SU,safApp=TestApp')
Feb 25 20:25:45 mgt-a osafamfd[2944]: WA Admin operation is already going
on (su'safSu=node1.SU,safSg=node1.SU,safApp=TestApp')
Feb 25 20:25:46 mgt-a osafamfd[2944]: WA Admin operation is already going
on (su'safSu=node1.SU,safSg=node1.SU,safApp=TestApp')
Feb 25 20:25:47 mgt-a osafamfd[2944]: WA Admin operation is already going
on (su'safSu=node1.SU,safSg=node1.SU,safApp=TestApp')
Feb 25 20:25:48 mgt-a osafamfd[2944]: WA Admin operation is already going
on (su'safSu=node1.SU,safSg=node1.SU,safApp=TestApp')
Feb 25 20:25:49 mgt-a osafamfd[2944]: WA Admin operation is already going
on (su'safSu=node1.SU,safSg=node1.SU,safApp=TestApp')
Feb 25 20:25:50 mgt-a osafamfd[2944]: WA Admin operation is already going
on (su'safSu=node1.SU,safSg=node1.SU,safApp=TestApp')
Feb 25 20:25:51 mgt-a osafamfd[2944]: WA Admin operation is already going
on (su'safSu=node1.SU,safSg=node1.SU,safApp=TestApp')
Feb 25 20:25:52 mgt-a osafamfd[2944]: WA Admin operation is already going
on (su'safSu=node1.SU,safSg=node1.SU,safApp=TestApp')
Best regards,
Santosh
On Mon, Feb 9, 2015 at 2:06 AM, Mathivanan Naickan Palanivelu <
[email protected]> wrote:
> Hi Santosh,
>
> Yes, the reboot can be controlled by tuning the OPENSAF_REBOOT_TIMEOUT
> configuration attribute in /etc/opensaf/nid.conf:
>
> Set it to zero to disable reboot, i.e. export OPENSAF_REBOOT_TIMEOUT=0
>
> Mathi.
>
>
> ----- [email protected] wrote:
>
> > Hi,
> >
> > Can we control the node reboot behavior by changing the opensaf_reboot
> > script at the node level to perform some additional action instead of
> > a reboot?
> >
> > I tried commenting out the /sbin/reboot part of the opensaf_reboot
> > script, but the node eventually rebooted anyway. Does the controller
> > node reboot the payload node when a node reboot is declared?
> >
> > Any help is much appreciated,
> >
> > --
> > Best Regards,
> > Santosh
> >
> ------------------------------------------------------------------------------
> > Dive into the World of Parallel Programming. The Go Parallel Website,
> > sponsored by Intel and developed in partnership with Slashdot Media,
> > is your
> > hub for all things parallel software development, from weekly thought
> > leadership blogs to news, videos, case studies, tutorials and more.
> > Take a
> > look and join the conversation now.
> > http://goparallel.sourceforge.net/
> > _______________________________________________
> > Opensaf-users mailing list
> > [email protected]
> > https://lists.sourceforge.net/lists/listinfo/opensaf-users
>
--
Best Regards,
Santosh