After a bit more investigation, I uncovered a few other things:
After a new thread is spawned for the script, waitpid() is called (in
nwamd/util.c:197), but its return value is ignored. I checked, and it
returns -1 with errno set to ECHILD. Since waitpid() failed, status is
presumably never filled in, so WEXITSTATUS() yields a meaningless
non-zero value. Thus, even though the script ran and exited
successfully, start_child() was returning non-zero.
The waitpid(2) man page says that if SIGCHLD is set to SIG_IGN,
"waitpid() will fail and set errno to ECHILD". I then discovered that
nwamd/main.c sets SIGCHLD to SIG_IGN. What is the purpose of
this? If SIGCHLD is not ignored, waitpid() completes successfully and
WEXITSTATUS() returns the actual exit status of the script.
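To make the behaviour easy to reproduce outside nwamd, here is a minimal
standalone sketch. fork_and_wait() is a hypothetical helper written for
this mail, not code from nwamd; it only assumes the waitpid(2) semantics
quoted above.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper, not nwamd code: fork a child that exits with
 * status 0, wait for it, and report what waitpid() did.  Returns
 * waitpid()'s return value and stores the resulting errno in *errp.
 */
static pid_t
fork_and_wait(int ignore_sigchld, int *statusp, int *errp)
{
	pid_t pid, ret;

	/* Choose the SIGCHLD disposition under test. */
	(void) signal(SIGCHLD, ignore_sigchld ? SIG_IGN : SIG_DFL);

	pid = fork();
	if (pid == -1) {
		*errp = errno;
		return (-1);
	}
	if (pid == 0)
		_exit(0);	/* child: "script" succeeds */

	errno = 0;
	ret = waitpid(pid, statusp, 0);
	*errp = errno;
	return (ret);
}
```

Calling fork_and_wait(1, &status, &err) should return -1 with err set to
ECHILD, while fork_and_wait(0, &status, &err) should return the child's
pid with WEXITSTATUS(status) equal to 0. In the first case status should
not be trusted at all, since waitpid() did not succeed.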
So, the question is:
Does it make sense to check for errno == ECHILD if waitpid() returns -1
and then return 0?
    pid_t ret;
    ...
    ret = waitpid(pid, &status, 0);
    if (ret == -1 && errno == ECHILD) {
        return (0);
    } else {
        if (WIFSIGNALED(status) || WIFSTOPPED(status)) {
            ...
        }
    }
Is there a different workaround?
Or is signal(SIGCHLD, SIG_IGN) not needed in nwamd?
Thanks,
Anurag
Anurag S. Maskey wrote:
> Recall that last week, when we were testing enabling/disabling ENMs, I
> came across the situation where the log was saying the script "completed
> normally", but the return code was not 0 (it was 102 or 65 or something
> like that).
>
> I've done a bit of digging and testing and have come to the conclusion
> that this return value of non-zero does not mean that the script failed.
>
> bash-3.2# nwamcfg list enm myenm
> ENM:myenm
> activation-mode manual
> stop "/usr/bin/echo stop"
> enabled true
> start "/var/tmp/enm-script"
>
> bash-3.2# cat /var/tmp/enm-script
> #!/sbin/sh
> echo "script run" > /var/tmp/echoed.txt
>
> Now, enable myenm
>
> bash-3.2# nwamadm enable myenm
> Enabling enm 'myenm'
>
> Check the debug logs. They say "completed normally: 65", where 65 is
> supposed to be the return value from the script (nwamd/util.c:143).
> nwamd checks this return value and decides that the script failed.
>
> bash-3.2# tail -f /var/tmp/nwam.log
> Feb 9 15:55:55 unknown nwamd[123354]: [ID 853634 daemon.debug] 6:
> door_switch: activating myenm
> Feb 9 15:55:55 unknown nwamd[123354]: [ID 545250 daemon.debug] 6:
> enqueueing event 17 (ENABLE) for object (80b2ac8) myenm
> Feb 9 15:55:55 unknown nwamd[123354]: [ID 432327 daemon.debug] 2:
> dequeueing event of type 17 (ENABLE) for object myenm
> Feb 9 15:55:55 unknown nwamd[123354]: [ID 394438 daemon.debug] 2:
> (80b2ac8) myenm: running method for event 17 (ENABLE)
> Feb 9 15:55:55 unknown nwamd[123354]: [ID 995178 daemon.debug] 2:
> nwamd_enm_handle_enable_event: running script /var/tmp/enm-script
> for ENM myenm
> Feb 9 15:55:55 unknown nwamd[123354]: [ID 653470 daemon.info] 2:
> '/usr/bin/ctrun /var/tmp/enm-script' completed normally: 65
> Feb 9 15:55:55 unknown nwamd[123354]: [ID 531482 daemon.error] 2:
> nwamd_enm_handle_enable_event: execution of '/var/tmp/enm-script'
> failed for ENM myenm
>
> But check whether the script actually ran:
>
> bash-3.2# cat /var/tmp/echoed.txt
> script run
>
> It actually did.
>
> What's going on here?
>
> Anurag
>
> PS. On a better news front, if an enm has an fmri, the
> enabling/disabling through nwamadm works spectacularly. :)