Been thinking about this a bit more - I actually think
it's okay to simply not ignore SIGCHLD (i.e. not set it
to SIG_IGN). If I understand correctly, setting SIGCHLD
to SIG_IGN tells the system to discard a child's status
as soon as it changes (such as on exit), so the parent's
waitpid() call has nothing left to collect, hence the
ECHILD error that waitpid() returns. I think all we
need to do is ensure that the ENM method scripts
are spawned from a separate thread, so that the
event-handling thread won't be stuck waiting forever for a
script to finish. We don't have that problem for SMF
enable/disable actions, since they just request the
enable/disable and don't wait for it to complete.
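
Roughly what I have in mind is sketched below. This is only an
illustration, not the actual nwamd code: run_script(),
script_thread(), run_script_in_thread() and struct script_job are
names made up for the example, and it assumes SIGCHLD is left at
SIG_DFL so that waitpid() can collect the script's exit status.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical job descriptor handed to the worker thread. */
struct script_job {
    char *const *argv;  /* NULL-terminated argument vector */
    int rc;             /* exit status of the script, or -1 on error */
};

/* Fork, exec argv[0], and return the child's exit status (-1 on error). */
static int
run_script(char *const argv[])
{
    pid_t pid;
    int status;

    assert(argv != NULL && argv[0] != NULL);

    pid = fork();
    if (pid == -1)
        return (-1);
    if (pid == 0) {
        (void) execv(argv[0], argv);
        _exit(127);     /* only reached if exec failed */
    }

    /* Check the return value instead of trusting 'status' blindly. */
    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR)
            return (-1);    /* e.g. ECHILD if SIGCHLD is SIG_IGN */
    }
    if (WIFEXITED(status))
        return (WEXITSTATUS(status));
    return (-1);            /* child was killed by a signal */
}

/* Thread entry point: runs the script and records its result. */
static void *
script_thread(void *arg)
{
    struct script_job *job = arg;

    job->rc = run_script(job->argv);
    return (NULL);
}

/*
 * Run the script on its own thread. The join here only keeps the
 * example short; a daemon would more likely detach the thread and
 * report the result back as an event.
 */
static int
run_script_in_thread(char *const argv[])
{
    struct script_job job = { argv, -1 };
    pthread_t tid;

    /* Process-wide setting; normally done once at startup. */
    (void) signal(SIGCHLD, SIG_DFL);

    if (pthread_create(&tid, NULL, script_thread, &job) != 0)
        return (-1);
    (void) pthread_join(tid, NULL);
    return (job.rc);
}
```

With an argument vector such as { "/bin/sh", "-c", "exit 3", NULL },
run_script_in_thread() should hand back 3, whereas calling
run_script() with SIGCHLD set to SIG_IGN should fail with ECHILD.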
Alan
Anurag S. Maskey wrote:
> A bit of investigation and I uncovered other things:
>
> After a new thread is spawned for the script, waitpid() is called (in
> nwamd/util.c:197). The return value is being ignored. I checked and
> the return value is -1 with errno set to ECHILD. Because of this,
> WEXITSTATUS() is non-zero. Thus, even though the script ran and
> exited successfully, start_child() was returning non-zero.
>
> The waitpid(2) man page says that if SIGCHLD is set to SIG_IGN,
> "waitpid() will fail and set errno to ECHILD". I, then, discovered
> that nwamd/main.c is setting SIGCHLD to SIG_IGN. What is the purpose
> of this? If SIGCHLD is not set to SIG_IGN, then waitpid() completes
> successfully, and WEXITSTATUS() returns the actual return value from
> the script.
>
> So, the question is:
>
> Does it make sense to check for errno == ECHILD if waitpid() returns
> -1 and then return 0?
>
>     pid_t ret;
>     ...
>     ret = waitpid(pid, &status, 0);
>     if (ret == -1 && errno == ECHILD) {
>         return (0);
>     } else {
>         if (WIFSIGNALED(status) || WIFSTOPPED(status)) {
>             ...
>         }
>     }
>
> Is there a different workaround?
> Or is signal(SIGCHLD, SIG_IGN) not needed for nwamd?
>
> Thanks,
> Anurag
>
>
>
> Anurag S. Maskey wrote:
>> Recall that last week, when we were testing enabling/disabling ENMs,
>> I came across the situation where the log was saying the script
>> "completed normally", but the return code was not 0 (it was 102 or 65
>> or something like that).
>>
>> I've done a bit of digging and testing and have come to the
>> conclusion that this return value of non-zero does not mean that the
>> script failed.
>>
>> bash-3.2# nwamcfg list enm myenm
>> ENM:myenm
>> activation-mode manual
>> stop "/usr/bin/echo stop"
>> enabled true
>> start "/var/tmp/enm-script"
>>
>> bash-3.2# cat /var/tmp/enm-script
>> #!/sbin/sh
>> echo "script run" > /var/tmp/echoed.txt
>>
>> Now, enable myenm:
>>
>> bash-3.2# nwamadm enable myenm
>> Enabling enm 'myenm'
>>
>> Check the debug logs: they say "completed normally: 65". 65 is
>> supposed to be the return value from the script (nwamd/util.c:143).
>> nwamd checks this return value and decides that the script failed.
>>
>> bash-3.2# tail -f /var/tmp/nwam.log
>> Feb 9 15:55:55 unknown nwamd[123354]: [ID 853634 daemon.debug] 6:
>> door_switch: activating myenm
>> Feb 9 15:55:55 unknown nwamd[123354]: [ID 545250 daemon.debug] 6:
>> enqueueing event 17 (ENABLE) for object (80b2ac8) myenm
>> Feb 9 15:55:55 unknown nwamd[123354]: [ID 432327 daemon.debug] 2:
>> dequeueing event of type 17 (ENABLE) for object myenm
>> Feb 9 15:55:55 unknown nwamd[123354]: [ID 394438 daemon.debug] 2:
>> (80b2ac8) myenm: running method for event 17 (ENABLE)
>> Feb 9 15:55:55 unknown nwamd[123354]: [ID 995178 daemon.debug] 2:
>> nwamd_enm_handle_enable_event: running script /var/tmp/enm-script
>> for ENM myenm
>> Feb 9 15:55:55 unknown nwamd[123354]: [ID 653470 daemon.info] 2:
>> '/usr/bin/ctrun /var/tmp/enm-script' completed normally: 65
>> Feb 9 15:55:55 unknown nwamd[123354]: [ID 531482 daemon.error] 2:
>> nwamd_enm_handle_enable_event: execution of '/var/tmp/enm-script'
>> failed for ENM myenm
>>
>> But check whether the script actually ran:
>>
>> bash-3.2# cat /var/tmp/echoed.txt
>> script run
>>
>> It actually did.
>>
>> What's going on here?
>>
>> Anurag
>>
>> PS. On a better news front, if an enm has an fmri, the
>> enabling/disabling through nwamadm works spectacularly. :)
>> _______________________________________________
>> nwam-dev mailing list
>> nwam-dev at opensolaris.org
>> http://mail.opensolaris.org/mailman/listinfo/nwam-dev
>>