Hi Steve,
I've now updated the deserialisation to make an attempt on the downgrade
scenario. Note the comment in process.h though regarding the ordering of
ProcessType. I've also raised bug 1256985 whilst looking at this.
> + TEST_FEATURE ("full double-fork daemon test where parent waits for ultimate
> child");
>
> does not fit the expected structure of a test feature name. Suggest instead:
> TEST_FEATURE ("with daemon where parent waits for ultimate child
> before exiting");
Fixed.
>
> + assert0 (prctl (PR_SET_CHILD_SUBREAPER, 1));
>
> Upstart does not depend on PR_SET_CHILD_SUBREAPER being available on the
> running kernel - the test suite shouldn't either. The test needs to be
> skipped if this facility is unavailable, or it needs to be rewritten to avoid
> subreaper entirely.
Fixed.
>
> + /* Low byte is signal */
> + TEST_EQ (siginfo.si_status, SIGTRAP);
> +
> + /* Normally, the next byte would be the ptrace event,
> + * but upstart hasn't yet set PTRACE_O_TRACEEXEC, hence
> + * no ptrace event will be available.
> + */
> + TEST_EQ (((siginfo.si_status & ~0x7f)>>8), 0);
>
> The second test here is redundant with the first; if siginfo.si_status ==
> SIGTRAP, the high byte must be 0. Either the second test should be dropped,
> or, if you want to keep it for clarity, the first test should be modified to
> check siginfo.si_status & 0x7f.
Fixed.
>
> + TIMED_BLOCK (5) {
>
> I'm not thrilled about this TIMED_BLOCK() idea. Why should we not just use
> inotify?
To give some context here, it's worth pointing out that this test has been
through a few iterations. An earlier incarnation of it was using signals
entirely but that is problematic since:
- test_daemon needed to raise SIGSTOP on itself.
- test_daemon is being ptraced hence is also getting SIGSTOP's sent to it by
the kernel.
- This is a multi-process test so we need to guarantee behaviour given that
we can't know the order the kernel will schedule each process.
- We don't want to call nih_main_loop() as we cannot then control the test in
"single-step" increments (well, as close as we can get to that anyway :-).
Certainly inotify would be the obvious choice but for that fact that it would
add further code and complexity to the test since we're avoiding using
nih_main_loop().
TIMED_BLOCK offers a rather useful generic facility I think, particularly since
a common requirement for the tests is to perform some operation "within a
reasonable amount of time" and then fail. Yes, it's tricky to determine the
value of "reasonable", but we do need to guarantee a hard test failure, rather
than an indefinite hang.
Admittedly, using TIMED_BLOCK() with file_line_count() is rather grisly, but
even if we were to use inotify, we'd still need to check the line count on each
event as we need to know when a particular parent is in a particular state.
> For that matter, why is this pidfile needed at all - what is this meant to
> guard against? The test case already includes its own direct check of the
> pids with ptrace, so the only thing this pidfile does is make sure the daemon
> agrees with the test what the PIDs are. That seems unnecessary to me.
It's not immediately obvious, but that's not the whole reason for the
pidlist-file specificically: since each parent updates pidlist-file, once the
expected number of pids are in that file, we know that the parent has reached a
particular state. This goes back to the iterations previously mentioned - this
test is attemping as best it can to perform and end-to-end daemon test whilst
minimising deviation from how the actual init daemon would react. test_daemon
facility allows us to abuse it purely to satisfy this single requirement.
>
> + /* Wait for child to send itself SIGSTOP denoting its
> + * final resting state.
> + */
> + got = FALSE;
> + TIMED_BLOCK (5) {
> +
> + memset (&siginfo, 0, sizeof (siginfo));
> + ret = waitid (P_PID, final_pid, &siginfo, WSTOPPED | WNOWAIT | WUNTRACED);
> +
> + if (! ret && siginfo.si_code == CLD_STOPPED &&
> + siginfo.si_status == SIGSTOP) {
> + got = TRUE;
> + break;
> + }
> + }
>
> This section seems overly complicated. The child process could just call
> pause() instead, and let the test runner kill the process when it's done.
The child process cannot just call pause() since no notification of that occurs
via waitid(), hence the test wouldn't know when the final pid is ready. We
could use pidfile I guess, but a signal seemed like a superior option in this
particular scenario.
>
> + TIMED_BLOCK (5) {
> + nih_child_poll ();
> +
> + if (kill (final_pid, 0) == -1 && errno == ESRCH) {
> + got = TRUE;
> + break;
> + }
> + }
>
> Definitely no reason for this to be in a TIMED_BLOCK(). Or called at all.
> We've already received CLD_EXITED above, so the kill() should immediately
> return ESRCH. But again, this code all goes away if we just kill SIGKILL the
> final_pid, instead of including these checks that are unrelated to the
> purpose of the test case.
The test does this so that it knows definitively that once that block has
exited, the process has gone. Since nih_child_poll() is called repeatedly in
that block (as Upstart does via its main loop), we also know that the Job
associated with the final pid will have been updated. We then proceed to check
that the update was as expected. If we don't do this, there is a good chance
that the Job state will be incorrect and the test will fail intermittently as a
result.
>
> static void
> selfpipe_setup (void)
> {
> - static struct sigaction act;
> - int read_flags;
> - int write_flags;
> [...]
>
> This mixes formatting changes with functional changes, making it much harder
> to follow the history. Please split the format changes into a separate commit
> and rebase the functional changes on top.
>
Formatting changes undone.
--
https://code.launchpad.net/~jamesodhunt/upstart/bug-530779/+merge/197080
Your team Upstart Reviewers is subscribed to branch lp:upstart.
--
upstart-devel mailing list
[email protected]
Modify settings or unsubscribe at:
https://lists.ubuntu.com/mailman/listinfo/upstart-devel