On 12/03/15 15:57, Martin Lucina wrote:
[email protected] said:
Right, but signals are something that should be attacked critically,
not just bent over backwards for and emulated as the existing concept.
This whole exercise is pointless if the end result is what is known as an OS ;)
I do NOT want to be doing a full-blown signals implementation. What I do
want (and the problem I'm trying to solve) is to be able to shut down an
application cleanly, using either "xl shutdown", ACPI poweroff, magic
packet or whatever.
Let me try to clarify what I mean by "critically". I don't think
emulating signals is the way to go, "full-blown" implementation or not.
Anecdote: if we take the original motivation for rump kernels ("kernel
driver development in userspace") and just figure out how to emulate the
facilities required by the kernel in userspace, we end up with a
usermode OS, not a rump kernel. While the usermode OS is a solution,
it's not the better of the two; everyone is of course
welcome to disagree with my assertion, but if so, use specific arguments
against section 4.5 of my thesis.
Maybe emulating signals will appear to solve one problem in one case,
and maybe there's some merit in that. However, as you are surely well
aware, the problem with emulating the real world lies in the actual
implementation and in testing against actual code. I can say that if I
were to emulate signals, which I'm not sure I would, I'd probably start
down the path you describe. However, I can't really give any useful
insight, because that would require me to implement and test the whole
thing.
I'd still go for a "what is actually needed" approach first, definitely
looking at what modifications MySQL would require, and perhaps surveying
unikernel projects and teaming up with them for a common interface.
Sure, the concept of a clean shutdown without signals doesn't really fit
into the POSIX interface, but if nobody is willing to even imagine what
things actually should look like, there will never be progress.
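As one way of imagining a different shape (a sketch only, not something proposed in this thread): the platform could expose a descriptor that becomes readable when a shutdown is requested, and a server would simply add it to the poll() set it already maintains. `shutdown_fd_create()`, `shutdown_request()` and `event_loop_step()` are hypothetical names, and a pipe stands in for whatever object a real platform would provide.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <poll.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical platform side: a pipe stands in for whatever object the
 * platform would really hand out. */
static int sd_pipe[2] = { -1, -1 };

/* Returns a descriptor that becomes readable on a shutdown request. */
static int shutdown_fd_create(void)
{
    return pipe(sd_pipe) == -1 ? -1 : sd_pipe[0];
}

/* Called from the platform's shutdown thread, e.g. when the
 * "xl shutdown" watch fires. */
static void shutdown_request(void)
{
    char c = 1;
    ssize_t r = write(sd_pipe[1], &c, 1);
    (void)r;  /* errors ignored in this sketch */
}

/* Application side: one pass of an event loop that already polls its
 * listening socket; the shutdown fd is just one more entry.
 * Returns true when the application should begin its clean shutdown. */
static bool event_loop_step(int listen_fd, int sd_fd, int timeout_ms)
{
    struct pollfd fds[2] = {
        { .fd = listen_fd, .events = POLLIN },
        { .fd = sd_fd, .events = POLLIN },
    };
    int n = poll(fds, 2, timeout_ms);

    if (n == -1)
        return errno != EINTR;   /* unexpected poll error: give up */
    if (n > 0 && (fds[1].revents & POLLIN))
        return true;             /* shutdown requested */
    /* ... handle fds[0] here (accept new clients, etc.) ... */
    return false;
}
```

No signal handler is involved: the request arrives through the same wait the server is already blocked in.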
The accepted way to cleanly shut down a well-behaved POSIX application is to
send it SIGTERM, wait a while and possibly send it SIGKILL if it doesn't
get its act together.
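For reference, that host-side convention (SIGTERM, a grace period, then SIGKILL) looks roughly like this in plain POSIX C. `terminate_child()` and its 10 ms polling interval are arbitrary choices for this sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Send SIGTERM, give the child up to grace_ms milliseconds to exit,
 * then send SIGKILL. The child's wait status is stored in *status.
 * Returns true if the child was reaped within the grace period. */
static bool terminate_child(pid_t pid, long grace_ms, int *status)
{
    const struct timespec tick = { 0, 10 * 1000 * 1000 }; /* 10 ms */
    long waited = 0;

    if (kill(pid, SIGTERM) == -1)
        return false;

    /* Poll with WNOHANG so the grace period stays bounded. */
    while (waited < grace_ms) {
        pid_t r = waitpid(pid, status, WNOHANG);
        if (r == pid)
            return true;
        if (r == -1 && errno != EINTR)
            return false;
        nanosleep(&tick, NULL);
        waited += 10;
    }

    /* Grace period over: SIGKILL cannot be caught or ignored. */
    kill(pid, SIGKILL);
    while (waitpid(pid, status, 0) == -1 && errno == EINTR)
        ;
    return false;
}
```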
Here's a naive way of how I think this could work. This is likely to be
full of race conditions or potential deadlocks, but I want to get it on the
table anyway:
1) We implement a sigaction() which allows setting a SIGTERM handler and
ignores any other operations.
2) (Xen-specific, but other platforms will work similarly.) We reinstate
the Mini-OS shutdown thread from upstream, which waits on a xenbus watch
for the "xl shutdown" request.
3) When we receive a shutdown request, the following needs to happen, in
this order:
a) We run the application's SIGTERM handler if set, in the context of
the shutdown thread.
b) We unblock any application threads waiting inside the Rump Kernel,
with all calls returning EINTR.
This should be enough for a well-behaved application to figure out that
it's supposed to terminate.
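Steps 1 and 3 can be modelled in a few dozen lines of C. This is a minimal sketch under stated assumptions: `rr_sigaction()`, `blocking_wait()`, `do_shutdown()` and the example handler are hypothetical names, a pthread condition variable stands in for the rump kernel's own sleep queues, and the step-2 xenbus watch is not modelled (`do_shutdown()` is what the shutdown thread would call once it fires).

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>

/* Step 1: the only disposition this sketch remembers. */
static void (*sigterm_handler)(int) = NULL;

/* Hypothetical restricted sigaction(): records a SIGTERM handler and
 * accepts every other request as a no-op. SA_SIGINFO handlers and
 * signal masks are not modelled. Named rr_sigaction() so it does not
 * clash with the libc symbol. */
static int rr_sigaction(int sig, const struct sigaction *act,
                        struct sigaction *oact)
{
    if (sig != SIGTERM)
        return 0;
    if (oact != NULL) {
        sigemptyset(&oact->sa_mask);
        oact->sa_flags = 0;
        oact->sa_handler = sigterm_handler ? sigterm_handler : SIG_DFL;
    }
    if (act != NULL) {
        void (*h)(int) = act->sa_handler;
        /* SIG_DFL and SIG_IGN both mean "no application handler". */
        sigterm_handler = (h == SIG_DFL || h == SIG_IGN) ? NULL : h;
    }
    return 0;
}

/* Example application handler: just sets a flag. */
static volatile sig_atomic_t terminate_requested = 0;

static void app_on_sigterm(int sig)
{
    (void)sig;
    terminate_requested = 1;
}

/* Stand-in for the rump kernel's sleep queues. */
static pthread_mutex_t sd_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t sd_cond = PTHREAD_COND_INITIALIZER;
static bool sd_requested = false;
static bool work_ready = false;   /* normal wakeup condition */

/* An application call blocked "inside the rump kernel", e.g. accept().
 * Returns 0 on a normal wakeup, EINTR once shutdown has begun. */
static int blocking_wait(void)
{
    int rv;

    pthread_mutex_lock(&sd_lock);
    while (!work_ready && !sd_requested)
        pthread_cond_wait(&sd_cond, &sd_lock);
    rv = sd_requested ? EINTR : 0;
    pthread_mutex_unlock(&sd_lock);
    return rv;
}

/* An application thread; stores blocking_wait()'s result in *arg. */
static void *app_thread(void *arg)
{
    *(int *)arg = blocking_wait();
    return NULL;
}

/* Step 3, run on the shutdown thread. */
static void do_shutdown(void)
{
    /* 3a: the application's SIGTERM handler, in this thread's context. */
    if (sigterm_handler != NULL)
        sigterm_handler(SIGTERM);

    /* 3b: wake every blocked call; each returns EINTR. */
    pthread_mutex_lock(&sd_lock);
    sd_requested = true;
    pthread_cond_broadcast(&sd_cond);
    pthread_mutex_unlock(&sd_lock);
}
```

Running the handler before the wakeup follows the 3a/3b order above, so a handler that only sets a flag has already set it by the time blocked threads see EINTR and recheck their state.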
Judging by the words "should" and "well-behaved", your confidence in the
proposed emulation being a solution is about the same level as mine.
4) (This needs to be added to normal _exit() / abort() / after callmain
handling anyway). We close all open file descriptors, and do a chdir("/"),
so that rumpconfig can cleanly unmount filesystems and down network
interfaces.
Is this possible? Specifically, step 3) b)?
Yes, 3b is possible, sort of: something similar was required for
supporting exec() by multithreaded remote clients. However, the last
time signals came up on this list, with ping6(1) using alarm() instead of
the timeout argument to poll(), our conclusion was that applications
should be adjusted instead of bending over backwards and sideways to
emulate signals. Granted, that discussion did not address how to shut
down long-running servers, so it does not directly apply. Honestly, I
don't know what's required without banging on it for a week or two,
and that's not something I can solve by writing emails.
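The ping6(1) case shows what "adjust the application" means in practice: rather than arming alarm() and relying on SIGALRM to interrupt a blocking receive, wait with poll()'s timeout argument. A minimal sketch; `wait_for_reply()` is a hypothetical helper, not code taken from ping6.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Wait up to timeout_ms for fd to become readable, without SIGALRM.
 * Returns 1 if readable, 0 on timeout, -1 on error or hangup without
 * data. After EINTR it retries with the full timeout (kept simple). */
static int wait_for_reply(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int n;

    do {
        n = poll(&pfd, 1, timeout_ms);
    } while (n == -1 && errno == EINTR);

    if (n > 0 && !(pfd.revents & POLLIN))
        return -1;   /* POLLERR, POLLHUP or POLLNVAL without data */
    return n > 0 ? 1 : n;
}
```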
For 4, a rump kernel provides the notion of a process exiting. A
no-longer-existing process does not have file descriptors or cwd. This
facility is not a secret; it is documented, e.g., in rump_lwproc(3).
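For comparison, step 4 done by hand with plain POSIX calls would look roughly like this. Per the paragraph above, it should be unnecessary inside a rump kernel, since releasing the process releases its descriptors and cwd anyway. `release_process_resources()` is a hypothetical name; the `first_fd` parameter (to keep stdio open) and the 65536 cap are assumptions of this sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <unistd.h>

/* Close every descriptor from first_fd up to the (capped) limit and
 * move the cwd to "/", so that nothing keeps a file system busy.
 * Returns 0 on success, -1 if chdir() failed. */
static int release_process_resources(int first_fd)
{
    long max = sysconf(_SC_OPEN_MAX);

    /* Indeterminate or huge limits: cap so the loop stays cheap.
     * Descriptors above the cap are not closed in this sketch. */
    if (max < 0 || max > 65536)
        max = 65536;
    for (long fd = first_fd; fd < max; fd++)
        (void)close((int)fd);   /* EBADF on unused slots is expected */
    return chdir("/");
}
```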
Now, I really think you should start by implementing a working solution
for _at least_ one non-trivial program (e.g. MySQL) before proposing a
generalization.
Separately, but relatedly: as I have found with MySQL's normal shutdown
process, we will also need to implement pthread_kill(thread, SIGKILL).
-mato