Re: [systemd-devel] Significant performance loss caused by commit a65f06b: journal: return -ECHILD after a fork

Michael Chapman Mon, 10 Jul 2017 05:34:50 -0700

On Mon, 10 Jul 2017, Lennart Poettering wrote:

On Mon, 10.07.17 21:51, Michael Chapman (m...@very.puzzling.org) wrote:

This all stems from my experiences with PulseAudio back in the day:
People do not grok the effect of fork(): it only duplicates the
invoking thread, not any other threads of the process, moreover all
data structures are copied as they are, and that's a time bomb really:
consider one of our context objects is being used by one thread at the
moment another thread invokes fork(): the thread using the object is
busy making changes to the object, rearranging some data structure (for
example, rehashing a hash table, because it hit its fill limit) and
suchlike. Now the fork() happens while it is doing that: the data
structure will be copied in its half-written, half-updated status quo,
and in the child process there's no thread that could finish what has
been started, and there is also no way to roll back the changes that
are in progress.
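
For illustration, the kind of guard named in the subject line (fail with ECHILD once the object is used in a forked child) could be sketched roughly as below. This is Python rather than the C used in systemd, and the Context class, its fields and use_in_child() are hypothetical names made up for the example. Only the idea comes from the thread: remember the creating PID, compare it with getpid() on each call, and refuse to continue in a child.

```python
import errno
import os


class Context:
    """Hypothetical stand-in for a library context object."""

    def __init__(self):
        # Remember which process created the object.
        self._origin_pid = os.getpid()
        self._entries = ["entry-1", "entry-2"]

    def _check_not_forked(self):
        # One getpid() per API call: this is the cost discussed in the thread.
        if os.getpid() != self._origin_pid:
            # OSError with errno ECHILD is raised as ChildProcessError.
            raise OSError(errno.ECHILD, "context used after fork()")

    def entries(self):
        self._check_not_forked()
        return list(self._entries)


def use_in_child(ctx):
    """Fork, try to use ctx in the child, return the child's exit code."""
    pid = os.fork()
    if pid == 0:
        status = 1
        try:
            ctx.entries()
            status = 0  # would mean the guard did not trigger
        except ChildProcessError:
            status = 42  # guard refused to touch the copied object
        finally:
            os._exit(status)
    _, wait_status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(wait_status)
```

In the parent, ctx.entries() keeps working; in the child the same call fails before it can touch a possibly half-updated copy of the data.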

[...]

Thanks, that really does clear things up.

It's a pity glibc doesn't provide an equivalent for pthread_atfork() outside
of the pthread library. Having a notification that a fork has just occurred
would allow us to do the PID caching ourselves.
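
As a rough sketch of what such a notification would make possible, here is the PID-caching idea in Python, whose os.register_at_fork() plays a role comparable to pthread_atfork(). The helper names (cached_getpid, pid_seen_by_child) are hypothetical, and this says nothing about what glibc or systemd actually provide. The hook only runs for forks made through os.fork().

```python
import os

# PID cached once; refreshed only when a fork is reported.
_cached_pid = os.getpid()


def _refresh_cached_pid():
    global _cached_pid
    _cached_pid = os.getpid()


# Ask the interpreter to run the refresh in the child after each os.fork().
os.register_at_fork(after_in_child=_refresh_cached_pid)


def cached_getpid():
    """Return the cached PID without calling getpid()."""
    return _cached_pid


def pid_seen_by_child():
    """Fork and return (child_pid, cached PID as observed in the child)."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        try:
            os.close(read_fd)
            os.write(write_fd, str(cached_getpid()).encode())
        finally:
            os._exit(0)
    os.close(write_fd)
    with os.fdopen(read_fd) as reader:
        seen = int(reader.read())
    os.waitpid(pid, 0)
    return pid, seen
```

With the hook in place, the child sees its own PID through the cache, and the parent's cached value is left untouched.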


Well, pthread_atfork() is probably more a source of problems than a solution
for them.

Mutexes and fork() do not mix well: if you have a thread that acquired
a mutex right before a fork() then it will cease to exist but the
mutex remains locked. Now, you could use pthread_atfork() to unlock
it, but that really works only in trivial cases, with trivial data
structures, and otherwise creates ABBA problems and similar. I mean,
mutexes are supposed to make pieces of code atomic from the outside
view: but if you duplicate a process without the thread it will appear
aborted to the outside, and that's quite far from "atomic"...
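
The stuck-mutex case can be reproduced with a small experiment. The sketch below uses Python's threading.Lock in place of a pthread mutex, and all names in it are hypothetical. A worker thread takes the lock, the main thread forks, and the child (which has no worker thread) checks whether the lock can be taken. Recent Python versions print a warning when a multi-threaded process forks, which is the same concern.

```python
import os
import threading

lock = threading.Lock()
holding = threading.Event()
release = threading.Event()


def worker():
    with lock:
        holding.set()   # tell the main thread the lock is now held
        release.wait()  # keep holding it across the fork


def lock_free_in_child():
    """Fork while another thread holds `lock`; report what the child sees."""
    thread = threading.Thread(target=worker)
    thread.start()
    holding.wait()
    pid = os.fork()
    if pid == 0:
        # The worker thread was not duplicated, but its lock state was.
        got_it = lock.acquire(blocking=False)
        os._exit(0 if got_it else 1)
    _, status = os.waitpid(pid, 0)
    release.set()
    thread.join()
    return os.WEXITSTATUS(status) == 0
```

The child finds the lock held with nobody left to release it, while the parent can take it again once the worker has finished.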

I understand that... which is why I was only talking about PID caching. That is, it could be used to avoid the getpid() calls.

Anyway, it's all moot as I don't think we'd want to use pthread_atfork in any systemd APIs -- I'm not sure if they all link to libpthread yet anyway.

Of course, there's still a problem with people calling the clone syscall
directly... but I think once people start doing that we have to trust them
to know what they're doing.


Yes: if you invoke clone() directly, you should really invoke execve()
soon afterwards too, and in the time between these two syscalls you should
not invoke getpid() and should limit yourself to known safe calls.

Lennart


