Re: ping x 7: [PATCH] [libgomp] make it possible to use OMP on both sides of a fork
Ping^2. On Tue, Oct 28, 2014 at 6:17 PM, Nathaniel Smith n...@pobox.com wrote: Ping. On 19 Oct 2014 23:44, Nathaniel Smith n...@pobox.com wrote: Hi Jakub, Thanks for your feedback! See below. On Thu, Oct 16, 2014 at 4:52 PM, Jakub Jelinek ja...@redhat.com wrote: On Mon, Oct 13, 2014 at 10:16:19PM +0100, Nathaniel Smith wrote: Got total silence the last 4 times I posted this, and users have been bugging me about it offline, so trying again. This patch fixes a showstopper problem preventing the transparent use of OpenMP in scientific libraries, esp. with Python. Specifically, it is currently not possible to use GNU OpenMP -- even in a limited, temporary manner -- in any program that uses (or might use) fork() for parallelism, even if the fork() and the use of OpenMP occur at totally different times. This limitation is unique to GNU OpenMP -- every competing OpenMP implementation already contains something like this patch. While technically not fully POSIX-compliant (because POSIX gives much much weaker guarantees around fork() than any real Unix), the approach used in this patch (a) performs only POSIX-compliant operations when the host program is itself fully POSIX-compliant, and (b) actually works perfectly reliably in practice on all commonly used platforms I'm aware of. 1) gomp_we_are_forked in your patch will attempt to free the pool of the thread that encounters it, which is racy; consider a program after fork calling pthread_create several times, each thread thusly created then ~ at the same time doing #pragma omp parallel and the initial thread too. You really should clean up the pool data structure only in the initial thread and nowhere else; for native TLS (non-emulated, IE model) the best would be to have a flag in the gomp_thread_pool structure, struct gomp_thread *thr = gomp_thread (); if (thr thr-thread_pool) thr-thread_pool-after_fork = true; should in that case be safe in the atfork child handler. For !HAVE_TLS or emulated TLS not sure if it is completely safe, it would call pthread_getspecific. Perhaps just don't register atfork handler on those targets at all? Good point. The updated patch below takes a slightly different approach. I moved we_are_forked to the per-thread struct, and then I moved the setting of it into the *parent* process's fork handlers -- the before-fork handler toggles it to true, then the child spawns off and inherits this setting, and then the parent after-fork handler toggles it back again. (Since it's per-thread, there's no race condition here.) This lets us remove the child after-fork handler entirely, and -- since the parent handlers aren't subject to any restrictions on what they can call -- it works on all platforms regardless of the TLS implementation. 2) can you explain why are you removing the cleanups from gomp_free_pool_helper ? They aren't removed, but rather moved from the helper function (which runs in the helper threads) into gomp_free_thread_pool (which runs in the main thread) -- which makes it easier to run the appropriate cleanups even in the case where the helper threads aren't running. (But see below -- we might prefer to drop this part of the patch entirely.) 3) you can call pthread_atfork many times (once for each pthread that creates a thread pool), that is undesirable, you want to do that only if the initial thread creates thread pool Good point. I've moved the pthread_atfork call to initialize_team, which is an __attribute__((constructor)). I am a little uncertain whether this is the best approach, though, because of the comment in team_destructor about wanting to correctly handle dlopen/dlclose. One of pthread_atfork's many (many) limitations is that there's no way to unregister handlers, so if dlopen/dlclose is important (is it?) then we can't call pthread_atfork from initialize_team. If this is a problem, then we could delay the pthread_atfork until e.g. the first thread pool is spawned -- would this be preferred? 4) the testcase is clearly not portable enough, should be probably limited to *-*-linux* only, fork etc. will likely not work on many targets. I think it should work on pretty much any target that has fork(); we definitely care about having this functionality on e.g. OS X. I've added some genericish target specifications. In any case, even with the patch, are you aware that you'll leak megabytes of thread stacks etc.? Well, err, I wasn't, no :-). Thanks for pointing that out. To me this does clinch the argument that a better approach would be the one I suggested in https://gcc.gnu.org/ml/gcc-patches/2014-02/msg00979.html i.e., of tracking whether any threadprivate variables were present, and if not then simply shutting down the thread pools before forking. But this would be a much more invasive change to gomp (I wouldn't
Re: ping x 7: [PATCH] [libgomp] make it possible to use OMP on both sides of a fork
Hi Jakub, Thanks for your feedback! See below. On Thu, Oct 16, 2014 at 4:52 PM, Jakub Jelinek ja...@redhat.com wrote: On Mon, Oct 13, 2014 at 10:16:19PM +0100, Nathaniel Smith wrote: Got total silence the last 4 times I posted this, and users have been bugging me about it offline, so trying again. This patch fixes a showstopper problem preventing the transparent use of OpenMP in scientific libraries, esp. with Python. Specifically, it is currently not possible to use GNU OpenMP -- even in a limited, temporary manner -- in any program that uses (or might use) fork() for parallelism, even if the fork() and the use of OpenMP occur at totally different times. This limitation is unique to GNU OpenMP -- every competing OpenMP implementation already contains something like this patch. While technically not fully POSIX-compliant (because POSIX gives much much weaker guarantees around fork() than any real Unix), the approach used in this patch (a) performs only POSIX-compliant operations when the host program is itself fully POSIX-compliant, and (b) actually works perfectly reliably in practice on all commonly used platforms I'm aware of. 1) gomp_we_are_forked in your patch will attempt to free the pool of the thread that encounters it, which is racy; consider a program after fork calling pthread_create several times, each thread thusly created then ~ at the same time doing #pragma omp parallel and the initial thread too. You really should clean up the pool data structure only in the initial thread and nowhere else; for native TLS (non-emulated, IE model) the best would be to have a flag in the gomp_thread_pool structure, struct gomp_thread *thr = gomp_thread (); if (thr thr-thread_pool) thr-thread_pool-after_fork = true; should in that case be safe in the atfork child handler. For !HAVE_TLS or emulated TLS not sure if it is completely safe, it would call pthread_getspecific. Perhaps just don't register atfork handler on those targets at all? Good point. The updated patch below takes a slightly different approach. I moved we_are_forked to the per-thread struct, and then I moved the setting of it into the *parent* process's fork handlers -- the before-fork handler toggles it to true, then the child spawns off and inherits this setting, and then the parent after-fork handler toggles it back again. (Since it's per-thread, there's no race condition here.) This lets us remove the child after-fork handler entirely, and -- since the parent handlers aren't subject to any restrictions on what they can call -- it works on all platforms regardless of the TLS implementation. 2) can you explain why are you removing the cleanups from gomp_free_pool_helper ? They aren't removed, but rather moved from the helper function (which runs in the helper threads) into gomp_free_thread_pool (which runs in the main thread) -- which makes it easier to run the appropriate cleanups even in the case where the helper threads aren't running. (But see below -- we might prefer to drop this part of the patch entirely.) 3) you can call pthread_atfork many times (once for each pthread that creates a thread pool), that is undesirable, you want to do that only if the initial thread creates thread pool Good point. I've moved the pthread_atfork call to initialize_team, which is an __attribute__((constructor)). I am a little uncertain whether this is the best approach, though, because of the comment in team_destructor about wanting to correctly handle dlopen/dlclose. One of pthread_atfork's many (many) limitations is that there's no way to unregister handlers, so if dlopen/dlclose is important (is it?) then we can't call pthread_atfork from initialize_team. If this is a problem, then we could delay the pthread_atfork until e.g. the first thread pool is spawned -- would this be preferred? 4) the testcase is clearly not portable enough, should be probably limited to *-*-linux* only, fork etc. will likely not work on many targets. I think it should work on pretty much any target that has fork(); we definitely care about having this functionality on e.g. OS X. I've added some genericish target specifications. In any case, even with the patch, are you aware that you'll leak megabytes of thread stacks etc.? Well, err, I wasn't, no :-). Thanks for pointing that out. To me this does clinch the argument that a better approach would be the one I suggested in https://gcc.gnu.org/ml/gcc-patches/2014-02/msg00979.html i.e., of tracking whether any threadprivate variables were present, and if not then simply shutting down the thread pools before forking. But this would be a much more invasive change to gomp (I wouldn't know where to start). In the mean time, the current patch is still worthwhile. The cost is not that bad: I wouldn't think of it as leaking so much as overhead of supporting OMP-fork-OMP. No-one forks a child which forks a child which
Re: ping x 7: [PATCH] [libgomp] make it possible to use OMP on both sides of a fork
On Mon, Oct 13, 2014 at 10:16:19PM +0100, Nathaniel Smith wrote: Got total silence the last 4 times I posted this, and users have been bugging me about it offline, so trying again. This patch fixes a showstopper problem preventing the transparent use of OpenMP in scientific libraries, esp. with Python. Specifically, it is currently not possible to use GNU OpenMP -- even in a limited, temporary manner -- in any program that uses (or might use) fork() for parallelism, even if the fork() and the use of OpenMP occur at totally different times. This limitation is unique to GNU OpenMP -- every competing OpenMP implementation already contains something like this patch. While technically not fully POSIX-compliant (because POSIX gives much much weaker guarantees around fork() than any real Unix), the approach used in this patch (a) performs only POSIX-compliant operations when the host program is itself fully POSIX-compliant, and (b) actually works perfectly reliably in practice on all commonly used platforms I'm aware of. 1) gomp_we_are_forked in your patch will attempt to free the pool of the thread that encounters it, which is racy; consider a program after fork calling pthread_create several times, each thread thusly created then ~ at the same time doing #pragma omp parallel and the initial thread too. You really should clean up the pool data structure only in the initial thread and nowhere else; for native TLS (non-emulated, IE model) the best would be to have a flag in the gomp_thread_pool structure, struct gomp_thread *thr = gomp_thread (); if (thr thr-thread_pool) thr-thread_pool-after_fork = true; should in that case be safe in the atfork child handler. For !HAVE_TLS or emulated TLS not sure if it is completely safe, it would call pthread_getspecific. Perhaps just don't register atfork handler on those targets at all? 2) can you explain why are you removing the cleanups from gomp_free_pool_helper ? 3) you can call pthread_atfork many times (once for each pthread that creates a thread pool), that is undesirable, you want to do that only if the initial thread creates thread pool 4) the testcase is clearly not portable enough, should be probably limited to *-*-linux* only, fork etc. will likely not work on many targets. In any case, even with the patch, are you aware that you'll leak megabytes of thread stacks etc.? Jakub
ping x 7: [PATCH] [libgomp] make it possible to use OMP on both sides of a fork
Hi all, Got total silence the last 4 times I posted this, and users have been bugging me about it offline, so trying again. This patch fixes a showstopper problem preventing the transparent use of OpenMP in scientific libraries, esp. with Python. Specifically, it is currently not possible to use GNU OpenMP -- even in a limited, temporary manner -- in any program that uses (or might use) fork() for parallelism, even if the fork() and the use of OpenMP occur at totally different times. This limitation is unique to GNU OpenMP -- every competing OpenMP implementation already contains something like this patch. While technically not fully POSIX-compliant (because POSIX gives much much weaker guarantees around fork() than any real Unix), the approach used in this patch (a) performs only POSIX-compliant operations when the host program is itself fully POSIX-compliant, and (b) actually works perfectly reliably in practice on all commonly used platforms I'm aware of. Tested on linux x86-64. I do not have write access to the SVN repo, so looking for someone to do the commit. Previous discussion/review: http://gcc.gnu.org/ml/gcc-patches/2014-02/msg00813.html Bugzilla entry: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60035 2014-02-12 Nathaniel J. Smith n...@pobox.com * team.c (gomp_free_pool_helper): Move per-thread cleanup to main thread. (gomp_free_thread): Delegate implementation to... (gomp_free_thread_pool): ...this new function. Like old gomp_free_thread, but does per-thread cleanup, and has option to skip everything that involves interacting with actual threads, which is useful when called after fork. (gomp_after_fork_callback): New function. (gomp_team_start): Register atfork handler, and check for fork on entry. * testsuite/libgomp.c/fork-1.c: New test. Thanks, -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org Index: team.c === --- team.c (revision 207398) +++ team.c (working copy) @@ -28,6 +28,7 @@ #include libgomp.h #include stdlib.h #include string.h +#include stdbool.h /* This attribute contains PTHREAD_CREATE_DETACHED. */ pthread_attr_t gomp_thread_attr; @@ -43,6 +44,8 @@ __thread struct gomp_thread gomp_tls_data; pthread_key_t gomp_tls_key; #endif +/* This is to enable best-effort cleanup after fork. */ +static bool gomp_we_are_forked; /* This structure is used to communicate across pthread_create. */ @@ -204,42 +207,41 @@ static struct gomp_thread_pool *gomp_new_thread_po return pool; } +/* Free a thread pool and release its threads. */ + static void gomp_free_pool_helper (void *thread_pool) { - struct gomp_thread *thr = gomp_thread (); struct gomp_thread_pool *pool = (struct gomp_thread_pool *) thread_pool; gomp_barrier_wait_last (pool-threads_dock); - gomp_sem_destroy (thr-release); - thr-thread_pool = NULL; - thr-task = NULL; pthread_exit (NULL); } -/* Free a thread pool and release its threads. */ - -void -gomp_free_thread (void *arg __attribute__((unused))) +static void +gomp_free_thread_pool (bool threads_are_running) { struct gomp_thread *thr = gomp_thread (); struct gomp_thread_pool *pool = thr-thread_pool; if (pool) { + int i; if (pool-threads_used 0) { - int i; - for (i = 1; i pool-threads_used; i++) + if (threads_are_running) { - struct gomp_thread *nthr = pool-threads[i]; - nthr-fn = gomp_free_pool_helper; - nthr-data = pool; + for (i = 1; i pool-threads_used; i++) + { + struct gomp_thread *nthr = pool-threads[i]; + nthr-fn = gomp_free_pool_helper; + nthr-data = pool; + } + /* This barrier undocks threads docked on pool-threads_dock. */ + gomp_barrier_wait (pool-threads_dock); + /* And this waits till all threads have called +gomp_barrier_wait_last in gomp_free_pool_helper. */ + gomp_barrier_wait (pool-threads_dock); } - /* This barrier undocks threads docked on pool-threads_dock. */ - gomp_barrier_wait (pool-threads_dock); - /* And this waits till all threads have called gomp_barrier_wait_last -in gomp_free_pool_helper. */ - gomp_barrier_wait (pool-threads_dock); /* Now it is safe to destroy the barrier and free the pool. */ gomp_barrier_destroy (pool-threads_dock); @@ -251,6 +253,14 @@ gomp_free_pool_helper (void *thread_pool) gomp_managed_threads -= pool-threads_used - 1L; gomp_mutex_unlock (gomp_managed_threads_lock); #endif + /* Clean up thread objects */ + for (i = 1; i pool-threads_used; i++) + { + struct gomp_thread *nthr = pool-threads[i]; +
Ping x 6: [PATCH] [libgomp] make it possible to use OMP on both sides of a fork
Hi all, Ping. Again, this patch fixes a limitation in GOMP which makes it impossible for programmers who care about gcc compatibility to safely use OpenMP in libraries; GOMP is the last OpenMP implementation with this limitation. -n On Wed, May 14, 2014 at 3:47 PM, Nathaniel Smith n...@pobox.com wrote: Hi all, Pinging again about the patch below. The lack of this patch is essentially a blocker to using gcc+python+openmp together, which is a shame, since python is increasingly important in numerical computing, openmp is pretty useful, and gcc is the only openmp implementation that does not support this functionality. -n On Tue, Apr 15, 2014 at 1:19 PM, Nathaniel Smith n...@pobox.com wrote: On Tue, Mar 4, 2014 at 11:37 PM, Nathaniel Smith n...@pobox.com wrote: On Tue, Feb 18, 2014 at 8:58 PM, Richard Henderson r...@redhat.com wrote: On 02/16/2014 03:59 PM, Nathaniel Smith wrote: Yes, but the problem is that depending on what the user intends to do after forking, our pthread_atfork handler might help or it might hurt, and we don't know which. Consider these two cases: - fork+exec - fork+continue to use OMP in child The former case is totally POSIX-legal, even when performed at arbitrary places, even when another thread is, say, in the middle of calling malloc(). Point well taken. Hi all, I guess this patch has gotten all the feedback that it's getting. Any interest in committing it? :-) I don't have commit access. 2014-02-12 Nathaniel J. Smith n...@pobox.com * team.c (gomp_free_pool_helper): Move per-thread cleanup to main thread. (gomp_free_thread): Delegate implementation to... (gomp_free_thread_pool): ...this new function. Like old gomp_free_thread, but does per-thread cleanup, and has option to skip everything that involves interacting with actual threads, which is useful when called after fork. (gomp_after_fork_callback): New function. (gomp_team_start): Register atfork handler, and check for fork on entry. Pinging this again now that trunk has re-opened. For compliant code this patch has essentially no impact (OMP-using code acquires a single-line post-fork callback which sets a flag; everything else works the same as now). For technically non-compliant mostly serial code that uses OMP in some places, and forks children in other places, it makes a best effort attempt to clean up the thread pool detritus left by a fork, instead of simply deadlocking as currently, so as to allow children to use OMP as well. This makes GOMP match the behaviour of all other OMP implementations I'm aware of. Previous discussion: http://gcc.gnu.org/ml/gcc-patches/2014-02/msg00813.html Bug: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60035 I don't have a commit bit -- please commit if acceptable. Cheers, -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org Index: team.c === --- team.c (revision 207398) +++ team.c (working copy) @@ -28,6 +28,7 @@ #include libgomp.h #include stdlib.h #include string.h +#include stdbool.h /* This attribute contains PTHREAD_CREATE_DETACHED. */ pthread_attr_t gomp_thread_attr; @@ -43,6 +44,8 @@ __thread struct gomp_thread gomp_tls_data; pthread_key_t gomp_tls_key; #endif +/* This is to enable best-effort cleanup after fork. */ +static bool gomp_we_are_forked; /* This structure is used to communicate across pthread_create. */ @@ -204,42 +207,41 @@ static struct gomp_thread_pool *gomp_new_thread_po return pool; } +/* Free a thread pool and release its threads. */ + static void gomp_free_pool_helper (void *thread_pool) { - struct gomp_thread *thr = gomp_thread (); struct gomp_thread_pool *pool = (struct gomp_thread_pool *) thread_pool; gomp_barrier_wait_last (pool-threads_dock); - gomp_sem_destroy (thr-release); - thr-thread_pool = NULL; - thr-task = NULL; pthread_exit (NULL); } -/* Free a thread pool and release its threads. */ - -void -gomp_free_thread (void *arg __attribute__((unused))) +static void +gomp_free_thread_pool (bool threads_are_running) { struct gomp_thread *thr = gomp_thread (); struct gomp_thread_pool *pool = thr-thread_pool; if (pool) { + int i; if (pool-threads_used 0) { - int i; - for (i = 1; i pool-threads_used; i++) + if (threads_are_running) { - struct gomp_thread *nthr = pool-threads[i]; - nthr-fn = gomp_free_pool_helper; - nthr-data = pool; + for (i = 1; i pool-threads_used; i++) + { +
Ping x 5: [PATCH] [libgomp] make it possible to use OMP on both sides of a fork
Hi all, Pinging again about the patch below. The lack of this patch is essentially a blocker to using gcc+python+openmp together, which is a shame, since python is increasingly important in numerical computing, openmp is pretty useful, and gcc is the only openmp implementation that does not support this functionality. -n On Tue, Apr 15, 2014 at 1:19 PM, Nathaniel Smith n...@pobox.com wrote: On Tue, Mar 4, 2014 at 11:37 PM, Nathaniel Smith n...@pobox.com wrote: On Tue, Feb 18, 2014 at 8:58 PM, Richard Henderson r...@redhat.com wrote: On 02/16/2014 03:59 PM, Nathaniel Smith wrote: Yes, but the problem is that depending on what the user intends to do after forking, our pthread_atfork handler might help or it might hurt, and we don't know which. Consider these two cases: - fork+exec - fork+continue to use OMP in child The former case is totally POSIX-legal, even when performed at arbitrary places, even when another thread is, say, in the middle of calling malloc(). Point well taken. Hi all, I guess this patch has gotten all the feedback that it's getting. Any interest in committing it? :-) I don't have commit access. 2014-02-12 Nathaniel J. Smith n...@pobox.com * team.c (gomp_free_pool_helper): Move per-thread cleanup to main thread. (gomp_free_thread): Delegate implementation to... (gomp_free_thread_pool): ...this new function. Like old gomp_free_thread, but does per-thread cleanup, and has option to skip everything that involves interacting with actual threads, which is useful when called after fork. (gomp_after_fork_callback): New function. (gomp_team_start): Register atfork handler, and check for fork on entry. Pinging this again now that trunk has re-opened. For compliant code this patch has essentially no impact (OMP-using code acquires a single-line post-fork callback which sets a flag; everything else works the same as now). For technically non-compliant mostly serial code that uses OMP in some places, and forks children in other places, it makes a best effort attempt to clean up the thread pool detritus left by a fork, instead of simply deadlocking as currently, so as to allow children to use OMP as well. This makes GOMP match the behaviour of all other OMP implementations I'm aware of. Previous discussion: http://gcc.gnu.org/ml/gcc-patches/2014-02/msg00813.html Bug: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60035 I don't have a commit bit -- please commit if acceptable. Cheers, -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org Index: team.c === --- team.c (revision 207398) +++ team.c (working copy) @@ -28,6 +28,7 @@ #include libgomp.h #include stdlib.h #include string.h +#include stdbool.h /* This attribute contains PTHREAD_CREATE_DETACHED. */ pthread_attr_t gomp_thread_attr; @@ -43,6 +44,8 @@ __thread struct gomp_thread gomp_tls_data; pthread_key_t gomp_tls_key; #endif +/* This is to enable best-effort cleanup after fork. */ +static bool gomp_we_are_forked; /* This structure is used to communicate across pthread_create. */ @@ -204,42 +207,41 @@ static struct gomp_thread_pool *gomp_new_thread_po return pool; } +/* Free a thread pool and release its threads. */ + static void gomp_free_pool_helper (void *thread_pool) { - struct gomp_thread *thr = gomp_thread (); struct gomp_thread_pool *pool = (struct gomp_thread_pool *) thread_pool; gomp_barrier_wait_last (pool-threads_dock); - gomp_sem_destroy (thr-release); - thr-thread_pool = NULL; - thr-task = NULL; pthread_exit (NULL); } -/* Free a thread pool and release its threads. */ - -void -gomp_free_thread (void *arg __attribute__((unused))) +static void +gomp_free_thread_pool (bool threads_are_running) { struct gomp_thread *thr = gomp_thread (); struct gomp_thread_pool *pool = thr-thread_pool; if (pool) { + int i; if (pool-threads_used 0) { - int i; - for (i = 1; i pool-threads_used; i++) + if (threads_are_running) { - struct gomp_thread *nthr = pool-threads[i]; - nthr-fn = gomp_free_pool_helper; - nthr-data = pool; + for (i = 1; i pool-threads_used; i++) + { + struct gomp_thread *nthr = pool-threads[i]; + nthr-fn = gomp_free_pool_helper; + nthr-data = pool; + } + /* This barrier undocks threads docked on pool-threads_dock. */ + gomp_barrier_wait (pool-threads_dock); + /* And this waits till all threads have called +gomp_barrier_wait_last in gomp_free_pool_helper. */ +
Re: [PATCH] [libgomp] make it possible to use OMP on both sides of a fork
On Tue, Mar 4, 2014 at 11:37 PM, Nathaniel Smith n...@pobox.com wrote: On Tue, Feb 18, 2014 at 8:58 PM, Richard Henderson r...@redhat.com wrote: On 02/16/2014 03:59 PM, Nathaniel Smith wrote: Yes, but the problem is that depending on what the user intends to do after forking, our pthread_atfork handler might help or it might hurt, and we don't know which. Consider these two cases: - fork+exec - fork+continue to use OMP in child The former case is totally POSIX-legal, even when performed at arbitrary places, even when another thread is, say, in the middle of calling malloc(). Point well taken. Hi all, I guess this patch has gotten all the feedback that it's getting. Any interest in committing it? :-) I don't have commit access. 2014-02-12 Nathaniel J. Smith n...@pobox.com * team.c (gomp_free_pool_helper): Move per-thread cleanup to main thread. (gomp_free_thread): Delegate implementation to... (gomp_free_thread_pool): ...this new function. Like old gomp_free_thread, but does per-thread cleanup, and has option to skip everything that involves interacting with actual threads, which is useful when called after fork. (gomp_after_fork_callback): New function. (gomp_team_start): Register atfork handler, and check for fork on entry. Pinging this again now that trunk has re-opened. For compliant code this patch has essentially no impact (OMP-using code acquires a single-line post-fork callback which sets a flag; everything else works the same as now). For technically non-compliant mostly serial code that uses OMP in some places, and forks children in other places, it makes a best effort attempt to clean up the thread pool detritus left by a fork, instead of simply deadlocking as currently, so as to allow children to use OMP as well. This makes GOMP match the behaviour of all other OMP implementations I'm aware of. Previous discussion: http://gcc.gnu.org/ml/gcc-patches/2014-02/msg00813.html Bug: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60035 I don't have a commit bit -- please commit if acceptable. Cheers, -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org
Re: [PATCH] [libgomp] make it possible to use OMP on both sides of a fork
On Tue, Feb 18, 2014 at 8:58 PM, Richard Henderson r...@redhat.com wrote: On 02/16/2014 03:59 PM, Nathaniel Smith wrote: Yes, but the problem is that depending on what the user intends to do after forking, our pthread_atfork handler might help or it might hurt, and we don't know which. Consider these two cases: - fork+exec - fork+continue to use OMP in child The former case is totally POSIX-legal, even when performed at arbitrary places, even when another thread is, say, in the middle of calling malloc(). Point well taken. Hi all, I guess this patch has gotten all the feedback that it's getting. Any interest in committing it? :-) I don't have commit access. 2014-02-12 Nathaniel J. Smith n...@pobox.com * team.c (gomp_free_pool_helper): Move per-thread cleanup to main thread. (gomp_free_thread): Delegate implementation to... (gomp_free_thread_pool): ...this new function. Like old gomp_free_thread, but does per-thread cleanup, and has option to skip everything that involves interacting with actual threads, which is useful when called after fork. (gomp_after_fork_callback): New function. (gomp_team_start): Register atfork handler, and check for fork on entry. Cheers, -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org Index: team.c === --- team.c (revision 207398) +++ team.c (working copy) @@ -28,6 +28,7 @@ #include libgomp.h #include stdlib.h #include string.h +#include stdbool.h /* This attribute contains PTHREAD_CREATE_DETACHED. */ pthread_attr_t gomp_thread_attr; @@ -43,6 +44,8 @@ __thread struct gomp_thread gomp_tls_data; pthread_key_t gomp_tls_key; #endif +/* This is to enable best-effort cleanup after fork. */ +static bool gomp_we_are_forked; /* This structure is used to communicate across pthread_create. */ @@ -204,42 +207,41 @@ static struct gomp_thread_pool *gomp_new_thread_po return pool; } +/* Free a thread pool and release its threads. */ + static void gomp_free_pool_helper (void *thread_pool) { - struct gomp_thread *thr = gomp_thread (); struct gomp_thread_pool *pool = (struct gomp_thread_pool *) thread_pool; gomp_barrier_wait_last (pool-threads_dock); - gomp_sem_destroy (thr-release); - thr-thread_pool = NULL; - thr-task = NULL; pthread_exit (NULL); } -/* Free a thread pool and release its threads. */ - -void -gomp_free_thread (void *arg __attribute__((unused))) +static void +gomp_free_thread_pool (bool threads_are_running) { struct gomp_thread *thr = gomp_thread (); struct gomp_thread_pool *pool = thr-thread_pool; if (pool) { + int i; if (pool-threads_used 0) { - int i; - for (i = 1; i pool-threads_used; i++) + if (threads_are_running) { - struct gomp_thread *nthr = pool-threads[i]; - nthr-fn = gomp_free_pool_helper; - nthr-data = pool; + for (i = 1; i pool-threads_used; i++) + { + struct gomp_thread *nthr = pool-threads[i]; + nthr-fn = gomp_free_pool_helper; + nthr-data = pool; + } + /* This barrier undocks threads docked on pool-threads_dock. */ + gomp_barrier_wait (pool-threads_dock); + /* And this waits till all threads have called +gomp_barrier_wait_last in gomp_free_pool_helper. */ + gomp_barrier_wait (pool-threads_dock); } - /* This barrier undocks threads docked on pool-threads_dock. */ - gomp_barrier_wait (pool-threads_dock); - /* And this waits till all threads have called gomp_barrier_wait_last -in gomp_free_pool_helper. */ - gomp_barrier_wait (pool-threads_dock); /* Now it is safe to destroy the barrier and free the pool. */ gomp_barrier_destroy (pool-threads_dock); @@ -251,6 +253,14 @@ gomp_free_pool_helper (void *thread_pool) gomp_managed_threads -= pool-threads_used - 1L; gomp_mutex_unlock (gomp_managed_threads_lock); #endif + /* Clean up thread objects */ + for (i = 1; i pool-threads_used; i++) + { + struct gomp_thread *nthr = pool-threads[i]; + gomp_sem_destroy (nthr-release); + nthr-thread_pool = NULL; + nthr-task = NULL; + } } free (pool-threads); if (pool-last_team) @@ -266,6 +276,58 @@ gomp_free_pool_helper (void *thread_pool) } } +/* This is called whenever a thread exits which has a non-NULL value for + gomp_thread_destructor. In practice, the only thread for which this occurs + is the one which created the thread pool. +*/ +void +gomp_free_thread (void *arg __attribute__((unused))) +{ + gomp_free_thread_pool (true);
Re: [PATCH] [libgomp] make it possible to use OMP on both sides of a fork
On 02/16/2014 03:59 PM, Nathaniel Smith wrote: Yes, but the problem is that depending on what the user intends to do after forking, our pthread_atfork handler might help or it might hurt, and we don't know which. Consider these two cases: - fork+exec - fork+continue to use OMP in child The former case is totally POSIX-legal, even when performed at arbitrary places, even when another thread is, say, in the middle of calling malloc(). Point well taken. r~
Re: [PATCH] [libgomp] make it possible to use OMP on both sides of a fork
On Fri, Feb 14, 2014 at 3:14 PM, Richard Henderson r...@redhat.com wrote: On 02/14/2014 12:21 AM, Jakub Jelinek wrote: Any reason not to just run gomp_free_thread_pool from gomp_after_fork_callback directly? I see no restrictions on what kind of code is allowed to execute during that callback. Well, fork is async signal safe function, so calling malloc/free, or any kind of synchronization primitives is completely unsafe there. That's as may be, but even the opengroup's rationale for pthread_atfork mentions using locks in the three callbacks. I strongly suspect that no real use of pthread_atfork can ever really be async safe. Yes, but the problem is that depending on what the user intends to do after forking, our pthread_atfork handler might help or it might hurt, and we don't know which. Consider these two cases: - fork+exec - fork+continue to use OMP in child The former case is totally POSIX-legal, even when performed at arbitrary places, even when another thread is, say, in the middle of calling malloc(). If we register a pthread_atfork handler which calls non-signal-safe functions, then we risk breaking POSIX-legal programs like this. The latter case is broken in current GOMP, but we would like it to work as well -- at least when possible. So the way the patch is structured the way it is, is to ensure that we have minimal impact on the former case while still giving the latter case a chance to succeed. Updated patch addressing your other comments attached. 2014-02-12 Nathaniel J. Smith n...@pobox.com * team.c (gomp_free_pool_helper): Move per-thread cleanup to main thread. (gomp_free_thread): Delegate implementation to... (gomp_free_thread_pool): ...this new function. Like old gomp_free_thread, but does per-thread cleanup, and has option to skip everything that involves interacting with actual threads, which is useful when called after fork. (gomp_after_fork_callback): New function. (gomp_team_start): Register atfork handler, and check for fork on entry. -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org Index: team.c === --- team.c (revision 207398) +++ team.c (working copy) @@ -28,6 +28,7 @@ #include libgomp.h #include stdlib.h #include string.h +#include stdbool.h /* This attribute contains PTHREAD_CREATE_DETACHED. */ pthread_attr_t gomp_thread_attr; @@ -43,6 +44,8 @@ __thread struct gomp_thread gomp_tls_data; pthread_key_t gomp_tls_key; #endif +/* This is to enable best-effort cleanup after fork. */ +static bool gomp_we_are_forked; /* This structure is used to communicate across pthread_create. */ @@ -204,42 +207,41 @@ static struct gomp_thread_pool *gomp_new_thread_po return pool; } +/* Free a thread pool and release its threads. */ + static void gomp_free_pool_helper (void *thread_pool) { - struct gomp_thread *thr = gomp_thread (); struct gomp_thread_pool *pool = (struct gomp_thread_pool *) thread_pool; gomp_barrier_wait_last (pool-threads_dock); - gomp_sem_destroy (thr-release); - thr-thread_pool = NULL; - thr-task = NULL; pthread_exit (NULL); } -/* Free a thread pool and release its threads. */ - -void -gomp_free_thread (void *arg __attribute__((unused))) +static void +gomp_free_thread_pool (bool threads_are_running) { struct gomp_thread *thr = gomp_thread (); struct gomp_thread_pool *pool = thr-thread_pool; if (pool) { + int i; if (pool-threads_used 0) { - int i; - for (i = 1; i pool-threads_used; i++) + if (threads_are_running) { - struct gomp_thread *nthr = pool-threads[i]; - nthr-fn = gomp_free_pool_helper; - nthr-data = pool; + for (i = 1; i pool-threads_used; i++) + { + struct gomp_thread *nthr = pool-threads[i]; + nthr-fn = gomp_free_pool_helper; + nthr-data = pool; + } + /* This barrier undocks threads docked on pool-threads_dock. */ + gomp_barrier_wait (pool-threads_dock); + /* And this waits till all threads have called +gomp_barrier_wait_last in gomp_free_pool_helper. */ + gomp_barrier_wait (pool-threads_dock); } - /* This barrier undocks threads docked on pool-threads_dock. */ - gomp_barrier_wait (pool-threads_dock); - /* And this waits till all threads have called gomp_barrier_wait_last -in gomp_free_pool_helper. */ - gomp_barrier_wait (pool-threads_dock); /* Now it is safe to destroy the barrier and free the pool. */ gomp_barrier_destroy (pool-threads_dock); @@ -251,6 +253,14 @@ gomp_free_pool_helper (void *thread_pool) gomp_managed_threads -= pool-threads_used - 1L; gomp_mutex_unlock
Re: [PATCH] [libgomp] make it possible to use OMP on both sides of a fork
On Fri, Feb 14, 2014 at 3:43 AM, Jakub Jelinek ja...@redhat.com wrote: On Fri, Feb 14, 2014 at 09:21:24AM +0100, Jakub Jelinek wrote: Well, fork is async signal safe function, so calling malloc/free, or any kind of synchronization primitives is completely unsafe there. The only safe thing could be to atomically or in some global flag (or set some TLS flag?) and deal with the freeing next time you encounter omp parallel. But, the state of the old thread pool may be in some inconsistent shape. BTW, I think far cleaner solution would be to discuss on Omp-lang and add some standard omp_* function which would allow to throw away all the cached OpenMP threads, after calling that function one could not assume threadprivate vars (other than in the initial thread) preserve their values. If this function would be only allowed outside of the parallel region (i.e. if omp_in_parallel () == 0, or even just if omp_get_level () == 0) and pretend to do #pragma omp parallel num_threads (1) ; i.e. something after which it isn't guaranteed to preserve threadprivate vars, then the library could perform this at the point where it is safe to do so (of course it wouldn't be async-signal-safe function) and isn't a performance issue (calling it when you are expecting to soon launch another #pragma omp parallel could of course slow things down a lot). Anything else is going to be either unsafe, or leak memory. I think the core problem here is that it's not possible to hide OMP usage inside an interface boundary, so you can't reliably compose larger programs out of pieces that individually may or may not use OMP. And unfortunately, I don't think your proposed omp_forget_threadprivates() function helps solve that problem. Like, consider the case of DGEMM implemented using OMP. Right now, this means that calling DGEMM will break fork(). If DGEMM instead called omp_forget_threadprivates(), then fork() would work. BUT, the program might contain other code -- perhaps in a different library -- which also used threadprivates. And every time someone calls DGEMM, this other code would find that their threadprivates had mysteriously disappeared. (Or might find this, depending on whether fork() was called in between, etc.) Whether it would be safe to call omp_forget_threadprivates() is a global property of the whole program, not something that can be determined by looking at any one piece in isolation. What *would* work is if GOMP started tracking whether there were any threadprivate variables in existence. Then it would be possible to dynamically determine at runtime whether this global property held, and automatically clean up threads at pre-fork-time iff it was globally safe to do so. Then we could document that *if* you want to use OMP inside a composable library, then you have to avoid threadprivates, and everything would just work. (In this particular case, it is true that OpenBLAS already doesn't use any threadprivates...) -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org
Re: [PATCH] [libgomp] make it possible to use OMP on both sides of a fork
On Thu, Feb 13, 2014 at 01:22:41PM -0800, Richard Henderson wrote: +/* This is to enable best-effort cleanup after fork. */ +static int gomp_we_are_forked = 0; bool, no explicit initialization, possible removal, see below. +static void +gomp_free_thread_pool (int threads_running) bool for threads_running. It looks like a count otherwise. +gomp_after_fork_callback () (void) + pthread_atfork (NULL, NULL, gomp_after_fork_callback); not needed. Any reason not to just run gomp_free_thread_pool from gomp_after_fork_callback directly? I see no restrictions on what kind of code is allowed to execute during that callback. Well, fork is async signal safe function, so calling malloc/free, or any kind of synchronization primitives is completely unsafe there. The only safe thing could be to atomically or in some global flag (or set some TLS flag?) and deal with the freeing next time you encounter omp parallel. But, the state of the old thread pool may be in some inconsistent shape. Jakub
Re: [PATCH] [libgomp] make it possible to use OMP on both sides of a fork
On Fri, Feb 14, 2014 at 09:21:24AM +0100, Jakub Jelinek wrote: Well, fork is async signal safe function, so calling malloc/free, or any kind of synchronization primitives is completely unsafe there. The only safe thing could be to atomically or in some global flag (or set some TLS flag?) and deal with the freeing next time you encounter omp parallel. But, the state of the old thread pool may be in some inconsistent shape. BTW, I think far cleaner solution would be to discuss on Omp-lang and add some standard omp_* function which would allow to throw away all the cached OpenMP threads, after calling that function one could not assume threadprivate vars (other than in the initial thread) preserve their values. If this function would be only allowed outside of the parallel region (i.e. if omp_in_parallel () == 0, or even just if omp_get_level () == 0) and pretend to do #pragma omp parallel num_threads (1) ; i.e. something after which it isn't guaranteed to preserve threadprivate vars, then the library could perform this at the point where it is safe to do so (of course it wouldn't be async-signal-safe function) and isn't a performance issue (calling it when you are expecting to soon launch another #pragma omp parallel could of course slow things down a lot). Anything else is going to be either unsafe, or leak memory. Jakub
Re: [PATCH] [libgomp] make it possible to use OMP on both sides of a fork
On 02/14/2014 12:21 AM, Jakub Jelinek wrote: Any reason not to just run gomp_free_thread_pool from gomp_after_fork_callback directly? I see no restrictions on what kind of code is allowed to execute during that callback. Well, fork is async signal safe function, so calling malloc/free, or any kind of synchronization primitives is completely unsafe there. That's as may be, but even the opengroup's rationale for pthread_atfork mentions using locks in the three callbacks. I strongly suspect that no real use of pthread_atfork can ever really be async safe. r~
Re: [PATCH] [libgomp] make it possible to use OMP on both sides of a fork
+/* This is to enable best-effort cleanup after fork. */ +static int gomp_we_are_forked = 0; bool, no explicit initialization, possible removal, see below. +static void +gomp_free_thread_pool (int threads_running) bool for threads_running. It looks like a count otherwise. +gomp_after_fork_callback () (void) + pthread_atfork (NULL, NULL, gomp_after_fork_callback); not needed. Any reason not to just run gomp_free_thread_pool from gomp_after_fork_callback directly? I see no restrictions on what kind of code is allowed to execute during that callback. r~
[PATCH] [libgomp] make it possible to use OMP on both sides of a fork
Problem: A common use care for OMP is to accelerate the internal workings of an otherwise serial interface. For example, OpenBLAS in some settings will internally use OMP to accelerate the implementation of matrix-matrix multiply (DGEMM). When DGEMM is called, then an OMP section is started, the work is done, then the OMP section exits, the program returns to serial mode, and DGEMM returns. All this is entirely transparent to the user -- in fact, it's common for users to switch between different linear algebra cores (BLAS libraries) without recompiling, so it's impossible for code that uses linear algebra to know which underlying library is in use, or how it has been compiled. However, in order to support some corners of the OMP spec, it is important that the threads that were started to implement an OMP parallel section be kept around, in case another OMP section has started. (AFAICT this is only true when threadprivate variables are in use. Unfortunately AFAICT there is currently no way to determine whether this is the case -- such variables are handled directly by GCC without calling into libgomp, so we can't tell at runtime whether they exist.) And, this causes a big problem and abstraction leak: it means that if you use OMP (e.g., by multiplying two matrices), and then fork, and then the child also uses OMP (e.g., by also multiplying two matrices), then the child immediately deadlocks (as OMP waits for threads that it thinks still exist, but that disappeared during the fork). The result is that it simply *is not possible to know* whether fork() will actually work as advertised, even when writing purely serial code, if that code happens to do seemingly innocent things like linear algebra. And this then ends up causing surprising wreakage in far-flung parts of the numerical ecosystem (e.g., here's someone trying to figure figure out why their web site's task manager crashes whenever they try to plot a graph: https://github.com/celery/celery/issues/1842). (Somewhat more impassioned rant and references to previous discussions here: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60035) In practice, GOMP seems to be the only OMP implementation that suffers from this problem; people who encounter this problem are often advised to switch to icc. There does not appear to be any fully POSIX-compliant way to solve this problem (not least because in a strict reading of the POSIX spec, you aren't guaranteed to be able to do practically *anything* after a fork() in any program which has ever called a pthreads_* function). In a less strict reading, we might expect to be okay if no threads are actually running at the time that fork() is called -- but, we can't shut down OMP threads before forking, because of the issue with threadprivate variables -- it might change the behaviour of compliant programs. But in practice, if the fork() occurs at a time when every thread is just sitting waiting on a barrier, then we can be pretty sure that libc etc. will be in a generally thread-consistent state. And in practice, the few truly dangerous operations we need to clean up after the fact -- e.g., destroying that barrier -- do seem to work, at least on Linux. The attached patch, therefore, takes this strategy. Crucially, it should have no impact on compliant programs, because it doesn't actually do anything except set/check a single global variable until the user actually enters an OMP section in the child, at which case they have already violated POSIX, so we might as well cross our fingers and hope for the best. (At the very least, the included test does fail on Linux x86-64 without the patch, and passes with the patch.) Other options that might be worth considering: -- Adding some way for libgomp to determine whether threadprivate variables are in use, and then using this information to shut down threads in a pre-fork handler iff doing so is safe. -- Instead of trying to clean up the various mutex/barrier/semaphore detritus left in the child by the evaporating threads, we could simply leak them. I don't know which is worse in practice: a small leak (once per child process), or the risk that the various *_destroy functions will blow up (as POSIX allows them to do). ChangeLog: 2014-02-12 Nathaniel J. Smith n...@pobox.com * team.c (gomp_free_pool_helper): Move per-thread cleanup to main thread. (gomp_free_thread): Delegate implementation to... (gomp_free_thread_pool): ...this new function. Like old gomp_free_thread, but does per-thread cleanup, and has option to skip everything that involves interacting with actual threads, which is useful when called after fork. (gomp_after_fork_callback): New function. (gomp_team_start): Register atfork handler, and check for fork on entry. -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org Index: testsuite/libgomp.c/fork-1.c