Re: ping x 7: [PATCH] [libgomp] make it possible to use OMP on both sides of a fork

2014-11-06 Thread Nathaniel Smith
Ping^2.

On Tue, Oct 28, 2014 at 6:17 PM, Nathaniel Smith n...@pobox.com wrote:
 Ping.

 On 19 Oct 2014 23:44, Nathaniel Smith n...@pobox.com wrote:

 Hi Jakub,

 Thanks for your feedback! See below.

 On Thu, Oct 16, 2014 at 4:52 PM, Jakub Jelinek ja...@redhat.com wrote:
  On Mon, Oct 13, 2014 at 10:16:19PM +0100, Nathaniel Smith wrote:
  Got total silence the last 4 times I posted this, and users have been
  bugging me about it offline, so trying again.
 
  This patch fixes a showstopper problem preventing the transparent use
  of OpenMP in scientific libraries, esp. with Python. Specifically, it
  is currently not possible to use GNU OpenMP -- even in a limited,
  temporary manner -- in any program that uses (or might use) fork() for
  parallelism, even if the fork() and the use of OpenMP occur at totally
  different times. This limitation is unique to GNU OpenMP -- every
  competing OpenMP implementation already contains something like this
  patch. While technically not fully POSIX-compliant (because POSIX
  gives much much weaker guarantees around fork() than any real Unix),
  the approach used in this patch (a) performs only POSIX-compliant
  operations when the host program is itself fully POSIX-compliant, and
  (b) actually works perfectly reliably in practice on all commonly used
  platforms I'm aware of.
 
  1) gomp_we_are_forked in your patch will attempt to free the pool
 of the thread that encounters it, which is racy; consider a program
 after fork calling pthread_create several times, each thread
 thusly created then ~ at the same time doing #pragma omp parallel
 and the initial thread too.  You really should clean up the pool
 data structure only in the initial thread and nowhere else;
 for native TLS (non-emulated, IE model) the best would be to have a
  flag
 in the gomp_thread_pool structure,
 struct gomp_thread *thr = gomp_thread ();
 if (thr  thr-thread_pool)
   thr-thread_pool-after_fork = true;
 should in that case be safe in the atfork child handler.
 For !HAVE_TLS or emulated TLS not sure if it is completely safe,
 it would call pthread_getspecific.  Perhaps just don't register
 atfork handler on those targets at all?

 Good point. The updated patch below takes a slightly different
 approach. I moved we_are_forked to the per-thread struct, and then I
 moved the setting of it into the *parent* process's fork handlers --
 the before-fork handler toggles it to true, then the child spawns off
 and inherits this setting, and then the parent after-fork handler
 toggles it back again. (Since it's per-thread, there's no race
 condition here.) This lets us remove the child after-fork handler
 entirely, and -- since the parent handlers aren't subject to any
 restrictions on what they can call -- it works on all platforms
 regardless of the TLS implementation.

  2) can you explain why are you removing the cleanups from
 gomp_free_pool_helper ?

 They aren't removed, but rather moved from the helper function (which
 runs in the helper threads) into gomp_free_thread_pool (which runs in
 the main thread) -- which makes it easier to run the appropriate
 cleanups even in the case where the helper threads aren't running.
 (But see below -- we might prefer to drop this part of the patch
 entirely.)

  3) you can call pthread_atfork many times (once for each pthread
 that creates a thread pool), that is undesirable, you want to do that
 only if the initial thread creates thread pool

 Good point. I've moved the pthread_atfork call to initialize_team,
 which is an __attribute__((constructor)).

 I am a little uncertain whether this is the best approach, though,
 because of the comment in team_destructor about wanting to correctly
 handle dlopen/dlclose. One of pthread_atfork's many (many) limitations
 is that there's no way to unregister handlers, so if dlopen/dlclose is
 important (is it?) then we can't call pthread_atfork from
 initialize_team.

 If this is a problem, then we could delay the pthread_atfork until
 e.g. the first thread pool is spawned -- would this be preferred?

  4) the testcase is clearly not portable enough, should be probably
  limited
 to *-*-linux* only, fork etc. will likely not work on many targets.

 I think it should work on pretty much any target that has fork(); we
 definitely care about having this functionality on e.g. OS X. I've
 added some genericish target specifications.

  In any case, even with the patch, are you aware that you'll leak
  megabytes
  of thread stacks etc.?

 Well, err, I wasn't, no :-). Thanks for pointing that out.

 To me this does clinch the argument that a better approach would be
 the one I suggested in
https://gcc.gnu.org/ml/gcc-patches/2014-02/msg00979.html
 i.e., of tracking whether any threadprivate variables were present,
 and if not then simply shutting down the thread pools before forking.
 But this would be a much more invasive change to gomp (I wouldn't 

Re: ping x 7: [PATCH] [libgomp] make it possible to use OMP on both sides of a fork

2014-10-19 Thread Nathaniel Smith
Hi Jakub,

Thanks for your feedback! See below.

On Thu, Oct 16, 2014 at 4:52 PM, Jakub Jelinek ja...@redhat.com wrote:
 On Mon, Oct 13, 2014 at 10:16:19PM +0100, Nathaniel Smith wrote:
 Got total silence the last 4 times I posted this, and users have been
 bugging me about it offline, so trying again.

 This patch fixes a showstopper problem preventing the transparent use
 of OpenMP in scientific libraries, esp. with Python. Specifically, it
 is currently not possible to use GNU OpenMP -- even in a limited,
 temporary manner -- in any program that uses (or might use) fork() for
 parallelism, even if the fork() and the use of OpenMP occur at totally
 different times. This limitation is unique to GNU OpenMP -- every
 competing OpenMP implementation already contains something like this
 patch. While technically not fully POSIX-compliant (because POSIX
 gives much much weaker guarantees around fork() than any real Unix),
 the approach used in this patch (a) performs only POSIX-compliant
 operations when the host program is itself fully POSIX-compliant, and
 (b) actually works perfectly reliably in practice on all commonly used
 platforms I'm aware of.

 1) gomp_we_are_forked in your patch will attempt to free the pool
of the thread that encounters it, which is racy; consider a program
after fork calling pthread_create several times, each thread
thusly created then ~ at the same time doing #pragma omp parallel
and the initial thread too.  You really should clean up the pool
data structure only in the initial thread and nowhere else;
for native TLS (non-emulated, IE model) the best would be to have a flag
in the gomp_thread_pool structure,
struct gomp_thread *thr = gomp_thread ();
if (thr  thr-thread_pool)
  thr-thread_pool-after_fork = true;
should in that case be safe in the atfork child handler.
For !HAVE_TLS or emulated TLS not sure if it is completely safe,
it would call pthread_getspecific.  Perhaps just don't register
atfork handler on those targets at all?

Good point. The updated patch below takes a slightly different
approach. I moved we_are_forked to the per-thread struct, and then I
moved the setting of it into the *parent* process's fork handlers --
the before-fork handler toggles it to true, then the child spawns off
and inherits this setting, and then the parent after-fork handler
toggles it back again. (Since it's per-thread, there's no race
condition here.) This lets us remove the child after-fork handler
entirely, and -- since the parent handlers aren't subject to any
restrictions on what they can call -- it works on all platforms
regardless of the TLS implementation.

 2) can you explain why are you removing the cleanups from
gomp_free_pool_helper ?

They aren't removed, but rather moved from the helper function (which
runs in the helper threads) into gomp_free_thread_pool (which runs in
the main thread) -- which makes it easier to run the appropriate
cleanups even in the case where the helper threads aren't running.
(But see below -- we might prefer to drop this part of the patch
entirely.)

 3) you can call pthread_atfork many times (once for each pthread
that creates a thread pool), that is undesirable, you want to do that
only if the initial thread creates thread pool

Good point. I've moved the pthread_atfork call to initialize_team,
which is an __attribute__((constructor)).

I am a little uncertain whether this is the best approach, though,
because of the comment in team_destructor about wanting to correctly
handle dlopen/dlclose. One of pthread_atfork's many (many) limitations
is that there's no way to unregister handlers, so if dlopen/dlclose is
important (is it?) then we can't call pthread_atfork from
initialize_team.

If this is a problem, then we could delay the pthread_atfork until
e.g. the first thread pool is spawned -- would this be preferred?

 4) the testcase is clearly not portable enough, should be probably limited
to *-*-linux* only, fork etc. will likely not work on many targets.

I think it should work on pretty much any target that has fork(); we
definitely care about having this functionality on e.g. OS X. I've
added some genericish target specifications.

 In any case, even with the patch, are you aware that you'll leak megabytes
 of thread stacks etc.?

Well, err, I wasn't, no :-). Thanks for pointing that out.

To me this does clinch the argument that a better approach would be
the one I suggested in
   https://gcc.gnu.org/ml/gcc-patches/2014-02/msg00979.html
i.e., of tracking whether any threadprivate variables were present,
and if not then simply shutting down the thread pools before forking.
But this would be a much more invasive change to gomp (I wouldn't know
where to start).

In the mean time, the current patch is still worthwhile. The cost is
not that bad: I wouldn't think of it as leaking so much as overhead
of supporting OMP-fork-OMP. No-one forks a child which forks a
child which 

Re: ping x 7: [PATCH] [libgomp] make it possible to use OMP on both sides of a fork

2014-10-16 Thread Jakub Jelinek
On Mon, Oct 13, 2014 at 10:16:19PM +0100, Nathaniel Smith wrote:
 Got total silence the last 4 times I posted this, and users have been
 bugging me about it offline, so trying again.
 
 This patch fixes a showstopper problem preventing the transparent use
 of OpenMP in scientific libraries, esp. with Python. Specifically, it
 is currently not possible to use GNU OpenMP -- even in a limited,
 temporary manner -- in any program that uses (or might use) fork() for
 parallelism, even if the fork() and the use of OpenMP occur at totally
 different times. This limitation is unique to GNU OpenMP -- every
 competing OpenMP implementation already contains something like this
 patch. While technically not fully POSIX-compliant (because POSIX
 gives much much weaker guarantees around fork() than any real Unix),
 the approach used in this patch (a) performs only POSIX-compliant
 operations when the host program is itself fully POSIX-compliant, and
 (b) actually works perfectly reliably in practice on all commonly used
 platforms I'm aware of.

1) gomp_we_are_forked in your patch will attempt to free the pool
   of the thread that encounters it, which is racy; consider a program
   after fork calling pthread_create several times, each thread
   thusly created then ~ at the same time doing #pragma omp parallel
   and the initial thread too.  You really should clean up the pool
   data structure only in the initial thread and nowhere else;
   for native TLS (non-emulated, IE model) the best would be to have a flag
   in the gomp_thread_pool structure,
   struct gomp_thread *thr = gomp_thread ();
   if (thr  thr-thread_pool)
 thr-thread_pool-after_fork = true;
   should in that case be safe in the atfork child handler.
   For !HAVE_TLS or emulated TLS not sure if it is completely safe,
   it would call pthread_getspecific.  Perhaps just don't register
   atfork handler on those targets at all?
2) can you explain why are you removing the cleanups from
   gomp_free_pool_helper ?
3) you can call pthread_atfork many times (once for each pthread
   that creates a thread pool), that is undesirable, you want to do that
   only if the initial thread creates thread pool
4) the testcase is clearly not portable enough, should be probably limited
   to *-*-linux* only, fork etc. will likely not work on many targets.

In any case, even with the patch, are you aware that you'll leak megabytes
of thread stacks etc.?

Jakub


ping x 7: [PATCH] [libgomp] make it possible to use OMP on both sides of a fork

2014-10-13 Thread Nathaniel Smith
Hi all,

Got total silence the last 4 times I posted this, and users have been
bugging me about it offline, so trying again.

This patch fixes a showstopper problem preventing the transparent use
of OpenMP in scientific libraries, esp. with Python. Specifically, it
is currently not possible to use GNU OpenMP -- even in a limited,
temporary manner -- in any program that uses (or might use) fork() for
parallelism, even if the fork() and the use of OpenMP occur at totally
different times. This limitation is unique to GNU OpenMP -- every
competing OpenMP implementation already contains something like this
patch. While technically not fully POSIX-compliant (because POSIX
gives much much weaker guarantees around fork() than any real Unix),
the approach used in this patch (a) performs only POSIX-compliant
operations when the host program is itself fully POSIX-compliant, and
(b) actually works perfectly reliably in practice on all commonly used
platforms I'm aware of.

Tested on linux x86-64. I do not have write access to the SVN repo, so
looking for someone to do the commit.

Previous discussion/review:
  http://gcc.gnu.org/ml/gcc-patches/2014-02/msg00813.html
Bugzilla entry:
  http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60035

2014-02-12  Nathaniel J. Smith  n...@pobox.com

* team.c (gomp_free_pool_helper): Move per-thread cleanup to main
thread.
(gomp_free_thread): Delegate implementation to...
(gomp_free_thread_pool): ...this new function. Like old
gomp_free_thread, but does per-thread cleanup, and has option to
skip everything that involves interacting with actual threads,
which is useful when called after fork.
(gomp_after_fork_callback): New function.
(gomp_team_start): Register atfork handler, and check for fork on
entry.
* testsuite/libgomp.c/fork-1.c: New test.

Thanks,
-n

-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org
Index: team.c
===
--- team.c  (revision 207398)
+++ team.c  (working copy)
@@ -28,6 +28,7 @@
 #include libgomp.h
 #include stdlib.h
 #include string.h
+#include stdbool.h
 
 /* This attribute contains PTHREAD_CREATE_DETACHED.  */
 pthread_attr_t gomp_thread_attr;
@@ -43,6 +44,8 @@ __thread struct gomp_thread gomp_tls_data;
 pthread_key_t gomp_tls_key;
 #endif
 
+/* This is to enable best-effort cleanup after fork.  */
+static bool gomp_we_are_forked;
 
 /* This structure is used to communicate across pthread_create.  */
 
@@ -204,42 +207,41 @@ static struct gomp_thread_pool *gomp_new_thread_po
   return pool;
 }
 
+/* Free a thread pool and release its threads. */
+
 static void
 gomp_free_pool_helper (void *thread_pool)
 {
-  struct gomp_thread *thr = gomp_thread ();
   struct gomp_thread_pool *pool
 = (struct gomp_thread_pool *) thread_pool;
   gomp_barrier_wait_last (pool-threads_dock);
-  gomp_sem_destroy (thr-release);
-  thr-thread_pool = NULL;
-  thr-task = NULL;
   pthread_exit (NULL);
 }
 
-/* Free a thread pool and release its threads. */
-
-void
-gomp_free_thread (void *arg __attribute__((unused)))
+static void
+gomp_free_thread_pool (bool threads_are_running)
 {
   struct gomp_thread *thr = gomp_thread ();
   struct gomp_thread_pool *pool = thr-thread_pool;
   if (pool)
 {
+  int i;
   if (pool-threads_used  0)
{
- int i;
- for (i = 1; i  pool-threads_used; i++)
+ if (threads_are_running)
{
- struct gomp_thread *nthr = pool-threads[i];
- nthr-fn = gomp_free_pool_helper;
- nthr-data = pool;
+ for (i = 1; i  pool-threads_used; i++)
+   {
+ struct gomp_thread *nthr = pool-threads[i];
+ nthr-fn = gomp_free_pool_helper;
+ nthr-data = pool;
+   }
+ /* This barrier undocks threads docked on pool-threads_dock.  */
+ gomp_barrier_wait (pool-threads_dock);
+ /* And this waits till all threads have called
+gomp_barrier_wait_last in gomp_free_pool_helper.  */
+ gomp_barrier_wait (pool-threads_dock);
}
- /* This barrier undocks threads docked on pool-threads_dock.  */
- gomp_barrier_wait (pool-threads_dock);
- /* And this waits till all threads have called gomp_barrier_wait_last
-in gomp_free_pool_helper.  */
- gomp_barrier_wait (pool-threads_dock);
  /* Now it is safe to destroy the barrier and free the pool.  */
  gomp_barrier_destroy (pool-threads_dock);
 
@@ -251,6 +253,14 @@ gomp_free_pool_helper (void *thread_pool)
  gomp_managed_threads -= pool-threads_used - 1L;
  gomp_mutex_unlock (gomp_managed_threads_lock);
 #endif
+ /* Clean up thread objects */
+ for (i = 1; i  pool-threads_used; i++)
+   {
+ struct gomp_thread *nthr = pool-threads[i];
+

Ping x 6: [PATCH] [libgomp] make it possible to use OMP on both sides of a fork

2014-06-11 Thread Nathaniel Smith
Hi all,

Ping. Again, this patch fixes a limitation in GOMP which makes it
impossible for programmers who care about gcc compatibility to safely
use OpenMP in libraries; GOMP is the last OpenMP implementation with
this limitation.

-n

On Wed, May 14, 2014 at 3:47 PM, Nathaniel Smith n...@pobox.com wrote:
 Hi all,

 Pinging again about the patch below. The lack of this patch is
 essentially a blocker to using gcc+python+openmp together, which is a
 shame, since python is increasingly important in numerical computing,
 openmp is pretty useful, and gcc is the only openmp implementation
 that does not support this functionality.

 -n

 On Tue, Apr 15, 2014 at 1:19 PM, Nathaniel Smith n...@pobox.com wrote:
 On Tue, Mar 4, 2014 at 11:37 PM, Nathaniel Smith n...@pobox.com wrote:
 On Tue, Feb 18, 2014 at 8:58 PM, Richard Henderson r...@redhat.com wrote:
 On 02/16/2014 03:59 PM, Nathaniel Smith wrote:
 Yes, but the problem is that depending on what the user intends to do
 after forking, our pthread_atfork handler might help or it might hurt,
 and we don't know which. Consider these two cases:
   - fork+exec
   - fork+continue to use OMP in child
 The former case is totally POSIX-legal, even when performed at
 arbitrary places, even when another thread is, say, in the middle of
 calling malloc().

 Point well taken.

 Hi all,

 I guess this patch has gotten all the feedback that it's getting. Any
 interest in committing it? :-) I don't have commit access.

 2014-02-12  Nathaniel J. Smith  n...@pobox.com

 * team.c (gomp_free_pool_helper): Move per-thread cleanup to main
 thread.
 (gomp_free_thread): Delegate implementation to...
 (gomp_free_thread_pool): ...this new function. Like old
 gomp_free_thread, but does per-thread cleanup, and has option to
 skip everything that involves interacting with actual threads,
 which is useful when called after fork.
 (gomp_after_fork_callback): New function.
 (gomp_team_start): Register atfork handler, and check for fork on
 entry.

 Pinging this again now that trunk has re-opened. For compliant code
 this patch has essentially no impact (OMP-using code acquires a
 single-line post-fork callback which sets a flag; everything else
 works the same as now). For technically non-compliant mostly serial
 code that uses OMP in some places, and forks children in other places,
 it makes a best effort attempt to clean up the thread pool detritus
 left by a fork, instead of simply deadlocking as currently, so as to
 allow children to use OMP as well. This makes GOMP match the behaviour
 of all other OMP implementations I'm aware of.

 Previous discussion:
   http://gcc.gnu.org/ml/gcc-patches/2014-02/msg00813.html
 Bug:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60035

 I don't have a commit bit -- please commit if acceptable.

 Cheers,
 -n

 --
 Nathaniel J. Smith
 Postdoctoral researcher - Informatics - University of Edinburgh
 http://vorpus.org

 --
 Nathaniel J. Smith
 Postdoctoral researcher - Informatics - University of Edinburgh
 http://vorpus.org



-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org
Index: team.c
===
--- team.c  (revision 207398)
+++ team.c  (working copy)
@@ -28,6 +28,7 @@
 #include libgomp.h
 #include stdlib.h
 #include string.h
+#include stdbool.h
 
 /* This attribute contains PTHREAD_CREATE_DETACHED.  */
 pthread_attr_t gomp_thread_attr;
@@ -43,6 +44,8 @@ __thread struct gomp_thread gomp_tls_data;
 pthread_key_t gomp_tls_key;
 #endif
 
+/* This is to enable best-effort cleanup after fork.  */
+static bool gomp_we_are_forked;
 
 /* This structure is used to communicate across pthread_create.  */
 
@@ -204,42 +207,41 @@ static struct gomp_thread_pool *gomp_new_thread_po
   return pool;
 }
 
+/* Free a thread pool and release its threads. */
+
 static void
 gomp_free_pool_helper (void *thread_pool)
 {
-  struct gomp_thread *thr = gomp_thread ();
   struct gomp_thread_pool *pool
 = (struct gomp_thread_pool *) thread_pool;
   gomp_barrier_wait_last (pool-threads_dock);
-  gomp_sem_destroy (thr-release);
-  thr-thread_pool = NULL;
-  thr-task = NULL;
   pthread_exit (NULL);
 }
 
-/* Free a thread pool and release its threads. */
-
-void
-gomp_free_thread (void *arg __attribute__((unused)))
+static void
+gomp_free_thread_pool (bool threads_are_running)
 {
   struct gomp_thread *thr = gomp_thread ();
   struct gomp_thread_pool *pool = thr-thread_pool;
   if (pool)
 {
+  int i;
   if (pool-threads_used  0)
{
- int i;
- for (i = 1; i  pool-threads_used; i++)
+ if (threads_are_running)
{
- struct gomp_thread *nthr = pool-threads[i];
- nthr-fn = gomp_free_pool_helper;
- nthr-data = pool;
+ for (i = 1; i  pool-threads_used; i++)
+   {
+  

Ping x 5: [PATCH] [libgomp] make it possible to use OMP on both sides of a fork

2014-05-14 Thread Nathaniel Smith
Hi all,

Pinging again about the patch below. The lack of this patch is
essentially a blocker to using gcc+python+openmp together, which is a
shame, since python is increasingly important in numerical computing,
openmp is pretty useful, and gcc is the only openmp implementation
that does not support this functionality.

-n

On Tue, Apr 15, 2014 at 1:19 PM, Nathaniel Smith n...@pobox.com wrote:
 On Tue, Mar 4, 2014 at 11:37 PM, Nathaniel Smith n...@pobox.com wrote:
 On Tue, Feb 18, 2014 at 8:58 PM, Richard Henderson r...@redhat.com wrote:
 On 02/16/2014 03:59 PM, Nathaniel Smith wrote:
 Yes, but the problem is that depending on what the user intends to do
 after forking, our pthread_atfork handler might help or it might hurt,
 and we don't know which. Consider these two cases:
   - fork+exec
   - fork+continue to use OMP in child
 The former case is totally POSIX-legal, even when performed at
 arbitrary places, even when another thread is, say, in the middle of
 calling malloc().

 Point well taken.

 Hi all,

 I guess this patch has gotten all the feedback that it's getting. Any
 interest in committing it? :-) I don't have commit access.

 2014-02-12  Nathaniel J. Smith  n...@pobox.com

 * team.c (gomp_free_pool_helper): Move per-thread cleanup to main
 thread.
 (gomp_free_thread): Delegate implementation to...
 (gomp_free_thread_pool): ...this new function. Like old
 gomp_free_thread, but does per-thread cleanup, and has option to
 skip everything that involves interacting with actual threads,
 which is useful when called after fork.
 (gomp_after_fork_callback): New function.
 (gomp_team_start): Register atfork handler, and check for fork on
 entry.

 Pinging this again now that trunk has re-opened. For compliant code
 this patch has essentially no impact (OMP-using code acquires a
 single-line post-fork callback which sets a flag; everything else
 works the same as now). For technically non-compliant mostly serial
 code that uses OMP in some places, and forks children in other places,
 it makes a best effort attempt to clean up the thread pool detritus
 left by a fork, instead of simply deadlocking as currently, so as to
 allow children to use OMP as well. This makes GOMP match the behaviour
 of all other OMP implementations I'm aware of.

 Previous discussion:
   http://gcc.gnu.org/ml/gcc-patches/2014-02/msg00813.html
 Bug:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60035

 I don't have a commit bit -- please commit if acceptable.

 Cheers,
 -n

 --
 Nathaniel J. Smith
 Postdoctoral researcher - Informatics - University of Edinburgh
 http://vorpus.org

-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org
Index: team.c
===
--- team.c  (revision 207398)
+++ team.c  (working copy)
@@ -28,6 +28,7 @@
 #include libgomp.h
 #include stdlib.h
 #include string.h
+#include stdbool.h
 
 /* This attribute contains PTHREAD_CREATE_DETACHED.  */
 pthread_attr_t gomp_thread_attr;
@@ -43,6 +44,8 @@ __thread struct gomp_thread gomp_tls_data;
 pthread_key_t gomp_tls_key;
 #endif
 
+/* This is to enable best-effort cleanup after fork.  */
+static bool gomp_we_are_forked;
 
 /* This structure is used to communicate across pthread_create.  */
 
@@ -204,42 +207,41 @@ static struct gomp_thread_pool *gomp_new_thread_po
   return pool;
 }
 
+/* Free a thread pool and release its threads. */
+
 static void
 gomp_free_pool_helper (void *thread_pool)
 {
-  struct gomp_thread *thr = gomp_thread ();
   struct gomp_thread_pool *pool
 = (struct gomp_thread_pool *) thread_pool;
   gomp_barrier_wait_last (pool-threads_dock);
-  gomp_sem_destroy (thr-release);
-  thr-thread_pool = NULL;
-  thr-task = NULL;
   pthread_exit (NULL);
 }
 
-/* Free a thread pool and release its threads. */
-
-void
-gomp_free_thread (void *arg __attribute__((unused)))
+static void
+gomp_free_thread_pool (bool threads_are_running)
 {
   struct gomp_thread *thr = gomp_thread ();
   struct gomp_thread_pool *pool = thr-thread_pool;
   if (pool)
 {
+  int i;
   if (pool-threads_used  0)
{
- int i;
- for (i = 1; i  pool-threads_used; i++)
+ if (threads_are_running)
{
- struct gomp_thread *nthr = pool-threads[i];
- nthr-fn = gomp_free_pool_helper;
- nthr-data = pool;
+ for (i = 1; i  pool-threads_used; i++)
+   {
+ struct gomp_thread *nthr = pool-threads[i];
+ nthr-fn = gomp_free_pool_helper;
+ nthr-data = pool;
+   }
+ /* This barrier undocks threads docked on pool-threads_dock.  */
+ gomp_barrier_wait (pool-threads_dock);
+ /* And this waits till all threads have called
+gomp_barrier_wait_last in gomp_free_pool_helper.  */
+  

Re: [PATCH] [libgomp] make it possible to use OMP on both sides of a fork

2014-04-15 Thread Nathaniel Smith
On Tue, Mar 4, 2014 at 11:37 PM, Nathaniel Smith n...@pobox.com wrote:
 On Tue, Feb 18, 2014 at 8:58 PM, Richard Henderson r...@redhat.com wrote:
 On 02/16/2014 03:59 PM, Nathaniel Smith wrote:
 Yes, but the problem is that depending on what the user intends to do
 after forking, our pthread_atfork handler might help or it might hurt,
 and we don't know which. Consider these two cases:
   - fork+exec
   - fork+continue to use OMP in child
 The former case is totally POSIX-legal, even when performed at
 arbitrary places, even when another thread is, say, in the middle of
 calling malloc().

 Point well taken.

 Hi all,

 I guess this patch has gotten all the feedback that it's getting. Any
 interest in committing it? :-) I don't have commit access.

 2014-02-12  Nathaniel J. Smith  n...@pobox.com

 * team.c (gomp_free_pool_helper): Move per-thread cleanup to main
 thread.
 (gomp_free_thread): Delegate implementation to...
 (gomp_free_thread_pool): ...this new function. Like old
 gomp_free_thread, but does per-thread cleanup, and has option to
 skip everything that involves interacting with actual threads,
 which is useful when called after fork.
 (gomp_after_fork_callback): New function.
 (gomp_team_start): Register atfork handler, and check for fork on
 entry.

Pinging this again now that trunk has re-opened. For compliant code
this patch has essentially no impact (OMP-using code acquires a
single-line post-fork callback which sets a flag; everything else
works the same as now). For technically non-compliant mostly serial
code that uses OMP in some places, and forks children in other places,
it makes a best effort attempt to clean up the thread pool detritus
left by a fork, instead of simply deadlocking as currently, so as to
allow children to use OMP as well. This makes GOMP match the behaviour
of all other OMP implementations I'm aware of.

Previous discussion:
  http://gcc.gnu.org/ml/gcc-patches/2014-02/msg00813.html
Bug:
   http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60035

I don't have a commit bit -- please commit if acceptable.

Cheers,
-n

-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org


Re: [PATCH] [libgomp] make it possible to use OMP on both sides of a fork

2014-03-04 Thread Nathaniel Smith
On Tue, Feb 18, 2014 at 8:58 PM, Richard Henderson r...@redhat.com wrote:
 On 02/16/2014 03:59 PM, Nathaniel Smith wrote:
 Yes, but the problem is that depending on what the user intends to do
 after forking, our pthread_atfork handler might help or it might hurt,
 and we don't know which. Consider these two cases:
   - fork+exec
   - fork+continue to use OMP in child
 The former case is totally POSIX-legal, even when performed at
 arbitrary places, even when another thread is, say, in the middle of
 calling malloc().

 Point well taken.

Hi all,

I guess this patch has gotten all the feedback that it's getting. Any
interest in committing it? :-) I don't have commit access.

2014-02-12  Nathaniel J. Smith  n...@pobox.com

* team.c (gomp_free_pool_helper): Move per-thread cleanup to main
thread.
(gomp_free_thread): Delegate implementation to...
(gomp_free_thread_pool): ...this new function. Like old
gomp_free_thread, but does per-thread cleanup, and has option to
skip everything that involves interacting with actual threads,
which is useful when called after fork.
(gomp_after_fork_callback): New function.
(gomp_team_start): Register atfork handler, and check for fork on
entry.

Cheers,
-n

-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org
Index: team.c
===
--- team.c  (revision 207398)
+++ team.c  (working copy)
@@ -28,6 +28,7 @@
 #include libgomp.h
 #include stdlib.h
 #include string.h
+#include stdbool.h
 
 /* This attribute contains PTHREAD_CREATE_DETACHED.  */
 pthread_attr_t gomp_thread_attr;
@@ -43,6 +44,8 @@ __thread struct gomp_thread gomp_tls_data;
 pthread_key_t gomp_tls_key;
 #endif
 
+/* This is to enable best-effort cleanup after fork.  */
+static bool gomp_we_are_forked;
 
 /* This structure is used to communicate across pthread_create.  */
 
@@ -204,42 +207,41 @@ static struct gomp_thread_pool *gomp_new_thread_po
   return pool;
 }
 
+/* Free a thread pool and release its threads. */
+
 static void
 gomp_free_pool_helper (void *thread_pool)
 {
-  struct gomp_thread *thr = gomp_thread ();
   struct gomp_thread_pool *pool
 = (struct gomp_thread_pool *) thread_pool;
   gomp_barrier_wait_last (pool-threads_dock);
-  gomp_sem_destroy (thr-release);
-  thr-thread_pool = NULL;
-  thr-task = NULL;
   pthread_exit (NULL);
 }
 
-/* Free a thread pool and release its threads. */
-
-void
-gomp_free_thread (void *arg __attribute__((unused)))
+static void
+gomp_free_thread_pool (bool threads_are_running)
 {
   struct gomp_thread *thr = gomp_thread ();
   struct gomp_thread_pool *pool = thr-thread_pool;
   if (pool)
 {
+  int i;
   if (pool-threads_used  0)
{
- int i;
- for (i = 1; i  pool-threads_used; i++)
+ if (threads_are_running)
{
- struct gomp_thread *nthr = pool-threads[i];
- nthr-fn = gomp_free_pool_helper;
- nthr-data = pool;
+ for (i = 1; i  pool-threads_used; i++)
+   {
+ struct gomp_thread *nthr = pool-threads[i];
+ nthr-fn = gomp_free_pool_helper;
+ nthr-data = pool;
+   }
+ /* This barrier undocks threads docked on pool-threads_dock.  */
+ gomp_barrier_wait (pool-threads_dock);
+ /* And this waits till all threads have called
+gomp_barrier_wait_last in gomp_free_pool_helper.  */
+ gomp_barrier_wait (pool-threads_dock);
}
- /* This barrier undocks threads docked on pool-threads_dock.  */
- gomp_barrier_wait (pool-threads_dock);
- /* And this waits till all threads have called gomp_barrier_wait_last
-in gomp_free_pool_helper.  */
- gomp_barrier_wait (pool-threads_dock);
  /* Now it is safe to destroy the barrier and free the pool.  */
  gomp_barrier_destroy (pool-threads_dock);
 
@@ -251,6 +253,14 @@ gomp_free_pool_helper (void *thread_pool)
  gomp_managed_threads -= pool-threads_used - 1L;
  gomp_mutex_unlock (gomp_managed_threads_lock);
 #endif
+ /* Clean up thread objects */
+ for (i = 1; i  pool-threads_used; i++)
+   {
+ struct gomp_thread *nthr = pool-threads[i];
+ gomp_sem_destroy (nthr-release);
+ nthr-thread_pool = NULL;
+ nthr-task = NULL;
+   }
}
   free (pool-threads);
   if (pool-last_team)
@@ -266,6 +276,58 @@ gomp_free_pool_helper (void *thread_pool)
 }
 }
 
+/* This is called whenever a thread exits which has a non-NULL value for
+   gomp_thread_destructor. In practice, the only thread for which this occurs
+   is the one which created the thread pool.
+*/
+void
+gomp_free_thread (void *arg __attribute__((unused)))
+{
+  gomp_free_thread_pool (true);

Re: [PATCH] [libgomp] make it possible to use OMP on both sides of a fork

2014-02-18 Thread Richard Henderson
On 02/16/2014 03:59 PM, Nathaniel Smith wrote:
 Yes, but the problem is that depending on what the user intends to do
 after forking, our pthread_atfork handler might help or it might hurt,
 and we don't know which. Consider these two cases:
   - fork+exec
   - fork+continue to use OMP in child
 The former case is totally POSIX-legal, even when performed at
 arbitrary places, even when another thread is, say, in the middle of
 calling malloc(). 

Point well taken.


r~


Re: [PATCH] [libgomp] make it possible to use OMP on both sides of a fork

2014-02-16 Thread Nathaniel Smith
On Fri, Feb 14, 2014 at 3:14 PM, Richard Henderson r...@redhat.com wrote:
 On 02/14/2014 12:21 AM, Jakub Jelinek wrote:
 Any reason not to just run gomp_free_thread_pool from 
 gomp_after_fork_callback
 directly?  I see no restrictions on what kind of code is allowed to execute
 during that callback.

 Well, fork is async signal safe function, so calling malloc/free, or any
 kind of synchronization primitives is completely unsafe there.

 That's as may be, but even the opengroup's rationale for pthread_atfork
 mentions using locks in the three callbacks.  I strongly suspect that no real
 use of pthread_atfork can ever really be async safe.

Yes, but the problem is that depending on what the user intends to do
after forking, our pthread_atfork handler might help or it might hurt,
and we don't know which. Consider these two cases:
  - fork+exec
  - fork+continue to use OMP in child
The former case is totally POSIX-legal, even when performed at
arbitrary places, even when another thread is, say, in the middle of
calling malloc(). If we register a pthread_atfork handler which calls
non-signal-safe functions, then we risk breaking POSIX-legal programs
like this. The latter case is broken in current GOMP, but we would
like it to work as well -- at least when possible. So the way the
patch is structured the way it is, is to ensure that we have minimal
impact on the former case while still giving the latter case a chance
to succeed.

Updated patch addressing your other comments attached.

2014-02-12  Nathaniel J. Smith  n...@pobox.com

* team.c (gomp_free_pool_helper): Move per-thread cleanup to main
thread.
(gomp_free_thread): Delegate implementation to...
(gomp_free_thread_pool): ...this new function. Like old
gomp_free_thread, but does per-thread cleanup, and has option to
skip everything that involves interacting with actual threads,
which is useful when called after fork.
(gomp_after_fork_callback): New function.
(gomp_team_start): Register atfork handler, and check for fork on
entry.

-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org
Index: team.c
===
--- team.c  (revision 207398)
+++ team.c  (working copy)
@@ -28,6 +28,7 @@
 #include libgomp.h
 #include stdlib.h
 #include string.h
+#include stdbool.h
 
 /* This attribute contains PTHREAD_CREATE_DETACHED.  */
 pthread_attr_t gomp_thread_attr;
@@ -43,6 +44,8 @@ __thread struct gomp_thread gomp_tls_data;
 pthread_key_t gomp_tls_key;
 #endif
 
+/* This is to enable best-effort cleanup after fork.  */
+static bool gomp_we_are_forked;
 
 /* This structure is used to communicate across pthread_create.  */
 
@@ -204,42 +207,41 @@ static struct gomp_thread_pool *gomp_new_thread_po
   return pool;
 }
 
+/* Free a thread pool and release its threads. */
+
 static void
 gomp_free_pool_helper (void *thread_pool)
 {
-  struct gomp_thread *thr = gomp_thread ();
   struct gomp_thread_pool *pool
 = (struct gomp_thread_pool *) thread_pool;
   gomp_barrier_wait_last (pool-threads_dock);
-  gomp_sem_destroy (thr-release);
-  thr-thread_pool = NULL;
-  thr-task = NULL;
   pthread_exit (NULL);
 }
 
-/* Free a thread pool and release its threads. */
-
-void
-gomp_free_thread (void *arg __attribute__((unused)))
+static void
+gomp_free_thread_pool (bool threads_are_running)
 {
   struct gomp_thread *thr = gomp_thread ();
   struct gomp_thread_pool *pool = thr-thread_pool;
   if (pool)
 {
+  int i;
   if (pool-threads_used  0)
{
- int i;
- for (i = 1; i  pool-threads_used; i++)
+ if (threads_are_running)
{
- struct gomp_thread *nthr = pool-threads[i];
- nthr-fn = gomp_free_pool_helper;
- nthr-data = pool;
+ for (i = 1; i  pool-threads_used; i++)
+   {
+ struct gomp_thread *nthr = pool-threads[i];
+ nthr-fn = gomp_free_pool_helper;
+ nthr-data = pool;
+   }
+ /* This barrier undocks threads docked on pool-threads_dock.  */
+ gomp_barrier_wait (pool-threads_dock);
+ /* And this waits till all threads have called
+gomp_barrier_wait_last in gomp_free_pool_helper.  */
+ gomp_barrier_wait (pool-threads_dock);
}
- /* This barrier undocks threads docked on pool-threads_dock.  */
- gomp_barrier_wait (pool-threads_dock);
- /* And this waits till all threads have called gomp_barrier_wait_last
-in gomp_free_pool_helper.  */
- gomp_barrier_wait (pool-threads_dock);
  /* Now it is safe to destroy the barrier and free the pool.  */
  gomp_barrier_destroy (pool-threads_dock);
 
@@ -251,6 +253,14 @@ gomp_free_pool_helper (void *thread_pool)
  gomp_managed_threads -= pool-threads_used - 1L;
  gomp_mutex_unlock 

Re: [PATCH] [libgomp] make it possible to use OMP on both sides of a fork

2014-02-16 Thread Nathaniel Smith
On Fri, Feb 14, 2014 at 3:43 AM, Jakub Jelinek ja...@redhat.com wrote:
 On Fri, Feb 14, 2014 at 09:21:24AM +0100, Jakub Jelinek wrote:
 Well, fork is async signal safe function, so calling malloc/free, or any
 kind of synchronization primitives is completely unsafe there.

 The only safe thing could be to atomically or in some global flag (or set
 some TLS flag?) and deal with the freeing next time you encounter omp
 parallel.  But, the state of the old thread pool may be in some inconsistent
 shape.

 BTW, I think far cleaner solution would be to discuss on Omp-lang and add
 some standard omp_* function which would allow to throw away all the cached
 OpenMP threads, after calling that function one could not assume
 threadprivate vars (other than in the initial thread) preserve their values.
 If this function would be only allowed outside of the parallel region (i.e.
 if omp_in_parallel () == 0, or even just if omp_get_level () == 0) and
 pretend to do
 #pragma omp parallel num_threads (1)
 ;
 i.e. something after which it isn't guaranteed to preserve threadprivate
 vars, then the library could perform this at the point where it is safe to
 do so (of course it wouldn't be async-signal-safe function) and isn't a
 performance issue (calling it when you are expecting to soon launch another
 #pragma omp parallel could of course slow things down a lot).

 Anything else is going to be either unsafe, or leak memory.

I think the core problem here is that it's not possible to hide OMP
usage inside an interface boundary, so you can't reliably compose
larger programs out of pieces that individually may or may not use
OMP. And unfortunately, I don't think your proposed
omp_forget_threadprivates() function helps solve that problem.

Like, consider the case of DGEMM implemented using OMP. Right now,
this means that calling DGEMM will break fork(). If DGEMM instead
called omp_forget_threadprivates(), then fork() would work. BUT, the
program might contain other code -- perhaps in a different library --
which also used threadprivates. And every time someone calls DGEMM,
this other code would find that their threadprivates had mysteriously
disappeared. (Or might find this, depending on whether fork() was
called in between, etc.) Whether it would be safe to call
omp_forget_threadprivates() is a global property of the whole program,
not something that can be determined by looking at any one piece in
isolation.

What *would* work is if GOMP started tracking whether there were any
threadprivate variables in existence. Then it would be possible to
dynamically determine at runtime whether this global property held,
and automatically clean up threads at pre-fork-time iff it was
globally safe to do so. Then we could document that *if* you want to
use OMP inside a composable library, then you have to avoid
threadprivates, and everything would just work. (In this particular
case, it is true that OpenBLAS already doesn't use any
threadprivates...)

-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org


Re: [PATCH] [libgomp] make it possible to use OMP on both sides of a fork

2014-02-14 Thread Jakub Jelinek
On Thu, Feb 13, 2014 at 01:22:41PM -0800, Richard Henderson wrote:
  +/* This is to enable best-effort cleanup after fork.  */
  +static int gomp_we_are_forked = 0;
 
 bool, no explicit initialization, possible removal, see below.
 
  +static void
  +gomp_free_thread_pool (int threads_running)
 
 bool for threads_running.  It looks like a count otherwise.
 
  +gomp_after_fork_callback ()
 
  (void)
 
  +  pthread_atfork (NULL, NULL, gomp_after_fork_callback);
 
  not needed.
 
 Any reason not to just run gomp_free_thread_pool from gomp_after_fork_callback
 directly?  I see no restrictions on what kind of code is allowed to execute
 during that callback.

Well, fork is async signal safe function, so calling malloc/free, or any
kind of synchronization primitives is completely unsafe there.

The only safe thing could be to atomically or in some global flag (or set
some TLS flag?) and deal with the freeing next time you encounter omp
parallel.  But, the state of the old thread pool may be in some inconsistent
shape.

Jakub


Re: [PATCH] [libgomp] make it possible to use OMP on both sides of a fork

2014-02-14 Thread Jakub Jelinek
On Fri, Feb 14, 2014 at 09:21:24AM +0100, Jakub Jelinek wrote:
 Well, fork is async signal safe function, so calling malloc/free, or any
 kind of synchronization primitives is completely unsafe there.
 
 The only safe thing could be to atomically or in some global flag (or set
 some TLS flag?) and deal with the freeing next time you encounter omp
 parallel.  But, the state of the old thread pool may be in some inconsistent
 shape.

BTW, I think far cleaner solution would be to discuss on Omp-lang and add
some standard omp_* function which would allow to throw away all the cached
OpenMP threads, after calling that function one could not assume
threadprivate vars (other than in the initial thread) preserve their values.
If this function would be only allowed outside of the parallel region (i.e.
if omp_in_parallel () == 0, or even just if omp_get_level () == 0) and
pretend to do
#pragma omp parallel num_threads (1)
;
i.e. something after which it isn't guaranteed to preserve threadprivate
vars, then the library could perform this at the point where it is safe to
do so (of course it wouldn't be async-signal-safe function) and isn't a
performance issue (calling it when you are expecting to soon launch another
#pragma omp parallel could of course slow things down a lot).

Anything else is going to be either unsafe, or leak memory.

Jakub


Re: [PATCH] [libgomp] make it possible to use OMP on both sides of a fork

2014-02-14 Thread Richard Henderson
On 02/14/2014 12:21 AM, Jakub Jelinek wrote:
 Any reason not to just run gomp_free_thread_pool from 
 gomp_after_fork_callback
 directly?  I see no restrictions on what kind of code is allowed to execute
 during that callback.
 
 Well, fork is async signal safe function, so calling malloc/free, or any
 kind of synchronization primitives is completely unsafe there.

That's as may be, but even the opengroup's rationale for pthread_atfork
mentions using locks in the three callbacks.  I strongly suspect that no real
use of pthread_atfork can ever really be async safe.


r~


Re: [PATCH] [libgomp] make it possible to use OMP on both sides of a fork

2014-02-13 Thread Richard Henderson
 +/* This is to enable best-effort cleanup after fork.  */
 +static int gomp_we_are_forked = 0;

bool, no explicit initialization, possible removal, see below.

 +static void
 +gomp_free_thread_pool (int threads_running)

bool for threads_running.  It looks like a count otherwise.

 +gomp_after_fork_callback ()

 (void)

 +  pthread_atfork (NULL, NULL, gomp_after_fork_callback);

 not needed.

Any reason not to just run gomp_free_thread_pool from gomp_after_fork_callback
directly?  I see no restrictions on what kind of code is allowed to execute
during that callback.


r~


[PATCH] [libgomp] make it possible to use OMP on both sides of a fork

2014-02-12 Thread Nathaniel Smith
Problem: A common use care for OMP is to accelerate the internal
workings of an otherwise serial interface. For example, OpenBLAS in
some settings will internally use OMP to accelerate the implementation
of matrix-matrix multiply (DGEMM). When DGEMM is called, then an OMP
section is started, the work is done, then the OMP section exits, the
program returns to serial mode, and DGEMM returns. All this is
entirely transparent to the user -- in fact, it's common for users to
switch between different linear algebra cores (BLAS libraries) without
recompiling, so it's impossible for code that uses linear algebra to
know which underlying library is in use, or how it has been compiled.

However, in order to support some corners of the OMP spec, it is
important that the threads that were started to implement an OMP
parallel section be kept around, in case another OMP section has
started. (AFAICT this is only true when threadprivate variables are
in use. Unfortunately AFAICT there is currently no way to determine
whether this is the case -- such variables are handled directly by GCC
without calling into libgomp, so we can't tell at runtime whether they
exist.)

And, this causes a big problem and abstraction leak: it means that if
you use OMP (e.g., by multiplying two matrices), and then fork, and
then the child also uses OMP (e.g., by also multiplying two matrices),
then the child immediately deadlocks (as OMP waits for threads that it
thinks still exist, but that disappeared during the fork). The result
is that it simply *is not possible to know* whether fork() will
actually work as advertised, even when writing purely serial code, if
that code happens to do seemingly innocent things like linear algebra.
And this then ends up causing surprising wreakage in far-flung parts
of the numerical ecosystem (e.g., here's someone trying to figure
figure out why their web site's task manager crashes whenever they try
to plot a graph: https://github.com/celery/celery/issues/1842).

(Somewhat more impassioned rant and references to previous discussions
here: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60035)

In practice, GOMP seems to be the only OMP implementation that suffers
from this problem; people who encounter this problem are often advised
to switch to icc.

There does not appear to be any fully POSIX-compliant way to solve
this problem (not least because in a strict reading of the POSIX spec,
you aren't guaranteed to be able to do practically *anything* after a
fork() in any program which has ever called a pthreads_* function). In
a less strict reading, we might expect to be okay if no threads are
actually running at the time that fork() is called -- but, we can't
shut down OMP threads before forking, because of the issue with
threadprivate variables -- it might change the behaviour of compliant
programs.

But in practice, if the fork() occurs at a time when every thread is
just sitting waiting on a barrier, then we can be pretty sure that
libc etc. will be in a generally thread-consistent state. And in
practice, the few truly dangerous operations we need to clean up after
the fact -- e.g., destroying that barrier -- do seem to work, at least
on Linux. The attached patch, therefore, takes this strategy.
Crucially, it should have no impact on compliant programs, because it
doesn't actually do anything except set/check a single global variable
until the user actually enters an OMP section in the child, at which
case they have already violated POSIX, so we might as well cross our
fingers and hope for the best. (At the very least, the included test
does fail on Linux x86-64 without the patch, and passes with the
patch.)

Other options that might be worth considering:
-- Adding some way for libgomp to determine whether threadprivate
variables are in use, and then using this information to shut down
threads in a pre-fork handler iff doing so is safe.
-- Instead of trying to clean up the various mutex/barrier/semaphore
detritus left in the child by the evaporating threads, we could simply
leak them. I don't know which is worse in practice: a small leak (once
per child process), or the risk that the various *_destroy functions
will blow up (as POSIX allows them to do).

ChangeLog:

2014-02-12  Nathaniel J. Smith  n...@pobox.com

* team.c (gomp_free_pool_helper): Move per-thread cleanup to main
thread.
(gomp_free_thread): Delegate implementation to...
(gomp_free_thread_pool): ...this new function. Like old
gomp_free_thread, but does per-thread cleanup, and has option to
skip everything that involves interacting with actual threads,
which is useful when called after fork.
(gomp_after_fork_callback): New function.
(gomp_team_start): Register atfork handler, and check for fork on
entry.

-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org
Index: testsuite/libgomp.c/fork-1.c