[Issue 4890] GC.collect() deadlocks multithreaded program.

2014-10-06 Thread via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=4890

badlink andrea.9...@gmail.com changed:

       What|Removed |Added
------------------------------
     Status|REOPENED|RESOLVED
 Resolution|---     |FIXED

--- Comment #35 from badlink andrea.9...@gmail.com ---
I took my time and started digging through the druntime source to find the
problem. I discovered that core.thread.suspend() sends SIGUSR1 to the thread it
wants to suspend, so I launched GDB and tried to catch the signal, but GDB never
caught it. I couldn't explain that behavior, so I googled "linux sigusr1 not
sent" and Bam! -- https://bbs.archlinux.org/viewtopic.php?id=181142
The people there had already figured out that it's a bug in the GNOME Display
Manager, which blocks SIGUSR1 for all child applications!
The bug is present in the gdm-3.12.2-1 package that I am currently using, and
it is causing this nasty deadlock in the D garbage collector.
As a quick test I ran the test case in a tty and it worked as expected.
Hopefully the problem will solve itself in the next gdm update.
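
A process can check for this condition itself. The C sketch below reads the calling thread's signal mask and reports whether a given signal is blocked; since the mask is inherited across fork() and exec, a mask set by a parent process would show up here. The helper names signal_is_blocked and unblock_signal are hypothetical and are not part of druntime.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/* Returns 1 if `signo` is blocked in the calling thread's signal mask,
 * 0 if it is not, and -1 if the mask could not be read.
 * Intended for use at startup, before other threads exist. */
int signal_is_blocked(int signo)
{
    sigset_t current;
    sigemptyset(&current);
    /* With a NULL `set` argument the mask is only reported, not changed. */
    if (sigprocmask(SIG_BLOCK, NULL, &current) != 0)
        return -1;
    return sigismember(&current, signo);
}

/* Removes `signo` from the calling thread's signal mask.
 * Returns 0 on success, -1 on failure. */
int unblock_signal(int signo)
{
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, signo);
    return sigprocmask(SIG_UNBLOCK, &set, NULL);
}
```

Calling signal_is_blocked(SIGUSR1) at the top of main would confirm the diagnosis on an affected desktop session. Calling unblock_signal(SIGUSR1) before any thread is started is one possible workaround under that assumption; the thread itself only suggests waiting for a gdm update.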

--


[Issue 4890] GC.collect() deadlocks multithreaded program.

2014-10-05 Thread via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=4890

--- Comment #34 from badlink andrea.9...@gmail.com ---
(In reply to Sean Kelly from comment #33)
> The GC was in use for probably 5 years without a reported deadlock.  Though
> history isn't exactly proof.  I don't suppose someone wants to regress this
> and find the offending release?  Isn't there a D tool for this?  Or would we
> be stuck with git bisect?

I have just tried a few old versions:
- 2.054 (self-compiled) deadlocks
- 2.042 (self-compiled) deadlocks
- 1.076 (http://dlang.org/download.html) fullCollect never returns
- 1.030 (http://dlang.org/download.html) fullCollect never returns

code used for D1: http://pastebin.com/wu9guHA6
stacktrace (DMDv1.030): http://pastebin.com/CfNStRvm

Does anyone else have this issue?
Right now I'm on Arch Linux 3.14.19-1-lts x86_64, GNU libc 2.20.

--


[Issue 4890] GC.collect() deadlocks multithreaded program.

2014-10-04 Thread via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=4890

--- Comment #32 from Martin Nowak c...@dawg.eu ---
Can we first confirm that this is a regression?

--


[Issue 4890] GC.collect() deadlocks multithreaded program.

2014-10-04 Thread via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=4890

--- Comment #33 from Sean Kelly s...@invisibleduck.org ---
The GC was in use for probably 5 years without a reported deadlock.  Though
history isn't exactly proof.  I don't suppose someone wants to regress this and
find the offending release?  Isn't there a D tool for this?  Or would we be
stuck with git bisect?

--


[Issue 4890] GC.collect() deadlocks multithreaded program.

2014-10-02 Thread via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=4890

--- Comment #31 from Sobirari Muhomori dfj1es...@sneakemail.com ---
Hmm... stack trace in issue 11806 is quite different.

--


[Issue 4890] GC.collect() deadlocks multithreaded program.

2014-10-01 Thread via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=4890

--- Comment #29 from badlink andrea.9...@gmail.com ---
Also present in DMD 2.067.0-b1.
Stacktrace of the sample program in comment 10: http://pastebin.com/4mudSeEX

--


[Issue 4890] GC.collect() deadlocks multithreaded program.

2014-10-01 Thread via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=4890

badlink andrea.9...@gmail.com changed:

       What|Removed |Added
----------------------------------------------
         CC|        |christ...@nerdtools.de

--- Comment #30 from badlink andrea.9...@gmail.com ---
*** Issue 11806 has been marked as a duplicate of this issue. ***

--


[Issue 4890] GC.collect() deadlocks multithreaded program.

2014-09-10 Thread via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=4890

Brad Roberts bra...@puremagic.com changed:

       What|Removed |Added
----------------------------------------------
         CC|        |bra...@puremagic.com

--- Comment #28 from Brad Roberts bra...@puremagic.com ---
Might not be related, but for reference, bug 13416

--


[Issue 4890] GC.collect() deadlocks multithreaded program.

2014-09-08 Thread via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=4890

--- Comment #27 from Sean Kelly s...@invisibleduck.org ---
Earlier than that.

--


[Issue 4890] GC.collect() deadlocks multithreaded program.

2014-09-06 Thread via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=4890

--- Comment #23 from Tomash Brechko tomash.brec...@gmail.com ---
I think the order of events is such that pthread_create() is followed by
pthread_kill() from the main thread before the new thread has had any chance to
run.  In this case there are reports that the new thread may miss signals on
Linux:
http://stackoverflow.com/questions/14827509/does-the-new-thread-exist-when-pthread-create-returns

I think the POSIX intent is that pthread_kill() should work once you have the
thread ID, i.e. it's a bug in (some versions of) the Linux kernel.  Maybe the
signal is raised first and then pending signals are cleared (per POSIX) for the
new thread when it starts.  Or the signal does not become pending because it is
not blocked, yet it is not delivered either because the thread is not really
running yet.  That said, on my 3.15.10 kernel pthread_kill() after
pthread_create() always works in C, and I don't have a D compiler at the moment
to check whether I can still reproduce the original problem.

OTOH issue 10351 is marked as a duplicate, but it's not clear whether the
threads involved there are newly created.
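
The scenario described above can be tried with a short C sketch: it creates a thread and signals it immediately, before the thread is likely to have run. The function name signal_right_after_create and the two-second polling limit are assumptions made for illustration; a lost signal is reported as a 0 return value rather than as a hang.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>
#include <string.h>
#include <time.h>

static volatile sig_atomic_t got_signal;

/* Handler: only sets a flag, which is async-signal-safe. */
static void on_sigusr1(int signo)
{
    (void)signo;
    got_signal = 1;
}

/* New thread: polls the flag for up to about 2 s, then gives up. */
static void *worker(void *arg)
{
    (void)arg;
    struct timespec tick = {0, 1000000};   /* 1 ms */
    for (int i = 0; i < 2000 && !got_signal; ++i)
        nanosleep(&tick, NULL);
    return NULL;
}

/* Returns 1 if the signal sent right after pthread_create() was handled,
 * 0 if it was not seen within the polling window, -1 on setup failure. */
int signal_right_after_create(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigusr1;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    got_signal = 0;
    pthread_t tid;
    if (pthread_create(&tid, NULL, worker, NULL) != 0)
        return -1;

    /* Signal the thread immediately, before it has likely been scheduled. */
    int rc = pthread_kill(tid, SIGUSR1);
    pthread_join(tid, NULL);
    if (rc != 0)
        return -1;
    return got_signal ? 1 : 0;
}
```

A result of 0 would point either at the lost-signal behavior discussed here or at SIGUSR1 being blocked in the inherited signal mask, since a new thread starts with its creator's mask.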

On a side note, in thread_entryPoint() there's a place:

 // NOTE: isRunning should be set to false after the thread is
 // removed or a double-removal could occur between this
 // function and thread_suspendAll.
 Thread.remove( obj );
 obj.m_isRunning = false;

Note that if thread_suspendAll() is called after remove() but before the
assignment, you will still have a double removal.  This shouldn't be related to
the bug in question, however.

--


[Issue 4890] GC.collect() deadlocks multithreaded program.

2014-09-06 Thread via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=4890

--- Comment #24 from Tomash Brechko tomash.brec...@gmail.com ---
Now I see that I was wrong about double removal, please ignore that part.

--


[Issue 4890] GC.collect() deadlocks multithreaded program.

2014-09-06 Thread via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=4890

--- Comment #25 from Sean Kelly s...@invisibleduck.org ---
Hrm... at one point thread_entryPoint called Thread.add to add itself, but I
think the add was moved to Thread.start at some point to deal with a race.  I
had a comment in Thread.start explaining the rationale, but it looks like
Thread.start has been heavily edited and the comment is gone.  Either way,
having Thread.start call Thread.add *after* pthread_create is totally wrong, as
it leaves a window for the thread to exist and be allocating memory but be
unknown to the GC.

I think I'll have to roll back thread.d to find my original comments and see
how it used to be implemented.  Something was clearly changed here, but there's
no longer enough info to tell exactly what.

I've got to say that seeing these and other changes in core.thread without
careful documentation of what was changed and why it was done is very
frustrating.  There's simply no way to unit test for the existence or lack of
deadlocks, and the comments in this module were built up over years of bug
fixes to explain each situation and why the code was the way it was.  If
someone changes the code in this module they *must* be absolutely sure of what
they are doing and document accordingly.

--


[Issue 4890] GC.collect() deadlocks multithreaded program.

2014-09-06 Thread via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=4890

--- Comment #26 from safety0ff.bugz safety0ff.b...@gmail.com ---
(In reply to Sean Kelly from comment #25)

> I think I'll have to roll back thread.d to find my original comments and see
> how it used to be implemented.  Something was clearly changed here, but
> there's no longer enough info to tell exactly what.

This change?
https://github.com/D-Programming-Language/druntime/commit/7a731ffe0869dc

--


[Issue 4890] GC.collect() deadlocks multithreaded program.

2014-09-05 Thread via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=4890

--- Comment #16 from badlink andrea.9...@gmail.com ---
(In reply to Sean Kelly from comment #15)
> Okay, I can't reproduce this using the provided code on Oracle Linux 64-bit.
> If someone has a reliable repro, please let me know.

My Linux machine is running Arch Linux with the 3.14.17-1-lts x86_64 kernel and
GNU libc 2.19.
Oracle Linux is quite different, as it uses the 3.8.13 x86_64 kernel and
glibc 2.17
(http://www.oracle.com/us/technologies/linux/product/specifications/index.html).
Try Manjaro Linux, which is based on Arch but comes with a ready-made desktop
environment (just run `pacman -S dlang-dmd` to get DMD).

--


[Issue 4890] GC.collect() deadlocks multithreaded program.

2014-09-05 Thread via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=4890

--- Comment #17 from badlink andrea.9...@gmail.com ---
Created attachment 1416
  -- https://issues.dlang.org/attachment.cgi?id=1416&action=edit
stack trace

--


[Issue 4890] GC.collect() deadlocks multithreaded program.

2014-09-05 Thread via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=4890

Marco Leise marco.le...@gmx.de changed:

       What|Removed |Added
----------------------------------------------
         CC|        |marco.le...@gmx.de

--- Comment #18 from Marco Leise marco.le...@gmx.de ---
*** Issue 10351 has been marked as a duplicate of this issue. ***

--


[Issue 4890] GC.collect() deadlocks multithreaded program.

2014-09-05 Thread via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=4890

--- Comment #19 from Sobirari Muhomori dfj1es...@sneakemail.com ---
(In reply to badlink from comment #17)
> stack trace

Hmm... if a thread hangs on a mutex, does it handle signals?

--


[Issue 4890] GC.collect() deadlocks multithreaded program.

2014-09-05 Thread via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=4890

--- Comment #20 from Sean Kelly s...@invisibleduck.org ---
It should.  Not doing so seems pretty broken.  But on this particular kernel it
seems like maybe signals are ignored in this situation.

What's happening specifically is that the one thread is blocked on the mutex
protecting the GC, and another thread holds that lock and is attempting a
collection.

I could change this code to use a spin lock instead, but the same problem could
crop up with any mutex if I understand the problem correctly.  I'm kind of
curious to see whether the Boehm GC deadlocks in a similar situation with this
kernel.  It should, since last time I checked it coordinated collections the
exact same way on Linux.
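
The question of whether a thread blocked on a mutex still handles signals can be checked directly. POSIX specifies that a thread waiting for a mutex runs the handler and then resumes waiting. The C sketch below, with the hypothetical names list_lock and handler_runs_while_blocked_on_mutex, holds a lock in the main thread, lets a worker block on it, signals the worker, and samples the result before releasing the lock. The 50 ms settling delay is a heuristic assumption, not a guarantee that the worker has reached the lock.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>
#include <time.h>

static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static atomic_int handler_ran;

/* Handler: records that it ran; a lock-free atomic store is signal-safe. */
static void on_sigusr1(int signo)
{
    (void)signo;
    atomic_store(&handler_ran, 1);
}

/* Worker: blocks on the lock that the main thread is holding. */
static void *blocked_worker(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&list_lock);
    pthread_mutex_unlock(&list_lock);
    return NULL;
}

/* Returns 1 if the handler ran while the worker was still waiting for the
 * mutex, 0 if it did not run within about 2 s, -1 on setup failure. */
int handler_runs_while_blocked_on_mutex(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigusr1;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    atomic_store(&handler_ran, 0);
    pthread_mutex_lock(&list_lock);
    pthread_t tid;
    if (pthread_create(&tid, NULL, blocked_worker, NULL) != 0) {
        pthread_mutex_unlock(&list_lock);
        return -1;
    }

    struct timespec tick = {0, 1000000};          /* 1 ms */
    for (int i = 0; i < 50; ++i)                  /* let the worker reach the lock */
        nanosleep(&tick, NULL);

    pthread_kill(tid, SIGUSR1);
    for (int i = 0; i < 2000 && !atomic_load(&handler_ran); ++i)
        nanosleep(&tick, NULL);
    int ran = atomic_load(&handler_ran);          /* sampled while the lock is held */

    pthread_mutex_unlock(&list_lock);
    pthread_join(tid, NULL);
    return ran;
}
```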

--


[Issue 4890] GC.collect() deadlocks multithreaded program.

2014-09-05 Thread via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=4890

--- Comment #21 from Sobirari Muhomori dfj1es...@sneakemail.com ---
This mutex protects various global data like the list of threads in
core.thread, not GC.

--


[Issue 4890] GC.collect() deadlocks multithreaded program.

2014-09-05 Thread via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=4890

--- Comment #22 from Sean Kelly s...@invisibleduck.org ---
Yes I misspoke somewhat.  The GC acquires the lock to the global thread list
while collecting to ensure that everything remains in a consistent state while
the collection takes place.  In this case the GC already holds this lock and
Thread.start() is blocked on it waiting to add the new thread to the list.

--


[Issue 4890] GC.collect() deadlocks multithreaded program.

2014-09-04 Thread via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=4890

andrea.9...@gmail.com changed:

       What|Removed |Added
------------------------------------------
   Keywords|        |industry
     Status|RESOLVED|REOPENED
         CC|        |andrea.9...@gmail.com
 Resolution|FIXED   |---
   Severity|normal  |regression

--- Comment #10 from andrea.9...@gmail.com ---
This bug is present in DMD 2.066 on Arch Linux 3.14.17-1-lts x86_64 (GNU libc
2.19).
The originally posted code still deadlocks. Even with the j.sleep line
uncommented, it never prints a ".", which means GC.collect never returns:

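import core.thread, core.memory, std.stdio;

class Job : Thread {
  this() {
    super(&run);  // pass the member function as a delegate
  }

  // Worker thread: prints "*" forever.
  private void run() {
    while (true) write("*");
  }
}

void main() {
  Job j = new Job;
  j.start();

  //j.sleep(dur!"msecs"(1));

  GC.collect();  // never returns when the deadlock occurs

  while (true) write(".");
}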

--


[Issue 4890] GC.collect() deadlocks multithreaded program.

2014-09-04 Thread via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=4890

--- Comment #11 from Sean Kelly s...@invisibleduck.org ---
My initial guess is that this has something to do with the changes for critical
regions, as the algorithm for collection before that seemed quite solid.  I'll
try for a repro on my end though.  What would be really useful from whoever
encounters this is to trap it in a debugger and include stack traces of all
relevant threads.  Something has to be blocked on a lock or signal somewhere,
but without knowing which one there's little that can be done.

--


[Issue 4890] GC.collect() deadlocks multithreaded program.

2014-09-04 Thread via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=4890

--- Comment #12 from Sean Kelly s...@invisibleduck.org ---
Um... I may be wrong in what I just said.  It looks like someone added a
delegate call within the signal handler for coordinating collections on Linux. 
There's a decent chance that a dynamic stack frame is being allocated by the GC
within that signal handler, which would be Very Bad.

--


[Issue 4890] GC.collect() deadlocks multithreaded program.

2014-09-04 Thread via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=4890

--- Comment #13 from andrea.9...@gmail.com ---
Just tested, the bug is not present on Windows (DMD 2.066)

--


[Issue 4890] GC.collect() deadlocks multithreaded program.

2014-09-04 Thread via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=4890

--- Comment #14 from Sean Kelly s...@invisibleduck.org ---
It's likely as I said.  The way GC collections work is different on different
platforms.  Both Windows and OSX use a kernel call to suspend threads and
inspect their stacks.  On other Unix platforms (like Linux), the suspending is
done via signals, and signal handlers are VERY restrictive in what can safely
be done inside them.  And either way, having one thread try to allocate
something from the GC inside this suspend handler is a guaranteed deadlock.  If
this is really what's going on I'm amazed that D on Linux works at all.  Maybe
it really is something else...
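
The signal-based coordination described above can be sketched in C as a handshake. This is an illustration under stated assumptions and not the druntime source: the target thread's SIGUSR1 handler posts a semaphore and parks in sigsuspend(), and the collecting thread waits for that acknowledgement before sending a resume signal. The choice of SIGUSR2 for resume, the names suspend_resume_once and suspend_ack, and the two-second timeout are all assumptions. The handler uses only async-signal-safe calls and allocates nothing.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>
#include <time.h>

static sem_t suspend_ack;        /* posted by a thread once it is parked */
static atomic_int stop_worker;   /* tells the demo worker to exit */

/* Runs in the target thread: acknowledge, then park until SIGUSR2. */
static void suspend_handler(int signo)
{
    (void)signo;
    int saved_errno = errno;
    sigset_t wait_mask;
    sigfillset(&wait_mask);
    sigdelset(&wait_mask, SIGUSR2);
    sem_post(&suspend_ack);      /* sem_post is async-signal-safe */
    sigsuspend(&wait_mask);      /* returns after resume_handler has run */
    errno = saved_errno;
}

static void resume_handler(int signo) { (void)signo; }

static void *worker(void *arg)
{
    (void)arg;
    struct timespec tick = {0, 1000000};   /* 1 ms */
    while (!atomic_load(&stop_worker))
        nanosleep(&tick, NULL);
    return NULL;
}

/* Returns 1 if the worker acknowledged the suspend request, 0 if the wait
 * timed out (where an untimed wait would hang), -1 on setup failure. */
int suspend_resume_once(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sigfillset(&sa.sa_mask);     /* keeps SIGUSR2 pending until sigsuspend */
    sa.sa_handler = suspend_handler;
    if (sigaction(SIGUSR1, &sa, NULL) != 0) return -1;
    sa.sa_handler = resume_handler;
    if (sigaction(SIGUSR2, &sa, NULL) != 0) return -1;
    if (sem_init(&suspend_ack, 0, 0) != 0) return -1;

    atomic_store(&stop_worker, 0);
    pthread_t tid;
    if (pthread_create(&tid, NULL, worker, NULL) != 0) return -1;

    pthread_kill(tid, SIGUSR1);  /* ask the worker to suspend */

    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += 2;
    int rc;
    while ((rc = sem_timedwait(&suspend_ack, &deadline)) != 0 && errno == EINTR)
        ;                        /* retry if the wait was interrupted */
    int acknowledged = (rc == 0);

    /* A collector would scan the parked thread's stack at this point. */

    pthread_kill(tid, SIGUSR2);  /* resume the worker */
    atomic_store(&stop_worker, 1);
    pthread_join(tid, NULL);
    sem_destroy(&suspend_ack);
    return acknowledged;
}
```

If SIGUSR1 never reaches the target thread, the acknowledgement never arrives. With an untimed wait in place of sem_timedwait(), the collecting thread would then block forever, which matches the symptom of GC.collect() not returning.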

I'm setting up a new Linux VM and so should hopefully be able to repro this
shortly.

--


[Issue 4890] GC.collect() deadlocks multithreaded program.

2014-09-04 Thread via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=4890

--- Comment #15 from Sean Kelly s...@invisibleduck.org ---
Okay, I can't reproduce this using the provided code on Oracle Linux 64-bit. 
If someone has a reliable repro, please let me know.

--


[Issue 4890] GC.collect() deadlocks multithreaded program.

2014-02-08 Thread d-bugmail
https://d.puremagic.com/issues/show_bug.cgi?id=4890


Stanislav Blinov stanislav.bli...@gmail.com changed:

       What|Removed |Added
----------------------------------------------
         CC|        |stanislav.bli...@gmail.com
   Platform|x86     |x86_64


--- Comment #7 from Stanislav Blinov stanislav.bli...@gmail.com 2014-02-08 14:05:36 PST ---
A quick search led me to this issue. It would appear the deadlock still
occurs.
I've been encountering it now and then, first when running singleton tests from
http://forum.dlang.org/thread/mailman.158.1391156715.13884.digitalmar...@puremagic.com,
then when running druntime unittests (more specifically, test/shared/host)
while working on providing shared qualifiers for core.sync primitives.

At first I thought it had to do with my changes to druntime, but after testing
on a clean druntime I encountered it as well.

The deadlock doesn't happen on every run though, so it may be tricky to track
down.
It's in this piece of test/shared/src/plugin.d:

launchThread();
GC.collect();
joinThread();

GC.collect() simply doesn't return. I haven't investigated deeper yet. Maybe it
has something to do with GC trying to pause/resume an exiting/finished thread?
This is on 64-bit Linux.

-- 
Configure issuemail: https://d.puremagic.com/issues/userprefs.cgi?tab=email
--- You are receiving this mail because: ---


[Issue 4890] GC.collect() deadlocks multithreaded program.

2014-02-08 Thread d-bugmail
https://d.puremagic.com/issues/show_bug.cgi?id=4890


safety0ff.bugz safety0ff.b...@gmail.com changed:

       What|Removed |Added
----------------------------------------------
         CC|        |safety0ff.b...@gmail.com


--- Comment #8 from safety0ff.bugz safety0ff.b...@gmail.com 2014-02-08 15:04:51 PST ---
(In reply to comment #7)
> A quick search led me to this issue. It would appear the deadlock still
> occurs.
> [...SNIP...]
> The deadlock doesn't happen on every run though, so may be tricky to track
> down.
> It's in this piece of test/shared/src/plugin.d:
>
> launchThread();
> GC.collect();
> joinThread();
>
> GC.collect() simply doesn't return. I haven't investigated deeper yet. Maybe it
> has something to do with GC trying to pause/resume an exiting/finished thread?
> This is on 64-bit Linux.

Does the code in the first post deadlock for you?
If not then issues #11981 / #10351 also look relevant.

For #11981 / #10351, we REALLY need a way to reproduce the deadlock, along with
information about the system it is running on (glibc version, linux kernel
version, etc, as much as we can get.)

-- 


[Issue 4890] GC.collect() deadlocks multithreaded program.

2014-02-08 Thread d-bugmail
https://d.puremagic.com/issues/show_bug.cgi?id=4890



--- Comment #9 from Stanislav Blinov stanislav.bli...@gmail.com 2014-02-08 15:09:32 PST ---
(In reply to comment #8)

> Does the code in the first post deadlock for you?

No it doesn't.

> If not then issues #11981 / #10351 also look relevant.
>
> For #11981 / #10351, we REALLY need a way to reproduce the deadlock, along with
> information about the system it is running on (glibc version, linux kernel
> version, etc, as much as we can get.)

I'll go look at those issues then.

-- 


[Issue 4890] GC.collect() deadlocks multithreaded program.

2011-09-06 Thread d-bugmail
http://d.puremagic.com/issues/show_bug.cgi?id=4890


Sean Kelly s...@invisibleduck.org changed:

       What|Removed |Added
------------------------------
     Status|ASSIGNED|RESOLVED
 Resolution|        |FIXED


--- Comment #6 from Sean Kelly s...@invisibleduck.org 2011-09-06 11:39:15 PDT ---
A thread will be added to the global thread list before its TLS range is set,
but the range will be set before the thread ever actually uses TLS data.  I
think this one can be closed.

-- 


[Issue 4890] GC.collect() deadlocks multithreaded program.

2011-07-11 Thread d-bugmail
http://d.puremagic.com/issues/show_bug.cgi?id=4890


Jakob Bornecrantz wallbra...@gmail.com changed:

       What|Removed |Added
----------------------------------------------
         CC|        |wallbra...@gmail.com


--- Comment #5 from Jakob Bornecrantz wallbra...@gmail.com 2011-07-11 18:01:31 PDT ---
This looks fixed with 2.054 on MacOSX; at least I can't repro this.

-- 


[Issue 4890] GC.collect() deadlocks multithreaded program.

2011-01-24 Thread d-bugmail
http://d.puremagic.com/issues/show_bug.cgi?id=4890


Steven Schveighoffer schvei...@yahoo.com changed:

       What|Removed |Added
----------------------------------------------
         CC|        |schvei...@yahoo.com


--- Comment #4 from Steven Schveighoffer schvei...@yahoo.com 2011-01-24 06:03:14 PST ---
(In reply to comment #3)
> I've also stumbled over the race condition in thread_processGCMarks() where a
> thread was already added to the global thread list but didn't have its m_tls
> set yet. It seems fine to test for m_tls being null at that specific place.

That's something that I recently added.

Sean, can you confirm that if a thread's m_tls is not yet set, then its actual
TLS cannot have been used yet?  It seems reasonable to check the TLS block for
null at that point.

(will have to start using github soon...)

-- 


[Issue 4890] GC.collect() deadlocks multithreaded program.

2011-01-21 Thread d-bugmail
http://d.puremagic.com/issues/show_bug.cgi?id=4890


d...@dawgfoto.de changed:

       What|Removed |Added
----------------------------------------------
         CC|        |d...@dawgfoto.de


--- Comment #3 from d...@dawgfoto.de 2011-01-21 15:12:13 PST ---
I've also stumbled over the race condition in thread_processGCMarks() where a
thread was already added to the global thread list but didn't have its m_tls
set yet. It seems fine to test for m_tls being null at that specific place.
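
The null test suggested here can be illustrated with a small C sketch. It walks a thread list and skips any entry whose TLS range has not been set yet. The struct thread_entry, the callback type scan_fn, and the helper count_bytes are hypothetical stand-ins and do not mirror the druntime data structures.

```c
#include <stddef.h>

/* Hypothetical list node: tls_begin stays NULL until the thread sets it. */
typedef struct thread_entry {
    const void *tls_begin;
    const void *tls_end;
    struct thread_entry *next;
} thread_entry;

typedef void (*scan_fn)(const void *begin, const void *end, void *ctx);

/* Scans the TLS range of every listed thread that has one.
 * Returns the number of ranges that were scanned. */
size_t scan_tls_ranges(const thread_entry *head, scan_fn scan, void *ctx)
{
    size_t scanned = 0;
    for (const thread_entry *t = head; t != NULL; t = t->next) {
        if (t->tls_begin == NULL)
            continue;            /* listed, but TLS not set up yet: skip */
        scan(t->tls_begin, t->tls_end, ctx);
        ++scanned;
    }
    return scanned;
}

/* Example callback: adds the size of each range to a size_t counter. */
void count_bytes(const void *begin, const void *end, void *ctx)
{
    *(size_t *)ctx += (size_t)((const char *)end - (const char *)begin);
}
```

Skipping such an entry is only safe if a thread cannot have used its TLS before the range is recorded, which is the point raised for confirmation elsewhere in this thread.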

-- 


[Issue 4890] GC.collect() deadlocks multithreaded program.

2011-01-04 Thread d-bugmail
http://d.puremagic.com/issues/show_bug.cgi?id=4890



--- Comment #2 from Sean Kelly s...@invisibleduck.org 2011-01-04 13:41:41 PST ---
It turns out that the fix I applied produces a race condition with the GC. 
I'll have to re-wrap Thread.start() in a synchronized block as per the code
prior to rev 392.  This may re-introduce the deadlock, in which case it will be
necessary to replace the isRunning flag with a state field that distinguishes
starting from running.  A starting thread should be suspended/resumed but not
scanned.  Or perhaps something else can be sorted out to deal with a thread
being in the list that doesn't have its TLS section set, getThis() doesn't
work, etc.
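
The state field proposed above can be sketched in C as follows. The enum thread_state and the two predicates are hypothetical names used for illustration; they encode the stated rule that a starting thread is suspended and resumed but not scanned.

```c
#include <stdbool.h>

/* Hypothetical replacement for a boolean isRunning flag. */
typedef enum {
    THREAD_STARTING,   /* in the thread list, TLS and stack info not set yet */
    THREAD_RUNNING,    /* fully initialized */
    THREAD_DONE        /* finished, about to leave the list */
} thread_state;

/* Both starting and running threads must be stopped during a collection. */
bool should_suspend(thread_state s)
{
    return s == THREAD_STARTING || s == THREAD_RUNNING;
}

/* Only a fully initialized thread has ranges that are safe to scan. */
bool should_scan(thread_state s)
{
    return s == THREAD_RUNNING;
}
```

Keeping the two decisions in separate predicates makes the asymmetry explicit: a single boolean cannot express "suspend but do not scan".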

-- 


[Issue 4890] GC.collect() deadlocks multithreaded program.

2010-09-21 Thread d-bugmail
http://d.puremagic.com/issues/show_bug.cgi?id=4890


Sean Kelly s...@invisibleduck.org changed:

       What|Removed |Added
----------------------------------------------
     Status|NEW     |ASSIGNED
         CC|        |s...@invisibleduck.org


--- Comment #1 from Sean Kelly s...@invisibleduck.org 2010-09-21 11:35:29 PDT ---
Fixed in druntime changeset 392.  Will be in DMD-2.050.

-- 