Bug#654783: race condition in libpthread causes hangs in python2.7 testsuite

2012-04-20 Thread Petr Salinger

Robert Millan wrote:
 That's really nice.  Petr, could you give some explanation on that
 one-line patch you provided?  Is it supposed to be the correct fix or
 is more work necessary?  I'm not familiar with the whole picture but
 if you give some pointers I may be able to help.


In the original (plain linuxthreads) code, with each thread implemented as
a FreeBSD process, the wakeup signal is sent to the thread manager by the
kernel, after the thread has exited.


In the current variant, with each thread implemented as a FreeBSD kernel
thread, the wakeup signal is sent to the thread manager from userspace, a
few moments before the thread exits. That is an expected race condition,
and it is the reason why "|| main_thread_exiting" was added. I expected
that losing a wakeup would not matter: the dead child thread would be eaten
only slightly later, when another thread exits and sends its wakeup. The
only problem should arise when there is no other thread, and that case
should be covered by "|| main_thread_exiting". But it does not suffice.

Trying to eat dead children every time is just a workaround.
A better way might be to add an atomic counter
[using gcc's __sync_fetch_and_add()] that tracks the number of children
that are dead or about to die, and to try to eat dead children only when
that number is above zero.
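
A minimal sketch of the counter idea, to make the proposal concrete. It is
not the linuxthreads or pkg-glibc code: every name in it is hypothetical,
and the "dead thread" bookkeeping is a simulation standing in for whatever
the manager really uses to detect and reap a terminated thread.

```c
/* Sketch only: every name here is hypothetical, not glibc/linuxthreads code. */
#include <assert.h>

/* Threads that have announced an exit but have not been reaped yet. */
static volatile int pending_exits = 0;

/* Simulated stand-in for threads that have really finished exiting. */
static int dead_threads = 0;

/* Exiting thread, from userspace, shortly before it actually exits. */
static void thread_announce_exit(void)
{
    __sync_fetch_and_add(&pending_exits, 1);
    /* The wakeup to the manager would be sent here; it may get lost. */
}

/* Simulation helper: the thread has now really exited. */
static void thread_finish_exit(void)
{
    dead_threads++;
}

/* Simulated reap: returns 1 if a dead thread was collected, else 0. */
static int try_reap_one(void)
{
    if (dead_threads == 0)
        return 0;
    dead_threads--;
    return 1;
}

/* Manager side: run on every loop iteration, woken or not.
   Returns the number of threads reaped in this pass. */
static int manager_reap_pending(void)
{
    int reaped = 0;

    /* Adding 0 is an atomic read of the counter. */
    while (__sync_fetch_and_add(&pending_exits, 0) > 0) {
        if (!try_reap_one())
            break;  /* announced but not dead yet: retry on the next pass */
        int before = __sync_fetch_and_sub(&pending_exits, 1);
        assert(before > 0);
        reaped++;
    }
    return reaped;
}
```

With this shape the manager does the reap attempt only while the counter is
above zero, so a lost wakeup costs at most a delay until its next pass
rather than an unconditional scan on every iteration.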

And (of course) in the long term, do not use a manager thread anymore.

Petr



--
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org



Bug#654783: race condition in libpthread causes hangs in python2.7 testsuite

2012-04-20 Thread Robert Millan
tag 654783 pending
thanks

On 20 April 2012 at 8:47, Petr Salinger petr.salin...@seznam.cz wrote:
 In the original (plain linuxthreads) code, with thread implemented as
 freebsd process, the wakeup signal is sent to thread manager from kernel,
 after exit of thread.

 In current variant, with thread implemented as freebsd kernel thread,
 the wakeup signal is sent to thread manager from userspace, a few moments
 before exit. It is an expected race condition. It is also the reason, why
 || main_thread_exiting have been added. I expected, that loss of a
 wakeup does not matter, the child thread will be eaten only slightly
 later, when another thread exits and sends wake up. The only problem should
 be, when there is no another thread, it should be solved by
 || main_thread_exiting. But it does not suffice.

 The try eat dead child everytime is just workaround.

Yep, eating dead children every time doesn't sound like the cleanest
option to me either ;-)

 The better way might be to add atomic counter
 [using gcc's __sync_fetch_and_add()] to track the number of expected dead
 or soon to be dead child
 and try to eat dead child when the number is above zero.

Thanks for the heads-up.  I notice you already fixed this in pkg-glibc
SVN.  Maybe it's not worth improving further... (IMHO time would be
better spent on NPTL).

Thank you!

-- 
Robert Millan






Bug#663056: Bug#654783: race condition in libpthread causes hangs in python2.7 testsuite

2012-04-19 Thread Robert Millan
CCing #663056

On 19 April 2012 at 1:12, Steven Chamberlain ste...@pyro.eu.org wrote:
 For now I still have Petr's change applied.  I notice that libsoup2.4's
 connection-test (see #663056) has stopped failing.  (Just had 100/100
 test passes, was previously seeing about 50% failures.)

Are you sure?  You mean you tried 100 times?

I don't know about connection-test, but context-test was a race
condition.  I'm also 100% sure Petr's change doesn't fix that (the
reason for connection-test failure is well-known).

After fixing context-test I got a connection-test pass, but I only
tried once (at that time I assumed it was the same issue as
context-test).

-- 
Robert Millan






Bug#575302: Bug#654783: race condition in libpthread causes hangs in python2.7 testsuite

2012-04-19 Thread Robert Millan
CCing #575302

On 19 April 2012 at 1:12, Steven Chamberlain ste...@pyro.eu.org wrote:
 Also, perhaps related, I got through the (Python-powered) iceweasel
 10.0.3esr test suite for the first time, without hangs (see #575302).
 Maybe this helped.

That's really nice.  Petr, could you give some explanation on that
one-line patch you provided?  Is it supposed to be the correct fix or
is more work necessary?  I'm not familiar with the whole picture but
if you give some pointers I may be able to help.

-- 
Robert Millan






Bug#654783: race condition in libpthread causes hangs in python2.7 testsuite

2012-04-19 Thread Steven Chamberlain
On 19/04/12 20:51, Robert Millan wrote:
 CCing #663056
 
 On 19 April 2012 at 1:12, Steven Chamberlain ste...@pyro.eu.org wrote:
 For now I still have Petr's change applied.  I notice that libsoup2.4's
 connection-test (see #663056) has stopped failing.  (Just had 100/100
 test passes, was previously seeing about 50% failures.)
 
 Are you sure?  You mean you tried 100 times?

It passed 100 times in a row.  And another 100 times just now.  I'm not
sure that Petr's patch is what really fixed it, but I can try to narrow
it down.
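
For scale, a small sketch of how unlikely such a run would be by chance. It
assumes that the earlier failure rate really was about 50% and that test
runs are independent; both are assumptions, and the function name is
hypothetical.

```c
#include <assert.h>

/* Probability that `runs` independent runs all pass, given a per-run
   failure rate. Assumes independence, which real test runs may not have. */
static double prob_all_pass(double fail_rate, int runs)
{
    assert(fail_rate >= 0.0 && fail_rate <= 1.0 && runs >= 0);
    double p = 1.0;
    for (int i = 0; i < runs; i++)
        p *= 1.0 - fail_rate;
    return p;
}
```

Under those assumptions prob_all_pass(0.5, 100) is 2^-100, roughly 7.9e-31,
so something has changed; the numbers alone do not say what.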

You say the cause was well-known...?

Regards,
-- 
Steven Chamberlain
ste...@pyro.eu.org






Bug#654783: race condition in libpthread causes hangs in python2.7 testsuite

2012-04-19 Thread Steven Chamberlain
On 19/04/12 20:54, Robert Millan wrote:
 CCing #575302
 
 On 19 April 2012 at 1:12, Steven Chamberlain ste...@pyro.eu.org wrote:
 Also, perhaps related, I got through the (Python-powered) iceweasel
 10.0.3esr test suite for the first time, without hangs (see #575302).
 Maybe this helped.
 
 That's really nice.  Petr, could you give some explanation on that
 one-line patch you provided?  Is it supposed to be the correct fix or
 is more work necessary?  I'm not familiar with the whole picture but
 if you give some pointers I may be able to help.

I only thought to test iceweasel because in #658704 you mentioned an
infinite poll() loop (but you didn't show the timing, which you would
get from kdump -T).

Maybe if __pthread_sig_cancel is missed somehow, Petr's diff works
around that by checking anyway for terminated child threads every couple
of seconds.  Just guessing.
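
A rough sketch of the behaviour guessed at above, assuming a manager that
waits on a pipe with poll() and a timeout. The function names and the
reap_checks counter are hypothetical and only illustrate the control flow;
this is not the actual manager code.

```c
/* Sketch only: hypothetical names, illustrating the control flow. */
#include <assert.h>
#include <stddef.h>
#include <poll.h>
#include <unistd.h>

/* Wait up to timeout_ms for a wakeup byte on wake_fd.
   Returns 1 if woken, 0 on timeout, -1 on error. */
static int manager_wait(int wake_fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = wake_fd, .events = POLLIN, .revents = 0 };
    int n = poll(&pfd, 1, timeout_ms);

    if (n < 0)
        return -1;
    if (n == 0)
        return 0;   /* timeout: no wakeup arrived, or it was lost */

    char c;
    if (read(wake_fd, &c, 1) != 1)
        return -1;
    return 1;
}

/* One manager iteration. The check for terminated children runs whether
   the manager was woken or merely timed out, so a missed wakeup delays
   reaping by at most one timeout. Returns manager_wait()'s result. */
static int manager_iteration(int wake_fd, int timeout_ms, int *reap_checks)
{
    assert(reap_checks != NULL);

    int r = manager_wait(wake_fd, timeout_ms);
    if (r < 0)
        return -1;

    (*reap_checks)++;   /* stand-in for the real dead-child check */
    return r;
}
```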

Python 2.7.3~rc2 fixed something else, that could have been causing
iceweasel's test harness to hang (like waf in #668240) so that maybe
also helped here.

Regards,
-- 
Steven Chamberlain
ste...@pyro.eu.org






Bug#654783: race condition in libpthread causes hangs in python2.7 testsuite

2012-04-18 Thread Steven Chamberlain
On 18/04/12 19:59, Robert Millan wrote:
 On 18 April 2012 at 15:46, Steven Chamberlain ste...@pyro.eu.org wrote:
 With it, I hit a tst-timer5 regression during build.
 
 Don't worry about tst-timer5, it's a fake regression.  Previously it
 succeeded by exiting without testing anything.

Oh okay.

For now I still have Petr's change applied.  I notice that libsoup2.4's
connection-test (see #663056) has stopped failing.  (Just had 100/100
test passes, was previously seeing about 50% failures.)

Also, perhaps related, I got through the (Python-powered) iceweasel
10.0.3esr test suite for the first time, without hangs (see #575302).
Maybe this helped.

Regards,
-- 
Steven Chamberlain
ste...@pyro.eu.org


