Bug#575534: libc6: periodic timers hang fork()

2010-05-20 Thread Joachim Breitner
Hi,

Am Mittwoch, den 19.05.2010, 20:47 +0200 schrieb Aurelien Jarno:
 I don't think it is possible to disable the timers while running clone
 on the libc side. POSIX 2008 is explicit about how signals should be
 delivered, and doing that would create a small window of time that 
 violates POSIX 2008.
 
 I guess the kernel behaviour of restarting the clone syscall in that
 case is also meant to avoid violating POSIX 2008.

The GHC compiler has now been patched to disable the timers around
fork()/clone(). I have not yet tested whether this fixes the problems we
observe on the buildds.
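A minimal sketch of the idea of pausing a periodic timer around fork() is shown below. It is written in Python for brevity, because signal.setitimer and os.fork are thin wrappers around setitimer(2) and fork(2). This is not GHC's actual patch. The helper name fork_with_timer_paused is hypothetical. The sketch assumes the timer is the process-wide ITIMER_PROF and that a SIGPROF handler is already installed, as a profiler would do. GHC's own ticker may use a different timer mechanism.

```python
import os
import signal


def fork_with_timer_paused(which=signal.ITIMER_PROF):
    """Hypothetical helper: fork() with an interval timer temporarily disarmed.

    Returns what os.fork() returns: 0 in the child, the child's PID in the parent.
    """
    # A zero delay disarms the timer; setitimer() hands back the previous
    # (delay, interval) pair so it can be restored afterwards.
    delay, interval = signal.setitimer(which, 0)
    pid = -1
    try:
        # With the timer disarmed, no periodic signal can interrupt clone(),
        # so the kernel never has to restart it.
        pid = os.fork()
        return pid
    finally:
        # Re-arm in the parent, or if fork() raised. The child is left alone:
        # per fork(2), a child does not inherit its parent's interval timers.
        if pid != 0 and delay > 0:
            signal.setitimer(which, delay, interval)
```

The trade-off is the one raised in the POSIX discussion quoted above: any tick that would have fallen inside the fork() is simply lost. That is presumably why this is left to applications such as GHC rather than done inside libc.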

Tomas, in your case, I guess the bug needs to be reassigned to gcc, as it
might be an error in the profiling code generated by gcc.

Greetings,
Joachim

-- 
Joachim nomeata Breitner
Debian Developer
  nome...@debian.org | ICQ# 74513189 | GPG-Keyid: 4743206C
  JID: nome...@joachim-breitner.de | http://people.debian.org/~nomeata


signature.asc
Description: This is a digitally signed message part


Bug#575534: libc6: periodic timers hang fork()

2010-05-20 Thread Tomas Janousek
Hello,

On Thu, May 20, 2010 at 12:17:51PM +0200, Joachim Breitner wrote:
 Tomas, in your case, I guess the bug needs to be reassigned to
 gcc, as it might be an error in the profiling code generated by gcc.

My only motivation was GHC, so I'm happy now, thanks. Should anyone else need
it fixed for gcc as well, feel free to reassign and contact me with questions
about the test case if needed.

(Is it intentional that the only e-mail I got about this bug report is this one
where you explicitly Cc'd me? Do you guys get e-mail from Debian BTS? Shall I
report it to the bts clerks or is this a PEBKAC on my side?)

Regards,
-- 
Tomáš Janoušek, a.k.a. Liskni_si, http://work.lisk.in/



--
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org



Bug#575534: libc6: periodic timers hang fork()

2010-05-19 Thread Aurelien Jarno
On Sat, May 15, 2010 at 10:29:13PM +0200, Joachim Breitner wrote:
 Hi Aurelien,
 
 Am Samstag, den 15.05.2010, 12:28 +0200 schrieb Aurelien Jarno:
  On Fri, May 14, 2010 at 10:38:17PM +0200, Joachim Breitner wrote:
   I’d like to follow up on this issue. According to the libc bug reporting
   guidelines, one should hear from the Debian maintainers whether the bug
   could possibly be Debian-specific before reporting it upstream. Can
   you comment on that?
   
   The bug might look like it is a weird corner case, but it causes serious
   trouble with building large Haskell packages on some arches, and thus
   the transition of such packages to testing.
   
  
  I am not sure it is a GNU libc bug, and I am not sure there is an
  acceptable solution. The timer interrupts the clone syscall itself, so
  everything happens in the kernel.
  
  Maybe stopping the timer before the clone syscall and restoring it
  after is something acceptable?
 
 I’m by far not an expert in these areas, but the fact that the same problem
 happens with different uses (profiling a C binary, running Haskell code)
 suggests that the problem is either in the shared code (the libc) or in
 the way it is used.
 
 In the example C code with profiling, if the bug is not a libc bug, then
 it is a bug in the compiler generating the profiling code? Or should it
 be libc’s responsibility to disable timers while running clone? Is that
 even possible? Would this cause other problems?
 

I don't think it is possible to disable the timers while running clone
on the libc side. POSIX 2008 is explicit about how signals should be
delivered, and doing that would create a small window of time that 
violates POSIX 2008.

I guess the kernel behaviour of restarting the clone syscall in that
case is also meant to avoid violating POSIX 2008.

-- 
Aurelien Jarno  GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net






Bug#575534: libc6: periodic timers hang fork()

2010-05-15 Thread Aurelien Jarno
On Fri, May 14, 2010 at 10:38:17PM +0200, Joachim Breitner wrote:
 Hi,
 
 I’d like to follow up on this issue. According to the libc bug reporting
 guidelines, one should hear from the Debian maintainers whether the bug
 could possibly be Debian-specific before reporting it upstream. Can
 you comment on that?
 
 The bug might look like it is a weird corner case, but it causes serious
 trouble with building large Haskell packages on some arches, and thus
 the transition of such packages to testing.
 

I am not sure it is a GNU libc bug, and I am not sure there is an
acceptable solution. The timer interrupts the clone syscall itself, so
everything happens in the kernel.

Maybe stopping the timer before the clone syscall and restoring it
after is something acceptable?

-- 
Aurelien Jarno  GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net






Bug#575534: libc6: periodic timers hang fork()

2010-05-15 Thread Joachim Breitner
Hi Aurelien,

Am Samstag, den 15.05.2010, 12:28 +0200 schrieb Aurelien Jarno:
 On Fri, May 14, 2010 at 10:38:17PM +0200, Joachim Breitner wrote:
  I’d like to follow up on this issue. According to the libc bug reporting
  guidelines, one should hear from the Debian maintainers whether the bug
  could possibly be Debian-specific before reporting it upstream. Can
  you comment on that?
  
  The bug might look like it is a weird corner case, but it causes serious
  trouble with building large Haskell packages on some arches, and thus
  the transition of such packages to testing.
  
 
 I am not sure it is a GNU libc bug, and I am not sure there is an
 acceptable solution. The timer interrupts the clone syscall itself, so
 everything happens in the kernel.
 
 Maybe stopping the timer before the clone syscall and restoring it
 after is something acceptable?

I’m by far not an expert in these areas, but the fact that the same problem
happens with different uses (profiling a C binary, running Haskell code)
suggests that the problem is either in the shared code (the libc) or in
the way it is used.

In the example C code with profiling, if the bug is not a libc bug, then
it is a bug in the compiler generating the profiling code? Or should it
be libc’s responsibility to disable timers while running clone? Is that
even possible? Would this cause other problems?

I created a ticket against GHC, the Haskell compiler, and asked whether they
think they should and can disable the timer as well:


Greetings,
Joachim

-- 
Joachim nomeata Breitner
Debian Developer
  nome...@debian.org | ICQ# 74513189 | GPG-Keyid: 4743206C
  JID: nome...@joachim-breitner.de | http://people.debian.org/~nomeata




Bug#575534: libc6: periodic timers hang fork()

2010-05-14 Thread Joachim Breitner
Hi,

I’d like to follow up on this issue. According to the libc bug reporting
guidelines, one should hear from the Debian maintainers whether the bug
could possibly be Debian-specific before reporting it upstream. Can
you comment on that?

The bug might look like it is a weird corner case, but it causes serious
trouble with building large Haskell packages on some arches, and thus
the transition of such packages to testing.

Thanks,
Joachim

-- 
Joachim nomeata Breitner
Debian Developer
  nome...@debian.org | ICQ# 74513189 | GPG-Keyid: 4743206C
  JID: nome...@joachim-breitner.de | http://people.debian.org/~nomeata




Bug#575534: libc6: periodic timers hang fork()

2010-03-26 Thread Tomas Janousek
Package: libc6
Version: 2.10.2-6
Severity: normal

Whenever a program uses some kind of periodic timer (profiling, Haskell
thread scheduler, ...) and makes a fork call that takes more time than the
timer's interval, it enters an endless loop of clone syscalls being
interrupted by the signal:

17:08:49.362528 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb765f938) = ? ERESTARTNOINTR (To be restarted) 0.031840
17:08:49.394551 --- SIGPROF (Profiling timer expired) @ 0 (0) ---
17:08:49.394693 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb765f938) = ? ERESTARTNOINTR (To be restarted) 0.031548
17:08:49.426335 --- SIGPROF (Profiling timer expired) @ 0 (0) ---
17:08:49.426475 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb765f938) = ? ERESTARTNOINTR (To be restarted) 0.031768
17:08:49.458339 --- SIGPROF (Profiling timer expired) @ 0 (0) ---
[... and so on ...]
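A toy model may help explain why the loop never ends. As the strace output suggests, each interrupted clone() is restarted from scratch. The model also assumes the timer fires every tick_us units of time and the handler runs for handler_us < tick_us. Under those assumptions, once a first attempt is interrupted, every later attempt only has tick_us - handler_us of time before the next signal. The fork therefore completes only if it fits into the first window or clone_us < tick_us - handler_us. The function and its parameters are hypothetical. It is an illustration, not kernel code, and it ignores that ITIMER_PROF counts CPU time rather than wall-clock time.

```python
def clone_attempts(clone_us, tick_us, start_us=0, handler_us=0, max_attempts=1000):
    """Toy model of fork() under a periodic timer (times in microseconds).

    Each clone() attempt needs clone_us of uninterrupted time. If the timer
    fires first, the work is discarded (ERESTARTNOINTR), the handler runs for
    handler_us (assumed < tick_us) and clone() starts over. Returns the number
    of attempts needed, or None if none completes within max_attempts.
    """
    now = start_us
    # First timer expiry strictly after the moment fork() is called.
    next_tick = tick_us * (start_us // tick_us + 1)
    for attempt in range(1, max_attempts + 1):
        if now + clone_us < next_tick:
            return attempt          # finished before the timer fired
        # Interrupted: run the handler, then restart clone() from scratch.
        now = next_tick + handler_us
        next_tick += tick_us
    return None                     # livelock: every attempt was interrupted
```

For instance, `clone_attempts(40_000, 32_000)` (a 40 ms clone against a 32 ms tick) returns None, while `clone_attempts(20_000, 32_000, start_us=20_000)` returns 2, because the second attempt starts right at a tick and has a full period to finish.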

On my hardware, this happens for programs using more than 400 megs of memory
(there's a linear dependence between memory used and the duration of clone).
That means that any program that uses more than 400 megs of memory can't use
fork and be profiled at the same time.
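The dependence between memory size and fork() duration can be observed without any timer at all. The sketch below is not the reporter's reproducer from the tarball, and the function name time_fork is hypothetical. It touches a given amount of memory and measures how long the parent spends inside fork(). On Linux, this cost presumably grows with the number of mapped pages, because fork() has to copy the parent's page tables to set up copy-on-write.

```python
import mmap
import os
import time


def time_fork(megabytes):
    """Return the seconds the parent spends in os.fork() while holding
    `megabytes` MiB of touched heap memory."""
    ballast = bytearray(megabytes * 1024 * 1024)
    # Write one byte per page so that every page is actually backed by RAM.
    for offset in range(0, len(ballast), mmap.PAGESIZE):
        ballast[offset] = 1
    start = time.perf_counter()
    pid = os.fork()
    if pid == 0:
        os._exit(0)          # child: leave immediately, skipping cleanup
    elapsed = time.perf_counter() - start
    os.waitpid(pid, 0)       # reap the child
    return elapsed
```

Calling it in a loop such as `for mb in (100, 200, 400, 800): print(mb, time_fork(mb))` gives a rough picture of the curve on a given machine. Comparing those numbers with the profiling timer's interval shows roughly where the hang would start.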

A reproducer is at http://store.lisk.in/tmp/perm/fork_profiling_hang.tar.gz.
`make all' generates a graph of this dependence and `make hang' launches a
test case that consumes 500 megs of memory and then forks. If that's not
enough, just raise the number in the Makefile.

-- System Information:
Debian Release: squeeze/sid
  APT prefers testing
  APT policy: (990, 'testing'), (990, 'stable'), (500, 'unstable'), (500, 'stable'), (200, 'experimental')
Architecture: i386 (i686)

Kernel: Linux 2.6.32.7-lis-2-gd18ac29 (SMP w/2 CPU cores)
Locale: LANG=cs_CZ.UTF-8, LC_CTYPE=cs_CZ.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash

Versions of packages libc6 depends on:
ii  libc-bin  2.10.2-6   Embedded GNU C Library: Binaries
ii  libgcc1   1:4.4.1-4  GCC support library

Versions of packages libc6 recommends:
ii  libc6-i686  2.10.2-6   GNU C Library: Shared libraries [i

Versions of packages libc6 suggests:
ii  debconf [debconf-2.0] 1.5.27 Debian configuration management sy
ii  glibc-doc 2.10.2-6   Embedded GNU C Library: Documentat
ii  locales   2.10.2-6   Embedded GNU C Library: National L

-- debconf information:
* glibc/upgrade: true
  glibc/disable-screensaver:
  glibc/restart-failed:
* glibc/restart-services:

-- 
Tomáš Janoušek, a.k.a. Liskni_si, http://work.lisk.in/


