Bug#575534: libc6: periodic timers hang fork()
Hi,

On Wednesday, 19.05.2010, at 20:47 +0200, Aurelien Jarno wrote:
> I don't think it is possible to disable the timers while running clone
> on the libc side. POSIX 2008 is explicit about how signals should be
> delivered, and doing that would create a small window of time that
> violates POSIX 2008. I guess the kernel behaviour of restarting the
> clone syscall in that case is also to not violate POSIX 2008.

The ghc compiler has now been patched to disable the timers around
fork()/clone(). I have not yet tested whether this fixes the problems we
observe on the buildds.

Thomas, in your case, I guess the bug needs to be reassigned against
gcc, as it might be an error in the profiling code created by gcc.

Greetings,
Joachim

--
Joachim nomeata Breitner
Debian Developer
nome...@debian.org | ICQ# 74513189 | GPG-Keyid: 4743206C
JID: nome...@joachim-breitner.de | http://people.debian.org/~nomeata
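The application-side workaround mentioned in this message (disarming the timer around fork()) could look roughly like the following C sketch. It is not the actual GHC patch. The helper names arm_prof_timer and fork_with_timer_paused are hypothetical, and the sketch assumes the periodic timer is an ITIMER_PROF set up with setitimer().

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Counts SIGPROF deliveries; stands in for a profiler or scheduler tick. */
static volatile sig_atomic_t prof_ticks = 0;

static void on_sigprof(int signo)
{
    (void)signo;
    prof_ticks++;
}

/* Hypothetical helper: install the handler and arm a periodic ITIMER_PROF.
 * The interval must be below one second because it goes into tv_usec. */
static int arm_prof_timer(long interval_usec)
{
    assert(interval_usec > 0 && interval_usec < 1000000);

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigprof;
    sa.sa_flags = SA_RESTART;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGPROF, &sa, NULL) != 0)
        return -1;

    struct itimerval tv;
    memset(&tv, 0, sizeof tv);
    tv.it_interval.tv_usec = interval_usec;
    tv.it_value.tv_usec = interval_usec;
    return setitimer(ITIMER_PROF, &tv, NULL);
}

/* Hypothetical helper: call fork() with ITIMER_PROF disarmed, so that no
 * SIGPROF can interrupt the underlying clone syscall. */
static pid_t fork_with_timer_paused(void)
{
    struct itimerval disarmed, saved;
    memset(&disarmed, 0, sizeof disarmed);

    /* An all-zero value disarms the timer; the old setting lands in saved. */
    if (setitimer(ITIMER_PROF, &disarmed, &saved) != 0)
        return -1;

    pid_t pid = fork();
    int saved_errno = errno;

    /* Re-arm in the parent, or after a failed fork. The child does not
     * inherit interval timers, so there is nothing to restore there. */
    if (pid != 0)
        setitimer(ITIMER_PROF, &saved, NULL);

    errno = saved_errno;
    return pid;
}
```

A caller would use fork_with_timer_paused() wherever it previously called fork(). No SIGPROF ticks are delivered while the timer is paused, which is the trade-off discussed in this thread.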
Bug#575534: libc6: periodic timers hang fork()
Hello,

On Thu, May 20, 2010 at 12:17:51PM +0200, Joachim Breitner wrote:
> Thomas, in your case, I guess the bug needs to be reassigned against
> gcc, as it might be an error in the profiling code created by gcc.

My only motivation was GHC, so I'm happy now, thanks. Should anyone
else need it fixed for gcc as well, feel free to reassign and contact me
with questions about the test case if needed.

(Is it intentional that the only e-mail I got about this bug report is
this one, where you explicitly Cc'd me? Do you guys get e-mail from the
Debian BTS? Shall I report it to the BTS clerks, or is this a PEBKAC on
my side?)

Regards,
--
Tomáš Janoušek, a.k.a. Liskni_si, http://work.lisk.in/
Bug#575534: libc6: periodic timers hang fork()
On Sat, May 15, 2010 at 10:29:13PM +0200, Joachim Breitner wrote:
> Hi Aurelien,
>
> On Saturday, 15.05.2010, at 12:28 +0200, Aurelien Jarno wrote:
> > On Fri, May 14, 2010 at 10:38:17PM +0200, Joachim Breitner wrote:
> > > I’d like to follow up on this issue. According to the libc bug
> > > reporting guidelines, one should hear from the Debian maintainers
> > > whether the bug could possibly be Debian-specific before reporting
> > > it upstream. Can you comment on that?
> > >
> > > The bug might look like it is a weird corner case, but it causes
> > > serious trouble with building large Haskell packages on some
> > > arches, and thus the transition of such packages to testing.
> >
> > I am not sure it is a GNU libc bug, and I am not sure there is an
> > acceptable solution. The timer interrupts the clone syscall itself,
> > so everything happens in the kernel. Maybe stopping the timer before
> > the clone syscall and restoring it after is something acceptable?
>
> I’m by far not an expert in these areas, but seeing the same problem
> happen with different uses (profiling a C binary, running Haskell
> code) suggests that the problem is either in the shared code (the
> libc), or in the usage of it. In the example C code with profiling, if
> the bug is not a libc bug, then it is a bug in the compiler generating
> the profiling code? Or should it be libc’s responsibility to disable
> timers while running clone? Is that even possible? Would this cause
> other problems?

I don't think it is possible to disable the timers while running clone
on the libc side. POSIX 2008 is explicit about how signals should be
delivered, and doing that would create a small window of time that
violates POSIX 2008. I guess the kernel behaviour of restarting the
clone syscall in that case is also to not violate POSIX 2008.

--
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurel...@aurel32.net                 http://www.aurel32.net
Bug#575534: libc6: periodic timers hang fork()
On Fri, May 14, 2010 at 10:38:17PM +0200, Joachim Breitner wrote:
> Hi,
>
> I’d like to follow up on this issue. According to the libc bug
> reporting guidelines, one should hear from the Debian maintainers
> whether the bug could possibly be Debian-specific before reporting
> it upstream. Can you comment on that?
>
> The bug might look like it is a weird corner case, but it causes
> serious trouble with building large Haskell packages on some arches,
> and thus the transition of such packages to testing.

I am not sure it is a GNU libc bug, and I am not sure there is an
acceptable solution. The timer interrupts the clone syscall itself, so
everything happens in the kernel. Maybe stopping the timer before the
clone syscall and restoring it after is something acceptable?

--
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurel...@aurel32.net                 http://www.aurel32.net
Bug#575534: libc6: periodic timers hang fork()
Hi Aurelien,

On Saturday, 15.05.2010, at 12:28 +0200, Aurelien Jarno wrote:
> On Fri, May 14, 2010 at 10:38:17PM +0200, Joachim Breitner wrote:
> > I’d like to follow up on this issue. According to the libc bug
> > reporting guidelines, one should hear from the Debian maintainers
> > whether the bug could possibly be Debian-specific before reporting
> > it upstream. Can you comment on that?
> >
> > The bug might look like it is a weird corner case, but it causes
> > serious trouble with building large Haskell packages on some
> > arches, and thus the transition of such packages to testing.
>
> I am not sure it is a GNU libc bug, and I am not sure there is an
> acceptable solution. The timer interrupts the clone syscall itself,
> so everything happens in the kernel. Maybe stopping the timer before
> the clone syscall and restoring it after is something acceptable?

I’m by far not an expert in these areas, but seeing the same problem
happen with different uses (profiling a C binary, running Haskell code)
suggests that the problem is either in the shared code (the libc), or in
the usage of it. In the example C code with profiling, if the bug is not
a libc bug, then it is a bug in the compiler generating the profiling
code? Or should it be libc’s responsibility to disable timers while
running clone? Is that even possible? Would this cause other problems?

I created a ticket against ghc, the Haskell compiler, and asked if they
think they should and can disable the timer as well:

Greetings,
Joachim

--
Joachim nomeata Breitner
Debian Developer
nome...@debian.org | ICQ# 74513189 | GPG-Keyid: 4743206C
JID: nome...@joachim-breitner.de | http://people.debian.org/~nomeata
Bug#575534: libc6: periodic timers hang fork()
Hi,

I’d like to follow up on this issue. According to the libc bug
reporting guidelines, one should hear from the Debian maintainers
whether the bug could possibly be Debian-specific before reporting it
upstream. Can you comment on that?

The bug might look like it is a weird corner case, but it causes
serious trouble with building large Haskell packages on some arches, and
thus the transition of such packages to testing.

Thanks,
Joachim

--
Joachim nomeata Breitner
Debian Developer
nome...@debian.org | ICQ# 74513189 | GPG-Keyid: 4743206C
JID: nome...@joachim-breitner.de | http://people.debian.org/~nomeata
Bug#575534: libc6: periodic timers hang fork()
Package: libc6
Version: 2.10.2-6
Severity: normal

Whenever a program uses some kind of periodic timer (profiling, Haskell
thread scheduler, ...) and makes a fork call that takes more time than
the timer's interval, it enters an endless loop of clone syscalls being
interrupted by the signal:

17:08:49.362528 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb765f938) = ? ERESTARTNOINTR (To be restarted) <0.031840>
17:08:49.394551 --- SIGPROF (Profiling timer expired) @ 0 (0) ---
17:08:49.394693 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb765f938) = ? ERESTARTNOINTR (To be restarted) <0.031548>
17:08:49.426335 --- SIGPROF (Profiling timer expired) @ 0 (0) ---
17:08:49.426475 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb765f938) = ? ERESTARTNOINTR (To be restarted) <0.031768>
17:08:49.458339 --- SIGPROF (Profiling timer expired) @ 0 (0) ---
[... and so on ...]

On my hardware, this happens for programs using more than 400 megs of
memory (there's a linear dependence between memory used and the duration
of clone). That means that any program that uses more than 400 megs of
memory can't use fork and be profiled at the same time.

A reproducer is at
http://store.lisk.in/tmp/perm/fork_profiling_hang.tar.gz. `make all'
generates a graph of the said dependence and `make hang' launches a test
case that consumes 500 megs of memory and then forks. If that's not
enough, just raise the number in the Makefile.

-- System Information:
Debian Release: squeeze/sid
  APT prefers testing
  APT policy: (990, 'testing'), (990, 'stable'), (500, 'unstable'), (500, 'stable'), (200, 'experimental')
Architecture: i386 (i686)

Kernel: Linux 2.6.32.7-lis-2-gd18ac29 (SMP w/2 CPU cores)
Locale: LANG=cs_CZ.UTF-8, LC_CTYPE=cs_CZ.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash

Versions of packages libc6 depends on:
ii  libc-bin                 2.10.2-6   Embedded GNU C Library: Binaries
ii  libgcc1                  1:4.4.1-4  GCC support library

Versions of packages libc6 recommends:
ii  libc6-i686               2.10.2-6   GNU C Library: Shared libraries [i

Versions of packages libc6 suggests:
ii  debconf [debconf-2.0]    1.5.27     Debian configuration management sy
ii  glibc-doc                2.10.2-6   Embedded GNU C Library: Documentat
ii  locales                  2.10.2-6   Embedded GNU C Library: National L

-- debconf information:
* glibc/upgrade: true
  glibc/disable-screensaver:
  glibc/restart-failed:
* glibc/restart-services:

--
Tomáš Janoušek, a.k.a. Liskni_si, http://work.lisk.in/
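The dependence between memory use and the duration of fork() described in this report can be probed with a small C sketch like the one below. It is not the reproducer from the tarball. The helper name time_fork_with_heap is hypothetical, and the sketch deliberately runs without any timer armed, so it only measures and cannot hang.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: returns the wall-clock seconds one fork() call takes
 * in the parent while `bytes` of heap are mapped and touched, or -1.0 on
 * error. */
static double time_fork_with_heap(size_t bytes)
{
    assert(bytes > 0);

    char *heap = malloc(bytes);
    if (heap == NULL)
        return -1.0;

    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        page = 4096; /* assumption: fall back to a common page size */

    /* Touch one byte per page through a volatile pointer so the pages are
     * really mapped and the compiler cannot drop the writes. */
    volatile char *touch = heap;
    for (size_t off = 0; off < bytes; off += (size_t)page)
        touch[off] = 1;

    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);
    pid_t pid = fork();
    if (pid == 0)
        _exit(0); /* child: nothing to do, leave immediately */
    clock_gettime(CLOCK_MONOTONIC, &end);

    if (pid < 0) {
        free(heap);
        return -1.0;
    }

    waitpid(pid, NULL, 0); /* reap the child so no zombie is left behind */
    free(heap);

    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}
```

Calling time_fork_with_heap() for a series of increasing sizes and comparing the results with the timer interval in use gives a rough idea of where the loop shown in the strace output would start on a given machine.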