Re: [uml-devel] 2.6.25 uml kernel crashes when it calls down() on a semaphore with zero counter
On Fri, Jul 11, 2008 at 1:02 AM, Jeff Dike <[EMAIL PROTECTED]> wrote:
> On Thu, Jul 10, 2008 at 10:25:29AM +0800, Jiaying Zhang wrote:
> > Do you have any thought about what the problem might be?
> > Thanks a lot!
>
> Yeah, my first thought is that your code is buggy.
>
> Since 2.6.25 seems OK, you can bisect between then and now to see
> either what caused the bug or what is triggering the crash on an
> existing bug.

The 2.6.24 kernels are OK, but I have seen this problem with all of the
2.6.25 kernels I have tried. There have been a lot of changes between
2.6.24 kernels and 2.6.25 kernels. I am not sure which one may lead
to this problem.

> The other thing you can do is gdb the UML and see if gdb gives you a
> better stack trace.

Here is the trace from gdb uml.

Program received signal SIGTERM, Terminated.
0xb7fff410 in ?? ()
(gdb) bt
#0  0xb7fff410 in ?? ()
#1  0x08323afc in cpu0_irqstack ()
#2  0xfffe in ?? ()
#3  0x000f in ?? ()
#4  0x464850c6 in kill () from /lib/tls/i686/cmov/libc.so.6
#5  0x0806624b in os_dump_core () at arch/um/os-Linux/util.c:92
#6  0x08059703 in panic_exit (self=0x83254f4, unused1=0, unused2=0x8340a80)
    at arch/um/kernel/um_arch.c:233
#7  0x080849d0 in notifier_call_chain (nl=0x0, val=0, v=0x8340a80,
    nr_to_call=0, nr_calls=0x0) at kernel/notifier.c:70
#8  0x08084a72 in __atomic_notifier_call_chain (nh=0x8340a60, val=0,
    v=0x8340a80, nr_to_call=-1, nr_calls=0x0) at kernel/notifier.c:159
#9  0x08084a89 in atomic_notifier_call_chain (nh=0x8340a60, val=0,
    v=0x8340a80) at kernel/notifier.c:168
#10 0x0807116f in panic (fmt=0x82d2039 "Kernel mode fault at addr 0x%lx, ip 0x%lx")
    at kernel/panic.c:101
#11 0x080594c1 in segv (fi={error_code = 6, cr2 = 98596, trap_no = 14},
    ip=136845739, is_user=0, regs=0x8323c6c) at arch/um/kernel/trap.c:206
#12 0x080592a0 in segv_handler (sig=11, regs=0x8323c6c)
    at arch/um/kernel/trap.c:152
#13 0x0806537b in sig_handler_common (sig=11, sc=0x8323d24)
    at arch/um/os-Linux/signal.c:48
#14 0x080653b8 in sig_handler (sig=11, sc=0x8323d24)
    at arch/um/os-Linux/signal.c:80
#15 0x080654dd in handle_signal (sig=, sc=0x8323d24)
    at arch/um/os-Linux/signal.c:157
#16 0x08066ebf in hard_handler (sig=11)
    at arch/um/os-Linux/sys-i386/signal.c:12
#17 <signal handler called>
#18 __down_interruptible (sem=0x9f68978) at include/linux/list.h:50
#19 0x0828091a in __down_failed_interruptible ()
    at arch/um/sys-i386/../../x86/lib/semaphore_32.S:63
#20 0x08220a89 in ddsnap_create (target=0xa829080, argc=4, argv=0x9f6f290)
    at include/asm/arch/semaphore_32.h:120
#21 0x0821b160 in dm_table_add_target (t=0x9f6f178, type=0xa82414c "ddsnap",
    start=165497564, len=204800, params=0xa82415c "/dev/ubdc")
    at drivers/md/dm-table.c:772

Looks like the problem happens when __down_interruptible is called.
I checked the semaphore passed to __down_interruptible under gdb
and found it was corrupted:

(gdb) f 18
#18 __down_interruptible (sem=0x9f68d08) at include/linux/list.h:50
50          prev->next = new;
(gdb) p sem
$15 = (struct semaphore *) 0x9f68d08
(gdb) p *sem
$16 = {count = {counter = -268435295}, sleepers = 4, wait = {lock = {raw_lock = {}}, task_list = {
      next = 0x9f68d5c, prev = 0x18124}}}

But the semaphore looks correct before calling down_interruptible:

(gdb) f 20
#20 0x082209fd in ddsnap_create (target=0xa829080, argc=4, argv=0x9f733a8)
    at include/asm/arch/semaphore_32.h:120
120         __asm__ __volatile__(
(gdb) p info->identify_sem
$28 = {count = {counter = -1}, sleepers = 0, wait = {lock = {raw_lock = {}}, task_list = {
      next = 0x9f0ca14, prev = 0x9f0ca14}}}

I found from 2.6.25 kernel, the type of __down_failed_interruptible changed
from fastcall to extern asmregparm. Can it be related to this problem?

Jiaying

-------------------------------------------------------------------------
Sponsored by: SourceForge.net Community Choice Awards: VOTE NOW!
Studies have shown that voting for your favorite open source project,
along with a healthy diet, reduces your potential for chronic lameness
and boredom. Vote Now at http://www.sourceforge.net/community/cca08
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel

Re: [uml-devel] 2.6.25 uml kernel crashes when it calls down() on a semaphore with zero counter
On Mon, Jul 14, 2008 at 05:06:49PM +0800, Jiaying Zhang wrote:
> The 2.6.24 kernels are OK, but I have seen this problem with all of the
> 2.6.25 kernels I have tried. There have been a lot of changes between
> 2.6.24 kernels and 2.6.25 kernels. I am not sure which one may lead
> to this problem.

So bisect it.

> Looks like the problem happens when __down_interruptible is called.
> I checked the semaphore passed to __down_interruptible under gdb
> and found it was corrupted:
> (gdb) f 18
> #18 __down_interruptible (sem=0x9f68d08) at include/linux/list.h:50
> 50          prev->next = new;
> (gdb) p sem
> $15 = (struct semaphore *) 0x9f68d08
> (gdb) p *sem
> $16 = {count = {counter = -268435295}, sleepers = 4, wait = {lock =
> {raw_lock = {}}, task_list = {
>       next = 0x9f68d5c, prev = 0x18124}}}
>
> But the semaphore looks correct before calling down_interruptible:

What's the problem with debugging this, then? You step through the
code starting when the semaphore is good and see exactly when it gets
corrupted.

    Jeff

--
Work email - jdike at linux dot intel dot com

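The bisection suggested above can be automated with `git bisect run`. What
follows is only a rough sketch, assuming the kernel source is a git checkout
that carries the v2.6.24 and v2.6.25 tags. The helper name bisect_verdict,
the script name bisect-step.sh, and the idea of feeding it a build status and
a boot-test status are hypothetical and do not come from this thread. The
helper maps the two statuses to the exit codes that `git bisect run`
understands: 0 for a good revision, 125 for one that cannot be tested, and
any other code from 1 to 127 for a bad one.

```shell
#!/bin/sh
# Hypothetical helper: turn a build status and a boot-test status into
# the exit code expected by `git bisect run`.
#   0   -> this revision is good
#   125 -> this revision cannot be tested (skip), e.g. the build failed
#   1   -> this revision is bad (the UML instance crashed)
bisect_verdict() {
    build_status=$1
    boot_status=$2
    if [ "$build_status" -ne 0 ]; then
        echo 125
    elif [ "$boot_status" -ne 0 ]; then
        echo 1
    else
        echo 0
    fi
}

# Intended use, left as comments because it needs a kernel tree and a
# boot-test command of your own (both are assumptions):
#   git bisect start v2.6.25 v2.6.24    # bad revision first, then good
#   git bisect run ./bisect-step.sh     # script exits with the verdict
```

Skipping revisions that fail to build keeps an unrelated build breakage from
being blamed for the crash.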
Re: [uml-devel] [PATCH] fix extern inline errors with gcc 4.3.0
I tried to build a UML 2.6.26 kernel on Debian (32-bit) and ran into the
same problem reported here: http://marc.info/?t=12101153352&r=1&w=2.
If I use gcc 4.3 everything works fine, but if I use gcc 4.1 or 4.2, UML
crashes on startup. I changed the gcc version check to 0430 and both 4.1
and 4.2 compiled kernels worked fine.

Also, is there any chance of getting a SKAS4 patch against 2.6.26? I was
trying to test it against 2.6.25.10 and had some strange issues. I was
running Debian Etch inside VMWare with a 2.6.25.10 SKAS4-enabled kernel.
When I tried to start up a UML instance from root or a non-root account I
was getting the same problem reported here:
http://thread.gmane.org/gmane.linux.uml.user/13045/focus=13059.
Now the weird part is if I started UML via sudo from a non-root account
I'd get a segfault somewhere in libc but then UML would boot up into a
working Linux environment saying SKAS4 was enabled. Was hoping to try and
test it some more on a newer kernel if there's a patch available.

Benny Halevy wrote:
> gcc 4.3.0 needs -funit-at-a-time for extern inline functions
> otherwise it doesn't find their body.
>
> For example:
> $ gcc --version
> gcc (GCC) 4.3.0 20080428 (Red Hat 4.3.0-8)
>
> /usr0/export/dev/bhalevy/git/linux-pnfs-bh-nfs41/fs/buffer.c: In function ‘alloc_page_buffers’:
> /usr0/export/dev/bhalevy/git/linux-pnfs-bh-nfs41/fs/buffer.c:51: sorry, unimplemented: inlining failed in call to ‘init_buffer’: function body not available
> /usr0/export/dev/bhalevy/git/linux-pnfs-bh-nfs41/fs/buffer.c:948: sorry, unimplemented: called from here
>
> Fix follows the lines of commit 22eecde2f9034764a3fd095eecfa3adfb8ec9a98
> that was reverted by commit c0a18111e571138747a98af18b3a2124df56a0d1,
> just limiting the flag for pre- gcc 4.3.0 rather than 4.0.
>
> Signed-off-by: Benny Halevy <[EMAIL PROTECTED]>
> ---
>  arch/um/Makefile |    6 +-
>  1 files changed, 5 insertions(+), 1 deletions(-)
>
> diff --git a/arch/um/Makefile b/arch/um/Makefile
> index dbeab15..e7ed37b 100644
> --- a/arch/um/Makefile
> +++ b/arch/um/Makefile
> @@ -77,7 +77,11 @@ include $(srctree)/$(ARCH_DIR)/Makefile-os-$(OS)
>  KERNEL_DEFINES = $(strip -Derrno=kernel_errno -Dsigprocmask=kernel_sigprocmask \
>  	-Dmktime=kernel_mktime $(ARCH_KERNEL_DEFINES))
>  KBUILD_CFLAGS += $(KERNEL_DEFINES)
> -KBUILD_CFLAGS += $(call cc-option,-fno-unit-at-a-time,)
> +# Disable unit-at-a-time mode on pre-gcc-4.3 compilers, it makes gcc use
> +# a lot more stack due to the lack of sharing of stacklots:
> +# gcc 4.3.0 needs -funit-at-a-time for extern inline functions
> +KBUILD_CFLAGS += $(shell if [ $(call cc-version) -lt 0403 ] ; then \
> +	echo $(call cc-option,-fno-unit-at-a-time); fi ;)
>
>  PHONY += linux
>
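
The version check in the patch compares against a four-digit code. The sketch
below assumes that code is two digits for the major version followed by two
for the minor, so that gcc 4.3 becomes 0403. The function names
cc_version_code and unit_at_a_time_flag are made up for illustration and are
not the kbuild implementation. Under that assumption it shows which compilers
get -fno-unit-at-a-time for a given threshold, including the 0430 threshold
mentioned in the reply.

```shell
#!/bin/sh
# Hypothetical helpers, not the kbuild implementation.

# Encode major/minor as a four-digit code, e.g. "4 3" -> 0403.
cc_version_code() {
    printf '%02d%02d\n' "$1" "$2"
}

# Print the flag when the compiler code is below the threshold and
# print nothing otherwise. `[ ... -lt ... ]` compares the operands as
# decimal integers, so the leading zero is harmless here.
unit_at_a_time_flag() {
    code=$1
    threshold=$2
    if [ "$code" -lt "$threshold" ]; then
        echo "-fno-unit-at-a-time"
    fi
}

# Example: which flag would gcc 4.2 get with the patch's threshold?
unit_at_a_time_flag "$(cc_version_code 4 2)" 0403
```

With the threshold at 0403 the flag is dropped for gcc 4.3, which is what the
patch intends. With 0430 the code for gcc 4.3 is still below the threshold,
so under this encoding the flag would be applied to 4.1, 4.2 and 4.3 alike.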