Re: [uml-devel] 2.6.25 uml kernel crashes when it calls down() on a semaphore with zero counter

2008-07-14 Thread Jiaying Zhang
On Fri, Jul 11, 2008 at 1:02 AM, Jeff Dike <[EMAIL PROTECTED]> wrote:

> On Thu, Jul 10, 2008 at 10:25:29AM +0800, Jiaying Zhang wrote:
> > Do you have any thought about what the problem might be?
> > Thanks a lot!
>
> Yeah, my first thought is that your code is buggy.
>
> Since 2.6.25 seems OK, you can bisect between then and now to see
> either what caused the bug or what is triggering the crash on an
> existing bug.


The 2.6.24 kernels are OK, but I have seen this problem with all of the
2.6.25 kernels I have tried. There have been a lot of changes between
2.6.24 kernels and 2.6.25 kernels. I am not sure which one may lead
to this problem.


> The other thing you can do is gdb the UML and see if gdb gives you a
> better stack trace.
>

Here is the trace from gdb uml.

Program received signal SIGTERM, Terminated.
0xb7fff410 in ?? ()
(gdb) bt
#0  0xb7fff410 in ?? ()
#1  0x08323afc in cpu0_irqstack ()
#2  0xfffe in ?? ()
#3  0x000f in ?? ()
#4  0x464850c6 in kill () from /lib/tls/i686/cmov/libc.so.6
#5  0x0806624b in os_dump_core () at arch/um/os-Linux/util.c:92
#6  0x08059703 in panic_exit (self=0x83254f4, unused1=0, unused2=0x8340a80)
at arch/um/kernel/um_arch.c:233
#7  0x080849d0 in notifier_call_chain (nl=0x0, val=0, v=0x8340a80,
nr_to_call=0, nr_calls=0x0)
at kernel/notifier.c:70
#8  0x08084a72 in __atomic_notifier_call_chain (nh=0x8340a60, val=0,
v=0x8340a80, nr_to_call=-1,
nr_calls=0x0) at kernel/notifier.c:159
#9  0x08084a89 in atomic_notifier_call_chain (nh=0x8340a60, val=0,
v=0x8340a80) at kernel/notifier.c:168
#10 0x0807116f in panic (fmt=0x82d2039 "Kernel mode fault at addr 0x%lx, ip
0x%lx") at kernel/panic.c:101
#11 0x080594c1 in segv (fi={error_code = 6, cr2 = 98596, trap_no = 14},
ip=136845739, is_user=0,
regs=0x8323c6c) at arch/um/kernel/trap.c:206
#12 0x080592a0 in segv_handler (sig=11, regs=0x8323c6c) at
arch/um/kernel/trap.c:152
#13 0x0806537b in sig_handler_common (sig=11, sc=0x8323d24) at
arch/um/os-Linux/signal.c:48
#14 0x080653b8 in sig_handler (sig=11, sc=0x8323d24) at
arch/um/os-Linux/signal.c:80
#15 0x080654dd in handle_signal (sig=<value optimized out>, sc=0x8323d24) at
arch/um/os-Linux/signal.c:157
#16 0x08066ebf in hard_handler (sig=11) at
arch/um/os-Linux/sys-i386/signal.c:12
#17 <signal handler called>
#18 __down_interruptible (sem=0x9f68978) at include/linux/list.h:50
#19 0x0828091a in __down_failed_interruptible () at
arch/um/sys-i386/../../x86/lib/semaphore_32.S:63
#20 0x08220a89 in ddsnap_create (target=0xa829080, argc=4, argv=0x9f6f290)
at include/asm/arch/semaphore_32.h:120
#21 0x0821b160 in dm_table_add_target (t=0x9f6f178, type=0xa82414c "ddsnap",
start=165497564, len=204800,
params=0xa82415c "/dev/ubdc") at drivers/md/dm-table.c:772

Looks like the problem happens when __down_interruptible is called.
I checked the semaphore passed to __down_interruptible under gdb
and found it was corrupted:
(gdb) f 18
#18 __down_interruptible (sem=0x9f68d08) at include/linux/list.h:50
50  prev->next = new;
(gdb) p sem
$15 = (struct semaphore *) 0x9f68d08
(gdb) p *sem
$16 = {count = {counter = -268435295}, sleepers = 4, wait = {lock =
{raw_lock = {}}, task_list = {
  next = 0x9f68d5c, prev = 0x18124}}}

But the semaphore looks correct before calling down_interruptible:
(gdb) f 20
#20 0x082209fd in ddsnap_create (target=0xa829080, argc=4, argv=0x9f733a8)
at include/asm/arch/semaphore_32.h:120
120 __asm__ __volatile__(
(gdb) p info->identify_sem
$28 = {count = {counter = -1}, sleepers = 0, wait = {lock = {raw_lock = {}}, task_list = {
  next = 0x9f0ca14, prev = 0x9f0ca14}}}

I found that, starting with the 2.6.25 kernel, the declaration of
__down_failed_interruptible changed from fastcall to extern asmregparm.
Could that be related to this problem?

Jiaying
-
Sponsored by: SourceForge.net Community Choice Awards: VOTE NOW!
Studies have shown that voting for your favorite open source project,
along with a healthy diet, reduces your potential for chronic lameness
and boredom. Vote Now at http://www.sourceforge.net/community/cca08
___
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel


Re: [uml-devel] 2.6.25 uml kernel crashes when it calls down() on a semaphore with zero counter

2008-07-14 Thread Jeff Dike
On Mon, Jul 14, 2008 at 05:06:49PM +0800, Jiaying Zhang wrote:
> The 2.6.24 kernels are OK, but I have seen this problem with all of the
> 2.6.25 kernels I have tried. There have been a lot of changes between
> 2.6.24 kernels and 2.6.25 kernels. I am not sure which one may lead
> to this problem.

So bisect it.
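
For anyone following along, a bisect between the two releases could be
started roughly as below. This is only a sketch: v2.6.24 and v2.6.25 are
the upstream release tags, while the bisect_cmds helper is a made-up name
that just prints the commands so they can be reviewed before being run
inside a kernel git tree.

```shell
#!/bin/sh
# Sketch only: print the git bisect commands for a known-good and a
# known-bad tag. bisect_cmds is a hypothetical helper, not an existing tool.
bisect_cmds() {
    good=$1
    bad=$2
    echo "git bisect start"
    echo "git bisect bad $bad"
    echo "git bisect good $good"
}

# Print the plan; pipe the output to sh from inside a kernel tree to start.
bisect_cmds v2.6.24 v2.6.25
```

After each checkout, rebuild UML, rerun the ddsnap setup that crashes, and
mark the result with "git bisect good" or "git bisect bad" until git names
the first bad commit.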

> Looks like the problem happens when __down_interruptible is called.
> I checked the semaphore passed to __down_interruptible under gdb
> and found it was corrupted:
> (gdb) f 18
> #18 __down_interruptible (sem=0x9f68d08) at include/linux/list.h:50
> 50  prev->next = new;
> (gdb) p sem
> $15 = (struct semaphore *) 0x9f68d08
> (gdb) p *sem
> $16 = {count = {counter = -268435295}, sleepers = 4, wait = {lock =
> {raw_lock = {}}, task_list = {
>   next = 0x9f68d5c, prev = 0x18124}}}
> 
> But the semaphore looks correct before calling down_interruptible:

What's the problem with debugging this, then?  You step through the
code starting when the semaphore is good and see exactly when it gets
corrupted.

Jeff

-- 
Work email - jdike at linux dot intel dot com



Re: [uml-devel] [PATCH] fix extern inline errors with gcc 4.3.0

2008-07-14 Thread David Shane Holden
I tried to build a UML 2.6.26 kernel on Debian (32-bit) and ran into the
same problem reported here: http://marc.info/?t=12101153352&r=1&w=2
If I use gcc 4.3 everything works fine, but if I use gcc 4.1 or 4.2,
UML crashes on startup.  After I changed the gcc version check to 0430,
kernels compiled with both 4.1 and 4.2 worked fine.

Also, is there any chance of getting a SKAS4 patch against 2.6.26?  I
was trying to test it against 2.6.25.10 and had some strange issues.  I
was running Debian Etch inside VMware with a 2.6.25.10 SKAS4-enabled
kernel.  When I tried to start a UML instance from root or a non-root
account, I got the same problem reported here:
http://thread.gmane.org/gmane.linux.uml.user/13045/focus=13059
The weird part is that if I started UML via sudo from a non-root account,
I'd get a segfault somewhere in libc, but then UML would boot into a
working Linux environment saying SKAS4 was enabled.  I was hoping to test
it some more on a newer kernel if there's a patch available.

Benny Halevy wrote:
> gcc 4.3.0 needs -funit-at-a-time for extern inline functions
> otherwise it doesn't find their body.
> 
> For example:
> $ gcc --version
> gcc (GCC) 4.3.0 20080428 (Red Hat 4.3.0-8)
> 
> /usr0/export/dev/bhalevy/git/linux-pnfs-bh-nfs41/fs/buffer.c: In function 
> ‘alloc_page_buffers’:
> /usr0/export/dev/bhalevy/git/linux-pnfs-bh-nfs41/fs/buffer.c:51: sorry, 
> unimplemented: inlining failed in call to ‘init_buffer’: function body not 
> available
> /usr0/export/dev/bhalevy/git/linux-pnfs-bh-nfs41/fs/buffer.c:948: sorry, 
> unimplemented: called from here
> 
> Fix follows the lines of commit 22eecde2f9034764a3fd095eecfa3adfb8ec9a98
> that was reverted by commit c0a18111e571138747a98af18b3a2124df56a0d1,
> just limiting the flag for pre- gcc 4.3.0 rather than 4.0.
> 
> Signed-off-by: Benny Halevy <[EMAIL PROTECTED]>
> ---
>  arch/um/Makefile |    6 +++++-
>  1 files changed, 5 insertions(+), 1 deletions(-)
> 
> diff --git a/arch/um/Makefile b/arch/um/Makefile
> index dbeab15..e7ed37b 100644
> --- a/arch/um/Makefile
> +++ b/arch/um/Makefile
> @@ -77,7 +77,11 @@ include $(srctree)/$(ARCH_DIR)/Makefile-os-$(OS)
>  KERNEL_DEFINES = $(strip -Derrno=kernel_errno -Dsigprocmask=kernel_sigprocmask \
>  	-Dmktime=kernel_mktime $(ARCH_KERNEL_DEFINES))
>  KBUILD_CFLAGS += $(KERNEL_DEFINES)
> -KBUILD_CFLAGS += $(call cc-option,-fno-unit-at-a-time,)
> +# Disable unit-at-a-time mode on pre-gcc-4.3 compilers, it makes gcc use
> +# a lot more stack due to the lack of sharing of stacklots:
> +# gcc 4.3.0 needs -funit-at-a-time for extern inline functions
> +KBUILD_CFLAGS += $(shell if [ $(call cc-version) -lt 0403 ] ; then \
> + echo $(call cc-option,-fno-unit-at-a-time); fi ;)
>  
>  PHONY += linux
>  
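
The version test in the patch above can be tried outside the kernel build
with a few lines of plain shell. This is a rough sketch, assuming that
cc-version yields a four-digit major/minor code such as 0402 for gcc 4.2;
the helper names cc_version_code and wants_no_unit_at_a_time are made up
for illustration and are not part of kbuild.

```shell
#!/bin/sh
# Sketch only: mimic the "-lt 0403" comparison from the patch.
# Both helper names are hypothetical.

# Turn a dotted version such as "4.2.4" into a code such as "0402".
cc_version_code() {
    major=${1%%.*}
    rest=${1#*.}
    minor=${rest%%.*}
    printf '%02d%02d\n' "$major" "$minor"
}

# Succeed when the compiler is older than gcc 4.3, i.e. when the patch
# would still add -fno-unit-at-a-time.
wants_no_unit_at_a_time() {
    [ "$(cc_version_code "$1")" -lt 0403 ]
}

for v in 4.1.2 4.2.4 4.3.0; do
    if wants_no_unit_at_a_time "$v"; then
        echo "$v: add -fno-unit-at-a-time"
    else
        echo "$v: leave unit-at-a-time enabled"
    fi
done
```

With the 0403 threshold, the 4.1 and 4.2 compilers keep the flag and 4.3
does not, which matches the intent described in the patch comment.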

