Re: User-mode linux stack overflow: could be generic problem

2000-10-08 Thread Jeff Dike

[EMAIL PROTECTED] said:
> __restore():           764
> do_execve:             340
> load_elf_binary:       324
> segv:                  180
> sigio_handler:         176
> load_script:           172
> ext2_get_block:        160
> set_signals:           156
> block_read_full_page:  124

There's nothing really fixable here.

> All up, there's about 660 bytes of stack which can be relatively
> easily saved by converting locals to kmalloced memory, which still
> isn't enough to solve the problem. 

Yeah.  I did get rid of the fd_set in sigio_handler, saving another 128 bytes.

Basically, UML is being screwed by the signal frame size.

> I haven't looked into UML's interrupt handling, but perhaps another
> approach is to try and avoid recursive interrupts/exceptions and do
> some kind of tail-recursion optimisation in the exception/signal
> handler.  I don't know if this would cause problems (deadlocks?).

Probably :-)

I don't really want to fiddle with that, because that will impose constraints 
that will hurt responsiveness.

> Alternatively, could you just use a bigger stack? 

*ding ding ding*  We have a winner.

I'm going to allocate 4 pages instead of two, and either have three of them 
available for the stack or have two available for the stack and the third 
unmapped to protect the task structure.
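
The second layout (two stack pages plus a guard page) can be sketched in user space roughly as follows. This is only an illustration under stated assumptions: the task structure is taken to sit in the lowest page, mprotect() with PROT_NONE stands in for actually unmapping the guard page, and the names guarded_region, make_guarded_region and free_guarded_region are hypothetical, not UML code.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* exposes MAP_ANONYMOUS on glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#define REGION_PAGES 4          /* "4 pages instead of two" */

/* Hypothetical description of one task's region. */
struct guarded_region {
    char  *base;        /* page 0: stands in for the task structure */
    char  *stack_low;   /* start of page 2: lowest usable stack byte */
    char  *stack_top;   /* end of page 3: one past the highest stack byte */
    size_t page_size;
};

/* Map four pages and make page 1 inaccessible, so a stack that grows
 * down past page 2 faults instead of scribbling on page 0. */
static int make_guarded_region(struct guarded_region *r)
{
    long page = sysconf(_SC_PAGESIZE);
    void *base;

    if (page <= 0)
        return -1;

    base = mmap(NULL, REGION_PAGES * (size_t) page, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return -1;

    /* Assumption: PROT_NONE is used here in place of a real unmap. */
    if (mprotect((char *) base + page, (size_t) page, PROT_NONE) < 0) {
        munmap(base, REGION_PAGES * (size_t) page);
        return -1;
    }

    r->base = base;
    r->page_size = (size_t) page;
    r->stack_low = r->base + 2 * r->page_size;
    r->stack_top = r->base + REGION_PAGES * r->page_size;
    return 0;
}

static void free_guarded_region(struct guarded_region *r)
{
    munmap(r->base, REGION_PAGES * r->page_size);
}
```

With this layout the stack gets two full pages, and any access that runs off the bottom of the stack hits the guard page before it can reach the lowest page. The first option (three stack pages, no guard) would simply skip the mprotect() call.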

Jeff


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: User-mode linux stack overflow: could be generic problem

2000-10-08 Thread Jeremy Fitzhardinge

On Sun, Oct 08, 2000 at 11:21:01AM -0500, Jeff Dike wrote:
> Also, could you look at the stack pointer at each frame, to see if you are 
> encountering any stack hogs in the generic kernel?  In a different situation, 
> I found devfs putting a 3K structure on the stack.

OK, top candidates on that stack trace are:

__restore():           764
do_execve:             340
load_elf_binary:       324
segv:                  180
sigio_handler:         176
load_script:           172
ext2_get_block:        160
set_signals:           156
block_read_full_page:  124

Looks like do_execve should be pretty easy to shrink: most of the stack
is in a local of type struct linux_binprm (308 bytes), which could be
kmalloced.  I guess this would have some cost in speed, so I don't suppose
this could be a generic patch.  Anyway, it isn't a solution in itself.
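
The general shape of that change, moving a large local off the stack, looks something like the sketch below. It is a user-space illustration only: malloc()/free() stand in for kmalloc()/kfree(), and struct big_params is a hypothetical 308-byte structure, not the real struct linux_binprm.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a large per-call structure (128 + 180 = 308 bytes). */
struct big_params {
    char buf[128];
    char name[180];
};

/* Before: the whole structure lives in this function's stack frame. */
static int run_on_stack(const char *name)
{
    struct big_params p;

    memset(&p, 0, sizeof(p));
    strncpy(p.name, name, sizeof(p.name) - 1);
    return (int) strlen(p.name);
}

/* After: only a pointer lives on the stack; the structure is on the heap.
 * The cost is an allocation, a free, and a new failure path. */
static int run_on_heap(const char *name)
{
    struct big_params *p = malloc(sizeof(*p));
    int ret;

    if (p == NULL)
        return -1;

    memset(p, 0, sizeof(*p));
    strncpy(p->name, name, sizeof(p->name) - 1);
    ret = (int) strlen(p->name);
    free(p);
    return ret;
}
```

Both functions return the same result for the same input; the difference is only in where the 308 bytes live while the function runs.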

load_elf_binary is harder to deal with, since it just has lots of locals,
each relatively small.

segv is mostly a local struct siginfo (128 bytes).

sigio_handler is mostly an fd_set (128 bytes).

load_script has a local buffer for remembering the interpreter (128 bytes).

All up, there's about 660 bytes of stack which can be relatively easily
saved by converting locals to kmalloced memory, which still isn't enough
to solve the problem.

I haven't looked into UML's interrupt handling, but perhaps another approach
is to try and avoid recursive interrupts/exceptions and do some kind of
tail-recursion optimisation in the exception/signal handler.  I don't
know if this would cause problems (deadlocks?).

Alternatively, could you just use a bigger stack?

J



Re: User-mode linux stack overflow: could be generic problem

2000-10-08 Thread Jeremy Fitzhardinge

On Sun, Oct 08, 2000 at 11:21:01AM -0500, Jeff Dike wrote:
> [EMAIL PROTECTED] said:
> > Even with this patch, the overflow is 808 bytes (without the patch
> > it's 1232 bytes).
> 
> I was mulling over some other changes that would have saved another 256 bytes, 
> but those don't look like they would help.  Try the patch below.  It 
> essentially gives up and lets the stack occupy half of the lower page.

Well, that sweeps the problem under the carpet enough to make progress...
 
> Also, could you look at the stack pointer at each frame, to see if you are 
> encountering any stack hogs in the generic kernel?  In a different situation, 
> I found devfs putting a 3K structure on the stack.

OK, I'll look into it.

J



Re: User-mode linux stack overflow: could be generic problem

2000-10-08 Thread Jeff Dike

[EMAIL PROTECTED] said:
> Even with this patch, the overflow is 808 bytes (without the patch
> it's 1232 bytes).

I was mulling over some other changes that would have saved another 256 bytes, 
but those don't look like they would help.  Try the patch below.  It 
essentially gives up and lets the stack occupy half of the lower page.

Also, could you look at the stack pointer at each frame, to see if you are 
encountering any stack hogs in the generic kernel?  In a different situation, 
I found devfs putting a 3K structure on the stack.

Jeff

--- arch/um/kernel/process_kern.c~  Mon Sep 25 15:34:25 2000
+++ arch/um/kernel/process_kern.c   Fri Oct  6 12:07:28 2000
@@ -711,8 +711,13 @@
 
 void check_stack_overflow(void *ptr)
 {
-   if((((unsigned long) ptr) & PAGE_MASK) == (unsigned long) current)
-       panic("Stack overflowed onto current_task page");
+   unsigned long addr, c;
+
+   addr = (unsigned long) ptr;
+   c = (unsigned long) current;
+
+   if(addr - c < PAGE_SIZE / 2)
+       panic("Stack overflowed well into the current_task page");
 }
 
 int singlestepping(void *t)
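
The effect of the patch can be seen by writing both checks as plain predicates. The sketch below is a user-space illustration, not UML code: PAGE_SIZE is assumed to be 4096, the function names are hypothetical, and the task structure address used in the example (0x5015c000) is an assumption obtained by rounding the ptr value from the reported backtrace (0x5015ccc8) down to a page boundary.

```c
#include <assert.h>
#include <stdbool.h>

#define PAGE_SIZE 4096UL                /* assumed i386 page size */
#define PAGE_MASK (~(PAGE_SIZE - 1))

/* Old check: any address inside the task-structure page is an overflow. */
static bool old_check_trips(unsigned long addr, unsigned long current_task)
{
    return (addr & PAGE_MASK) == current_task;
}

/* Patched check: only the lower half of that page is an overflow,
 * so the stack may use the upper half of the lower page. */
static bool new_check_trips(unsigned long addr, unsigned long current_task)
{
    return addr - current_task < PAGE_SIZE / 2;
}
```

With addr = 0x5015ccc8 and current_task = 0x5015c000, the offset into the page is 0xcc8 (3272 bytes), so the old check trips and the patched one does not.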





Re: User-mode linux stack overflow: could be generic problem

2000-10-08 Thread Jeremy Fitzhardinge

On Sun, Oct 08, 2000 at 12:35:48AM -0500, Jeff Dike wrote:
> I've been waiting for someone to send me that stack.  There aren't any real 
> smoking guns there.  I'm guessing that the difference between your laptop and 
> the machine it works on is that your laptop is running a fairly recent kernel 
> (2.4.0-testx) and the other isn't.

Yep, that's right.

> The sigcontext struct greatly increased in 
> size (to ~800 bytes IIRC) to accommodate the MMX registers or something.  There 
> are three signals on your stack, so those frames by themselves are taking up 
> half the stack page.
> 
> Anyway, the patch below removes 256 bytes from the set_signals frame.  It 
> ought to alleviate things a bit.  I'll be looking for other things I can do, 
> as well. Let me know how it works for you.

I'm afraid this doesn't help.  The stack still overflows at the same point.
It looks like each signal frame is ~760 bytes.  Even with this patch, the
overflow is 808 bytes (without the patch it's 1232 bytes).
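
Taking the ~760-byte figure at face value, the cost of nested signal frames is easy to tally. The sketch below assumes a 4096-byte page; the helper names are hypothetical.

```c
#include <assert.h>

#define PAGE_SIZE 4096UL   /* assumed i386 page size */

/* Total stack consumed by nframes nested signal frames of frame_size bytes. */
static unsigned long signal_frame_bytes(unsigned long nframes,
                                        unsigned long frame_size)
{
    return nframes * frame_size;
}

/* True if the given byte count is more than half of one page. */
static int exceeds_half_page(unsigned long bytes)
{
    return bytes > PAGE_SIZE / 2;
}
```

Three frames of 760 bytes come to 2280 bytes, which is more than half a page before any kernel function has used a byte.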

J



Re: User-mode linux stack overflow: could be generic problem

2000-10-07 Thread Jeff Dike

Thank you!

I've been waiting for someone to send me that stack.  There aren't any real 
smoking guns there.  I'm guessing that the difference between your laptop and 
the machine it works on is that your laptop is running a fairly recent kernel 
(2.4.0-testx) and the other isn't.  The sigcontext struct greatly increased in 
size (to ~800 bytes IIRC) to accommodate the MMX registers or something.  There 
are three signals on your stack, so those frames by themselves are taking up 
half the stack page.

Anyway, the patch below removes 256 bytes from the set_signals frame.  It 
ought to alleviate things a bit.  I'll be looking for other things I can do, 
as well. Let me know how it works for you.

Jeff

--- arch/um/kernel/signal_user.c~   Thu Sep 14 17:00:08 2000
+++ arch/um/kernel/signal_user.c    Sun Oct  8 00:21:29 2000
@@ -45,26 +45,29 @@
 
 int set_signals(int enable)
 {
-   sigset_t mask, unmask, old;
+   sigset_t mask;
+   int ret;
 
    check_stack_overflow(&enable);
+   sigprocmask(SIG_BLOCK, NULL, &mask);
+   ret = enable_mask(&mask);
    sigemptyset(&mask);
-   sigemptyset(&unmask);
-   if(enable & (1 << SIGIO_BIT)) sigaddset(&unmask, SIGIO);
-   else sigaddset(&mask, SIGIO);
+   if(enable & (1 << SIGIO_BIT)) sigaddset(&mask, SIGIO);
    if(enable & (1 << SIGVTALRM_BIT)){
-       sigaddset(&unmask, SIGVTALRM);
-       sigaddset(&unmask, SIGALRM);
+       sigaddset(&mask, SIGVTALRM);
+       sigaddset(&mask, SIGALRM);
    }
-   else {
+   if(sigprocmask(SIG_UNBLOCK, &mask, NULL) < 0)
+       panic("Failed to enable signals");
+   sigemptyset(&mask);
+   if((enable & (1 << SIGIO_BIT)) == 0) sigaddset(&mask, SIGIO);
+   if((enable & (1 << SIGVTALRM_BIT)) == 0){
        sigaddset(&mask, SIGVTALRM);
        sigaddset(&mask, SIGALRM);
    }
-   if(sigprocmask(SIG_BLOCK, &mask, &old) < 0)
-       panic("Failed to change signal mask");
-   if(sigprocmask(SIG_UNBLOCK, &unmask, NULL) < 0)
-       panic("Failed to change signal mask");
-   return(enable_mask(&old));
+   if(sigprocmask(SIG_BLOCK, &mask, NULL) < 0)
+       panic("Failed to block signals");
+   return(ret);
 }
 
 /*
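
The idea in the patch is to reuse one sigset_t for the unblock pass and then the block pass, instead of keeping three on the stack at once. On glibc a sigset_t is typically 128 bytes, which would account for the 256 bytes saved. A stand-alone user-space sketch of the same pattern follows; the bit positions and the names set_signals_sketch and is_blocked are assumptions, and the UML-specific enable_mask() step is left out.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* make sure sigset_t and friends are declared */
#endif
#include <assert.h>
#include <signal.h>
#include <stddef.h>

#define SIGIO_BIT      0        /* hypothetical bit positions */
#define SIGVTALRM_BIT  1

/* Unblock the signals selected by 'enable' and block the rest,
 * using a single sigset_t for both passes. Returns 0 on success. */
static int set_signals_sketch(int enable)
{
    sigset_t mask;

    /* Pass 1: build the set to enable and unblock it. */
    sigemptyset(&mask);
    if (enable & (1 << SIGIO_BIT))
        sigaddset(&mask, SIGIO);
    if (enable & (1 << SIGVTALRM_BIT)) {
        sigaddset(&mask, SIGVTALRM);
        sigaddset(&mask, SIGALRM);
    }
    if (sigprocmask(SIG_UNBLOCK, &mask, NULL) < 0)
        return -1;

    /* Pass 2: reuse the same variable for the set to disable. */
    sigemptyset(&mask);
    if ((enable & (1 << SIGIO_BIT)) == 0)
        sigaddset(&mask, SIGIO);
    if ((enable & (1 << SIGVTALRM_BIT)) == 0) {
        sigaddset(&mask, SIGVTALRM);
        sigaddset(&mask, SIGALRM);
    }
    if (sigprocmask(SIG_BLOCK, &mask, NULL) < 0)
        return -1;

    return 0;
}

/* Returns 1 if sig is currently blocked, 0 if not, -1 on error. */
static int is_blocked(int sig)
{
    sigset_t cur;

    /* A NULL 'set' argument only queries the current mask. */
    if (sigprocmask(SIG_BLOCK, NULL, &cur) < 0)
        return -1;
    return sigismember(&cur, sig);
}
```

Calling set_signals_sketch(1 << SIGIO_BIT) leaves SIGIO unblocked and SIGVTALRM and SIGALRM blocked. The trade-off against the original code is two extra library calls in exchange for a smaller frame.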





User-mode linux stack overflow: could be generic problem

2000-10-07 Thread Jeremy Fitzhardinge

Hi,

I've been playing with user-mode linux (2.4.0-pre9).  It works well on
one machine, but on my laptop I'm consistently getting stack overflows
just as init is started.

The backtrace (from a breakpoint at panic()):

(gdb) bt
#0  panic (fmt=0x10112e00 "Stack overflowed onto current_task page")
at panic.c:54
#1  0x100a244d in check_stack_overflow (ptr=0x5015ccc8) at process_kern.c:715
#2  0x1009ddc9 in set_signals (enable=0) at signal_user.c:50
#3  0x100050b0 in __wake_up (q=0x5014afc8, mode=35) at sched.c:714
#4  0x10020ea5 in end_buffer_io_sync (bh=0x5014af80, uptodate=1)
at /home/jeremy/uml/2.3/include/linux/locks.h:34
#5  0x100607a4 in end_that_request_first (req=0x500e8f00, uptodate=1, 
name=0x1011390d "User-mode block device") at ll_rw_blk.c:1000
#6  0x100a3d48 in ubd_finish () at /home/jeremy/uml/2.3/include/linux/blk.h:396
#7  0x100a3dd5 in ubd_handler () at ubd.c:222
#8  0x100a3e00 in ubd_intr (irq=3, dev=0x1012c0a0, unused=0x5015cd88)
at ubd.c:229
#9  0x1009c6bf in handle_IRQ_event (irq=3, regs=0x5015cd88, action=0x500573c0)
at irq.c:148
#10 0x1009c85f in do_IRQ (irq=3, user_mode=0) at irq.c:313
#11 0x1009cf8d in sigio_handler (sig=29) at irq_user.c:53
#12 0x100a7318 in __restore ()
at ../sysdeps/unix/sysv/linux/i386/sigaction.c:127
#13 0x1009de50 in set_signals (enable=3) at signal_user.c:65
#14 0x1005fb46 in generic_unplug_device (data=0x10160650) at ll_rw_blk.c:364
#15 0x100204ab in __wait_on_buffer (bh=0x5014af80)
at /home/jeremy/uml/2.3/include/linux/tqueue.h:120
#16 0x100212be in bread (dev=25088, block=92508, size=1024)
at /home/jeremy/uml/2.3/include/linux/locks.h:20
#17 0x10041a40 in ext2_get_block (inode=0x5013c0a0, iblock=288, 
bh_result=0x5014ae00, create=0) at inode.c:250
#18 0x10021edd in block_read_full_page (page=0x50008b74, 
get_block=0x10041978 <ext2_get_block>) at buffer.c:1613
#19 0x10042014 in ext2_readpage (file=0x500de1e0, page=0x50008b74)
at inode.c:659
#20 0x10013eb1 in read_cluster_nonblocking (file=0x500de1e0, offset=77, 
filesize=78) at filemap.c:440
#21 0x1001525c in filemap_nopage (area=0x500d2c60, address=134832128, 
no_share=2) at filemap.c:1391
#22 0x1001209d in do_no_page (mm=0x500d41c0, vma=0x500d2c60, 
address=134832392, write_access=2, page_table=0x50159258) at memory.c:1150
#23 0x100121d4 in handle_mm_fault (mm=0x500d41c0, vma=0x500d2c60, 
address=134832392, write_access=2) at memory.c:1207
#24 0x100a01b3 in segv (address=134832392, ip=268665530, is_write=2, is_user=0)
at trap_kern.c:89
#25 0x100a0902 in segv_handler (sig=11) at trap_user.c:258
#26 0x100a7318 in __restore ()
at ../sysdeps/unix/sysv/linux/i386/sigaction.c:127
#27 0x100397d4 in load_elf_binary (bprm=0x5015db24, regs=0x0)
at binfmt_elf.c:714
#28 0x100287e8 in search_binary_handler (bprm=0x5015db24, regs=0x0)
at exec.c:809
#29 0x10038226 in load_script (bprm=0x5015db24, regs=0x0) at binfmt_script.c:92
#30 0x100287e8 in search_binary_handler (bprm=0x5015db24, regs=0x0)
at exec.c:809
#31 0x100289cd in do_execve (filename=0x500df000 "/etc/rc.d/rc.sysinit", 
argv=0xbf7ffb14, envp=0x804f2c0, regs=0x0) at exec.c:902
#32 0x1009c3fc in execve1 (file=0x500df000 "/etc/rc.d/rc.sysinit", 
argv=0xbf7ffb14, env=0x804f2c0) at exec_kern.c:77
#33 0x1009c474 in sys_execve (file=0xbf7ffa88 "", argv=0xbf7ffb14, 
env=0x804f2c0) at exec_kern.c:101
#34 0x1009eab7 in execute_syscall (syscall=11, args=0x5015dcf8)
at syscall_kern.c:340
#35 0x1009eeb8 in syscall_handler (unused=0) at syscall_user.c:113
#36 0x1009bf03 in fork_handler (sig=10) at process.c:96
#37 0x100a7318 in __restore ()
at ../sysdeps/unix/sysv/linux/i386/sigaction.c:127

This is a pretty deep stack, but there's nothing unexpected there.
I would guess that some kind of very fast disk drive would also cause
this kind of deep stack on real hardware, if it can complete the
I/O and interrupt before the reschedule.

I tried adding some inlines to make the stack use a little shallower, but
it didn't help.

Any suggestions on how to get this working? 

Thanks,
J


