Re: KSE signal problems still
On Tue, 2 Jul 2002, Julian Elischer wrote:

> On Tue, 2 Jul 2002, Andrew Gallatin wrote:
>
> > An easy way to induce a panic w/a post KSE -current is to ^C gdb
> > as it starts on an SMP machine:
>
> A possibly related breakage is:
> type ^Z while doing make buildworld (or something similar).
> When you type 'fg' there is a high chance the build will abort..
>
> > # gdb -k /var/crash/kernel.1 /var/crash/vmcore.1
> > GNU gdb 5.2.0 (FreeBSD) 20020627
> > Copyright 2002 Free Software Foundation, Inc.
> > GDB is free software, covered by the GNU General Public License, and you are
> > welcome to change it and/or distribute copies of it under certain conditions.
> > Type show copying to see the conditions.
> > There is absolutely no warranty for GDB. Type show warranty for details.
> > This GDB was configured as i386-undermydesk-freebsd...
> > ^C
> > panic: mutex sched lock not owned at ../../../kern/subr_smp.c:126
> > cpuid = 1; lapic.id = 0100
> > Debugger(panic)
> > Stopped at Debugger+0x46: xchgl %ebx,in_Debugger.0
> > db> where
> > No such command
> > db> tr
> > Debugger(c02dbf5a) at Debugger+0x46
> > panic(c02db1a8,c02db318,c02df736,7e,c4445540) at panic+0xd6
> > _mtx_assert(c0315440,1,c02df736,7e) at _mtx_assert+0xa8
> > forward_signal(c4445540) at forward_signal+0x1a
> > tdsignal(c4445540,2,2) at tdsignal+0x182
> > psignal(c443d558,2) at psignal+0x3c8
> > pgsignal(c441ad00,2,1,c441ad1c,0) at pgsignal+0x63
> > ttyinput(3,c41e8e30,c41e8e00,0,c0347903) at ttyinput+0x316
> > ptcwrite(c4307a00,d7d5ec88,7f0011,1,d7d5ebc4) at ptcwrite+0x17f
> > spec_write(d7d5ebf0,d7d5ec3c,c0204cc8,d7d5ebf0,7f0011) at spec_write+0x5a
> > spec_vnoperate(d7d5ebf0) at spec_vnoperate+0x13
> > vn_write(c41ded5c,d7d5ec88,c440cd80,0,c409e780) at vn_write+0x1c8
> > dofilewrite(c409e780,c41ded5c,5,8088000,1) at dofilewrite+0xaf
> > write(c409e780,d7d5ed14,3,b,282) at write+0x39
> > syscall(2f,2f,2f,1,8073410) at syscall+0x23c
> > syscall_with_err_pushed() at syscall_with_err_pushed+0x1b
> > --- syscall (4, FreeBSD ELF, write), eip = 0x281fb3a3, esp = 0xbfbff37c, ebp = 0xbfbff3e8 ---
>
> hum so, the question is: where should we get the sched lock?

Maybe just remove the foot-shooting that releases it?

% Index: kern_sig.c
% ===
% RCS file: /home/ncvs/src/sys/kern/kern_sig.c,v
% retrieving revision 1.170
% retrieving revision 1.171
% diff -u -1 -r1.170 -r1.171
% --- kern_sig.c 29 Jun 2002 02:00:01 - 1.170
% +++ kern_sig.c 29 Jun 2002 17:26:18 - 1.171
% @@ -1486,15 +1540,9 @@
%    */
% -  if (p->p_stat == SRUN) {
% +  mtx_unlock_spin(&sched_lock);
     ^ shoot foot
% +  if (td->td_state == TDS_RUNQ ||
% +      td->td_state == TDS_RUNNING) {

I think sched_lock is needed for checking td_state too (strictly to use
the result of the check, so the lock is not critical if the use doesn't
do anything harmful), but there is no lock indication for td_state in
proc.h like there used to be for p_stat.

% +    signotify(td->td_proc);

Holding sched_lock when calling signotify() used to be an error, but that
was changed in rev.1.155. This signotify() call seems to be bogus anyway.
signotify() should only be called after the signal mask is changed. The
call to signotify() here was removed in rev.1.154 when the semantics of
signotify() were changed a little. Bogus calls to signotify() just waste
time.

%  #ifdef SMP
% -    struct kse *ke;
% -    struct thread *td = curthread;
% -  /* we should only deliver to one thread.. but which one? */
% -    FOREACH_KSEGRP_IN_PROC(p, kg) {
% -      FOREACH_KSE_IN_GROUP(kg, ke) {
% -        if (ke->ke_thread == td) {
% -          continue;
% -        }
% -        forward_signal(ke->ke_thread);
% -      }
% -    }
% +    if (td->td_state == TDS_RUNNING && td != curthread)
% +      forward_signal(td);
%  #endif

forward_signal() was called with sched_lock held in rev.1.170, and
forward_signal() still requires it to be held. I think sched_lock is
needed for checking td_state too, as above. Here it is fairly clear that
calling forward_signal() bogusly after losing a race is harmless. It just
wakes up td to look for a signal that isn't there or can't be handled yet.
Since this only happens if we lose a race, it may be more efficient to let it happen (rarely) than to lock (always) to prevent it happening. But we already held the lock so the locking was free except for latency issues. Bruce To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
Re: KSE signal problems still
On Wed, 3 Jul 2002, Bruce Evans wrote:

> Maybe just remove the foot-shooting that releases it?
>
> % Index: kern_sig.c
> % ===
> % RCS file: /home/ncvs/src/sys/kern/kern_sig.c,v
> % retrieving revision 1.170
> % retrieving revision 1.171
> % diff -u -1 -r1.170 -r1.171
> % --- kern_sig.c 29 Jun 2002 02:00:01 - 1.170
> % +++ kern_sig.c 29 Jun 2002 17:26:18 - 1.171
> % @@ -1486,15 +1540,9 @@
> %    */
> % -  if (p->p_stat == SRUN) {
> % +  mtx_unlock_spin(&sched_lock);
>      ^ shoot foot
> % +  if (td->td_state == TDS_RUNQ ||
> % +      td->td_state == TDS_RUNNING) {
>
> I think sched_lock is needed for checking td_state too (strictly to use
> the result of the check, so the lock is not critical if the use doesn't
> do anything harmful), but there is no lock indication for td_state in
> proc.h like there used to be for p_stat.
>
> % +    signotify(td->td_proc);
>
> Holding sched_lock when calling signotify() used to be an error, but that
> was changed in rev.1.155. This signotify() call seems to be bogus anyway.
> signotify() should only be called after the signal mask is changed. The
> call to signotify() here was removed in rev.1.154 when the semantics of
> signotify() were changed a little. Bogus calls to signotify() just waste
> time.
>
> %  #ifdef SMP
> % -    struct kse *ke;
> % -    struct thread *td = curthread;
> % -  /* we should only deliver to one thread.. but which one? */
> % -    FOREACH_KSEGRP_IN_PROC(p, kg) {
> % -      FOREACH_KSE_IN_GROUP(kg, ke) {
> % -        if (ke->ke_thread == td) {
> % -          continue;
> % -        }
> % -        forward_signal(ke->ke_thread);
> % -      }
> % -    }
> % +    if (td->td_state == TDS_RUNNING && td != curthread)
> % +      forward_signal(td);
> %  #endif
>
> forward_signal() was called with sched_lock held in rev.1.170, and
> forward_signal() still requires it to be held. I think sched_lock is
> needed for checking td_state too, as above. Here it is fairly clear that
> calling forward_signal() bogusly after losing a race is harmless. It just
> wakes up td to look for a signal that isn't there or can't be handled yet.
> Since this only happens if we lose a race, it may be more efficient to
> let it happen (rarely) than to lock (always) to prevent it happening.
> But we already held the lock so the locking was free except for latency
> issues.
>
> Bruce

Untested fix for these bugs and some style bugs in tdsignal():

Index: kern_sig.c
===
RCS file: /home/ncvs/src/sys/kern/kern_sig.c,v
retrieving revision 1.171
diff -u -2 -r1.171 kern_sig.c
--- kern_sig.c 29 Jun 2002 17:26:18 - 1.171
+++ kern_sig.c 3 Jul 2002 07:42:31 -
@@ -1468,5 +1449,5 @@
 /*
  * The force of a signal has been directed against a single
- * thread. We need to see what we can do about knocking it
+ * thread. We need to see what we can do about knocking it
  * out of any sleep it may be in etc.
  */
@@ -1485,8 +1466,7 @@
   */
  mtx_lock_spin(&sched_lock);
- if ((action == SIG_DFL) && (prop & SA_KILL)) {
-   if (td->td_priority > PUSER) {
+ if (action == SIG_DFL && (prop & SA_KILL)) {
+   if (td->td_priority > PUSER)
      td->td_priority = PUSER;
-   }
  }
  mtx_unlock_spin(&sched_lock);
@@ -1496,7 +1476,7 @@
   * except that stopped processes must be continued by SIGCONT.
   */
- if (action == SIG_HOLD) {
+ if (action == SIG_HOLD)
    goto out;
- }
+
  mtx_lock_spin(&sched_lock);
  if (td->td_state == TDS_SLP) {
@@ -1531,24 +1511,17 @@
    }
    goto runfast;
-   /* NOTREACHED */
-
  } else {
    /*
-    * Other states do nothing with the signal immediatly,
+    * Other states do nothing with the signal immediately,
     * other than kicking ourselves if we are running.
     * It will either never be noticed, or noticed very soon.
     */
-   mtx_unlock_spin(&sched_lock);
-   if (td->td_state == TDS_RUNQ ||
-       td->td_state == TDS_RUNNING) {
-     signotify(td->td_proc);
 #ifdef SMP
-     if (td->td_state == TDS_RUNNING && td != curthread)
-       forward_signal(td);
+   if (td->td_state == TDS_RUNNING && td != curthread)
+     forward_signal(td);
 #endif
-   }
+   mtx_unlock_spin(&sched_lock);
    goto out;
  }
- /*NOTREACHED*/
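
[Editor's sketch] The ordering problem discussed above can be imitated in userland. This is only an illustration under stated assumptions, not kernel code: `owned_lock`, `sched_lock_sim`, `forward_signal_sim` and the two `deliver_*` helpers are made-up names, a pthread mutex stands in for the spin mutex, and the ownership check returns false where the kernel's mtx_assert() would panic.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a kernel mutex that remembers its owner. */
struct owned_lock {
    pthread_mutex_t mtx;
    pthread_t owner;   /* meaningful only while held is true */
    bool held;
};

static struct owned_lock sched_lock_sim = {
    .mtx = PTHREAD_MUTEX_INITIALIZER,
    .held = false,
};

static void lock_acquire(struct owned_lock *l)
{
    int rc = pthread_mutex_lock(&l->mtx);
    assert(rc == 0);
    (void)rc;
    l->owner = pthread_self();
    l->held = true;
}

static void lock_release(struct owned_lock *l)
{
    l->held = false;
    int rc = pthread_mutex_unlock(&l->mtx);
    assert(rc == 0);
    (void)rc;
}

/* True if the calling thread holds the lock (single-threaded demo only). */
static bool lock_owned(const struct owned_lock *l)
{
    return l->held && pthread_equal(l->owner, pthread_self());
}

/*
 * Stand-in for forward_signal(): the caller must hold the lock.
 * Returns false where the kernel would report "mutex ... not owned".
 */
static bool forward_signal_sim(struct owned_lock *l, int *ipis_sent)
{
    if (!lock_owned(l))
        return false;
    (*ipis_sent)++;
    return true;
}

/* Shape of rev 1.171: the lock is dropped before the call. */
static bool deliver_unlocked_first(int *ipis_sent)
{
    lock_acquire(&sched_lock_sim);
    lock_release(&sched_lock_sim);
    return forward_signal_sim(&sched_lock_sim, ipis_sent);
}

/* Shape of the untested fix: call first, unlock afterwards. */
static bool deliver_then_unlock(int *ipis_sent)
{
    lock_acquire(&sched_lock_sim);
    bool ok = forward_signal_sim(&sched_lock_sim, ipis_sent);
    lock_release(&sched_lock_sim);
    return ok;
}
```

Calling `deliver_unlocked_first()` fails the ownership check and forwards nothing, while `deliver_then_unlock()` passes it; the only difference is where the unlock sits relative to the call.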
Re: KSE signal problems still
On Wed, 3 Jul 2002, Bruce Evans wrote:

> On Tue, 2 Jul 2002, Julian Elischer wrote:
>
> Maybe just remove the foot-shooting that releases it?

Yes, I'm rationalising it at the moment..
It turns out that just holding it for all of tdsignal works well.
Also, removing it from setrunnable() is ok, as all the callers I could
find have already locked it.
I checked in a stopgap to stop panics but I'm reworking it now.

The trouble is that thread semantics are really not well defined for
multi-threaded processes. What does it mean to make a process run when it
has many threads? Should ALL threads be awakened, or is it enough if ONE
thread awakens to deliver the signal?

For right now it's mostly important that single-threaded processes act as
they used to. We can always change how multithreaded processes work.

> % Index: kern_sig.c
> % ===
> % RCS file: /home/ncvs/src/sys/kern/kern_sig.c,v
> % retrieving revision 1.170
> % retrieving revision 1.171
> % diff -u -1 -r1.170 -r1.171
> % --- kern_sig.c 29 Jun 2002 02:00:01 - 1.170
> % +++ kern_sig.c 29 Jun 2002 17:26:18 - 1.171
> % @@ -1486,15 +1540,9 @@
> %    */
> % -  if (p->p_stat == SRUN) {
> % +  mtx_unlock_spin(&sched_lock);
>      ^ shoot foot
> % +  if (td->td_state == TDS_RUNQ ||
> % +      td->td_state == TDS_RUNNING) {
>
> I think sched_lock is needed for checking td_state too (strictly to use
> the result of the check, so the lock is not critical if the use doesn't
> do anything harmful), but there is no lock indication for td_state in
> proc.h like there used to be for p_stat.
>
> % +    signotify(td->td_proc);
>
> Holding sched_lock when calling signotify() used to be an error, but that
> was changed in rev.1.155. This signotify() call seems to be bogus anyway.
> signotify() should only be called after the signal mask is changed. The
> call to signotify() here was removed in rev.1.154 when the semantics of
> signotify() were changed a little. Bogus calls to signotify() just waste
> time.

Signotify is already called in psignal so I've removed this one from my
version.

> %  #ifdef SMP
> % -    struct kse *ke;
> % -    struct thread *td = curthread;
> % -  /* we should only deliver to one thread.. but which one? */
> % -    FOREACH_KSEGRP_IN_PROC(p, kg) {
> % -      FOREACH_KSE_IN_GROUP(kg, ke) {
> % -        if (ke->ke_thread == td) {
> % -          continue;
> % -        }
> % -        forward_signal(ke->ke_thread);
> % -      }
> % -    }
> % +    if (td->td_state == TDS_RUNNING && td != curthread)
> % +      forward_signal(td);
> %  #endif
>
> forward_signal() was called with sched_lock held in rev.1.170, and
> forward_signal() still requires it to be held. I think sched_lock is
> needed for checking td_state too, as above. Here it is fairly clear that
> calling forward_signal() bogusly after losing a race is harmless. It just
> wakes up td to look for a signal that isn't there or can't be handled yet.
> Since this only happens if we lose a race, it may be more efficient to
> let it happen (rarely) than to lock (always) to prevent it happening.
> But we already held the lock so the locking was free except for latency
> issues.

Much of what you say will be in my next commit.
I told Andrew Gallatin that I would work on cleaning up tdsignal and
maybe psignal tonight, so that's what I've been doing..
It's not perfect though.. but it clears it up a bit..
I'm just testing it at the moment.

> Bruce
Re: KSE signal problems still
On 03-Jul-2002 Julian Elischer wrote:

> On Wed, 3 Jul 2002, John Baldwin wrote:
>
> > Erm, I thought I changed signotify() to require sched_lock and made the
> > second half of psignal() (the whole case statement) lock sched_lock.
> > Did you change that? (To Julian)
>
> psignal as a whole hasn't existed in the KSE tree since December.
> I must have missed it in the complicated merge that came from that in P4.
> I just checked it in like this for now to stop the panics until I can
> work out what the equivalent change to yours is..
> (feel free to check out the new psignal/tdsignal combination.)

Well then it must be full of races that were fixed since DP1.
*sigh* I wonder how many other things were lost and need to be
reimplemented.

--
John Baldwin [EMAIL PROTECTED] http://www.FreeBSD.org/~jhb/
Power Users Use the Power to Serve! - http://www.FreeBSD.org/
Re: KSE signal problems still
Julian Elischer wrote:

> Should ALL threads be awakened, or is it enough if ONE thread awakens to
> deliver the signal?
>
> For right now it's mostly important that single-threaded processes act as
> they used to. We can always change how multithreaded processes work.

POSIX makes no guarantees for threads delivery of signals.

Specifically, signals are not thread-things, they are process things, and
there are separate threads-things for sending the moral equivalents (e.g.
pthread_kill) to threads on an individual basis, but the system is not
expected to make a distinction on signal delivery as to what thread is
running, nor is there expected to be per-thread masking, etc..

Garrett would probably be the right person to ask; he's a much better
POSIX lawyer.

This is really the problem I tried to explain earlier when it came to the
disabling of SIG_POLL on a per-descriptor basis.

-- Terry
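
[Editor's sketch] For readers unfamiliar with the thread-directed call mentioned above, here is a small userland example of pthread_kill(). It is a sketch, not a statement about how the KSE kernel delivers signals: the helper names `waiter` and `thread_directed_signal_demo` are made up, and the program assumes SIGUSR1 is free for its own use. The signal is blocked first so that it stays pending on the target thread until sigwait() accepts it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <stdint.h>

/* Worker: wait for any signal in the given set and report which one came. */
static void *waiter(void *arg)
{
    const sigset_t *set = arg;
    int signo = 0;

    if (sigwait(set, &signo) != 0)
        signo = -1;
    return (void *)(intptr_t)signo;
}

/* Returns the signal number the worker accepted, or -1 on error. */
static int thread_directed_signal_demo(void)
{
    sigset_t set, old;
    pthread_t worker;
    void *result = NULL;

    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);

    /* Block SIGUSR1 here; the new thread inherits this mask. */
    if (pthread_sigmask(SIG_BLOCK, &set, &old) != 0)
        return -1;
    if (pthread_create(&worker, NULL, waiter, &set) != 0) {
        pthread_sigmask(SIG_SETMASK, &old, NULL);
        return -1;
    }

    /* Direct the signal at one specific thread, not at the process. */
    if (pthread_kill(worker, SIGUSR1) != 0)
        pthread_cancel(worker);   /* sigwait() is a cancellation point */

    pthread_join(worker, &result);
    pthread_sigmask(SIG_SETMASK, &old, NULL);

    if (result == PTHREAD_CANCELED)
        return -1;
    return (int)(intptr_t)result;
}
```

A process-directed kill() of the same signal could be accepted by any thread that has it unblocked or is waiting for it, which is the ambiguity the question above is about.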
Re: KSE signal problems still
On Wed, 3 Jul 2002, John Baldwin wrote:

> On 03-Jul-2002 Julian Elischer wrote:
>
> > On Wed, 3 Jul 2002, John Baldwin wrote:
> >
> > > Erm, I thought I changed signotify() to require sched_lock and made
> > > the second half of psignal() (the whole case statement) lock
> > > sched_lock. Did you change that? (To Julian)
> >
> > psignal as a whole hasn't existed in the KSE tree since December.
> > I must have missed it in the complicated merge that came from that in
> > P4. I just checked it in like this for now to stop the panics until I
> > can work out what the equivalent change to yours is..
> > (feel free to check out the new psignal/tdsignal combination.)
>
> Well then it must be full of races that were fixed since DP1.
> *sigh* I wonder how many other things were lost and need to be
> reimplemented.

Psignal is, aside from kern_switch.c, probably the largest single casualty.
I'm just checking in a cleanup now.. wait a few minutes.

> --
> John Baldwin [EMAIL PROTECTED] http://www.FreeBSD.org/~jhb/
> Power Users Use the Power to Serve! - http://www.FreeBSD.org/
Re: KSE signal problems still
Expanding on my own mail:

On Wed, 3 Jul 2002, Julian Elischer wrote:

> On Wed, 3 Jul 2002, John Baldwin wrote:
>
> > Well then it must be full of races that were fixed since DP1.
> > *sigh* I wonder how many other things were lost and need to be
> > reimplemented.

Almost anything you checked into psignal will need looking at. It may not
be missing, but since signals for threaded processes are fundamentally
different from signals for non-threaded processes, some things just don't
apply. For example, if you checked in something to code that just doesn't
exist any more in a KSE kernel, what is the correct integration?
Each one has to be evaluated on its own..
Re: KSE signal problems still
On 03-Jul-2002 Julian Elischer wrote:

> Expanding on my own mail:
>
> On Wed, 3 Jul 2002, Julian Elischer wrote:
>
> > On Wed, 3 Jul 2002, John Baldwin wrote:
> >
> > > Well then it must be full of races that were fixed since DP1.
> > > *sigh* I wonder how many other things were lost and need to be
> > > reimplemented.
>
> Almost anything you checked into psignal will need looking at. It may not
> be missing, but since signals for threaded processes are fundamentally
> different from signals for non-threaded processes, some things just don't
> apply. For example, if you checked in something to code that just doesn't
> exist any more in a KSE kernel, what is the correct integration?
> Each one has to be evaluated on its own..

The one in question here was fairly simple: it just expanded the
sched_lock locking some. The argument could be made that you shouldn't be
checking in stuff until you know how it works, etc., or that you could
commit in smaller pieces (say, get multiple threads per process for
kernel processes working in the scheduler and just ignore userland-only
things like signals until you have the other working).

--
John Baldwin [EMAIL PROTECTED] http://www.FreeBSD.org/~jhb/
Power Users Use the Power to Serve! - http://www.FreeBSD.org/
Re: KSE signal problems still
On Wed, 3 Jul 2002, John Baldwin wrote:

> The argument could be made that you shouldn't be checking in stuff until
> you know how it works, etc., or that you could commit in smaller pieces
> (say, get multiple threads per process for kernel processes working in
> the scheduler and just ignore userland-only things like signals until
> you have the other working).

You can't do those separately, unfortunately..
Anyhow, it's not that I don't understand it, it's just that it's
complicated..

The new version is as close as I can get quickly but it still needs some
cleaning.

> --
> John Baldwin [EMAIL PROTECTED] http://www.FreeBSD.org/~jhb/
> Power Users Use the Power to Serve! - http://www.FreeBSD.org/
Re: KSE signal problems still
On 03-Jul-2002 Julian Elischer wrote:

> On Wed, 3 Jul 2002, John Baldwin wrote:
>
> > The argument could be made that you shouldn't be checking in stuff
> > until you know how it works, etc., or that you could commit in smaller
> > pieces (say, get multiple threads per process for kernel processes
> > working in the scheduler and just ignore userland-only things like
> > signals until you have the other working).
>
> You can't do those separately, unfortunately..

Sure you could: just have kernel-only KSE processes at first and use some
special kernel processes for testing. They would never return to userland
but would be adequate to test that all the various run and sleep queues,
etc. worked fine.

> Anyhow, it's not that I don't understand it, it's just that it's
> complicated..

That part of my message was overly harsh. I'm sorry.

--
John Baldwin [EMAIL PROTECTED] http://www.FreeBSD.org/~jhb/
Power Users Use the Power to Serve! - http://www.FreeBSD.org/
KSE signal problems still
An easy way to induce a panic w/a post KSE -current is to ^C gdb
as it starts on an SMP machine:

# gdb -k /var/crash/kernel.1 /var/crash/vmcore.1
GNU gdb 5.2.0 (FreeBSD) 20020627
Copyright 2002 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type show copying to see the conditions.
There is absolutely no warranty for GDB. Type show warranty for details.
This GDB was configured as i386-undermydesk-freebsd...
^C
panic: mutex sched lock not owned at ../../../kern/subr_smp.c:126
cpuid = 1; lapic.id = 0100
Debugger(panic)
Stopped at Debugger+0x46: xchgl %ebx,in_Debugger.0
db> where
No such command
db> tr
Debugger(c02dbf5a) at Debugger+0x46
panic(c02db1a8,c02db318,c02df736,7e,c4445540) at panic+0xd6
_mtx_assert(c0315440,1,c02df736,7e) at _mtx_assert+0xa8
forward_signal(c4445540) at forward_signal+0x1a
tdsignal(c4445540,2,2) at tdsignal+0x182
psignal(c443d558,2) at psignal+0x3c8
pgsignal(c441ad00,2,1,c441ad1c,0) at pgsignal+0x63
ttyinput(3,c41e8e30,c41e8e00,0,c0347903) at ttyinput+0x316
ptcwrite(c4307a00,d7d5ec88,7f0011,1,d7d5ebc4) at ptcwrite+0x17f
spec_write(d7d5ebf0,d7d5ec3c,c0204cc8,d7d5ebf0,7f0011) at spec_write+0x5a
spec_vnoperate(d7d5ebf0) at spec_vnoperate+0x13
vn_write(c41ded5c,d7d5ec88,c440cd80,0,c409e780) at vn_write+0x1c8
dofilewrite(c409e780,c41ded5c,5,8088000,1) at dofilewrite+0xaf
write(c409e780,d7d5ed14,3,b,282) at write+0x39
syscall(2f,2f,2f,1,8073410) at syscall+0x23c
syscall_with_err_pushed() at syscall_with_err_pushed+0x1b
--- syscall (4, FreeBSD ELF, write), eip = 0x281fb3a3, esp = 0xbfbff37c, ebp = 0xbfbff3e8 ---

This is a kernel with an updated version of kern_synch and kern_condvar:

lcvs status kern/kern_synch.c kern/kern_condvar.c
===
File: kern_synch.c        Status: Up-to-date

   Working revision:    1.179   Tue Jul 2 20:18:15 2002
   Repository revision: 1.179   /home/ncvs/src/sys/kern/kern_synch.c,v
   Sticky Tag:          (none)
   Sticky Date:         (none)
   Sticky Options:      (none)

===
File: kern_condvar.c      Status: Up-to-date

   Working revision:    1.24    Tue Jul 2 20:18:14 2002
   Repository revision: 1.24    /home/ncvs/src/sys/kern/kern_condvar.c,v
   Sticky Tag:          (none)
   Sticky Date:         (none)
   Sticky Options:      (none)

I apologize if I'm being redundant, but the FreeBSD mail server seems to
be stuck -- I haven't gotten any messages on committers or -current in
hours.

Drew
Re: KSE signal problems still
On Tue, 2 Jul 2002, Andrew Gallatin wrote:

> An easy way to induce a panic w/a post KSE -current is to ^C gdb
> as it starts on an SMP machine:

A possibly related breakage is:
type ^Z while doing make buildworld (or something similar).
When you type 'fg' there is a high chance the build will abort..

> # gdb -k /var/crash/kernel.1 /var/crash/vmcore.1
> GNU gdb 5.2.0 (FreeBSD) 20020627
> Copyright 2002 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type show copying to see the conditions.
> There is absolutely no warranty for GDB. Type show warranty for details.
> This GDB was configured as i386-undermydesk-freebsd...
> ^C
> panic: mutex sched lock not owned at ../../../kern/subr_smp.c:126
> cpuid = 1; lapic.id = 0100
> Debugger(panic)
> Stopped at Debugger+0x46: xchgl %ebx,in_Debugger.0
> db> where
> No such command
> db> tr
> Debugger(c02dbf5a) at Debugger+0x46
> panic(c02db1a8,c02db318,c02df736,7e,c4445540) at panic+0xd6
> _mtx_assert(c0315440,1,c02df736,7e) at _mtx_assert+0xa8
> forward_signal(c4445540) at forward_signal+0x1a
> tdsignal(c4445540,2,2) at tdsignal+0x182
> psignal(c443d558,2) at psignal+0x3c8
> pgsignal(c441ad00,2,1,c441ad1c,0) at pgsignal+0x63
> ttyinput(3,c41e8e30,c41e8e00,0,c0347903) at ttyinput+0x316
> ptcwrite(c4307a00,d7d5ec88,7f0011,1,d7d5ebc4) at ptcwrite+0x17f
> spec_write(d7d5ebf0,d7d5ec3c,c0204cc8,d7d5ebf0,7f0011) at spec_write+0x5a
> spec_vnoperate(d7d5ebf0) at spec_vnoperate+0x13
> vn_write(c41ded5c,d7d5ec88,c440cd80,0,c409e780) at vn_write+0x1c8
> dofilewrite(c409e780,c41ded5c,5,8088000,1) at dofilewrite+0xaf
> write(c409e780,d7d5ed14,3,b,282) at write+0x39
> syscall(2f,2f,2f,1,8073410) at syscall+0x23c
> syscall_with_err_pushed() at syscall_with_err_pushed+0x1b
> --- syscall (4, FreeBSD ELF, write), eip = 0x281fb3a3, esp = 0xbfbff37c, ebp = 0xbfbff3e8 ---

hum so, the question is: where should we get the sched lock?

> Drew
Re: KSE signal problems still
Julian Elischer writes:

> On Tue, 2 Jul 2002, Andrew Gallatin wrote:
>
> > An easy way to induce a panic w/a post KSE -current is to ^C gdb
> > as it starts on an SMP machine:
>
> A possibly related breakage is:
> type ^Z while doing make buildworld (or something similar).
> When you type 'fg' there is a high chance the build will abort..

This is nearly 100% for me. But only on MP boxes. On my uniprocessor
alpha, things work just fine. Oh.. hmm.. I'm not sure if I have
witless compiled in there..

Drew
Re: KSE signal problems still
On Tue, 2 Jul 2002, Andrew Gallatin wrote:

> Julian Elischer writes:
>
> > On Tue, 2 Jul 2002, Andrew Gallatin wrote:
> >
> > > An easy way to induce a panic w/a post KSE -current is to ^C gdb
> > > as it starts on an SMP machine:
> >
> > A possibly related breakage is:
> > type ^Z while doing make buildworld (or something similar).
> > When you type 'fg' there is a high chance the build will abort..
>
> This is nearly 100% for me. But only on MP boxes. On my uniprocessor
> alpha, things work just fine. Oh.. hmm.. I'm not sure if I have
> witless compiled in there..

Which is almost 100%? The ^Z killing the process, or ^C killing the
machine?

> Drew
Re: KSE signal problems still
:...
:
: This is nearly 100% for me. But only on MP boxes. On my uniprocessor
: alpha, things work just fine. Oh.. hmm.. I'm not sure if I have
: witless compiled in there..
:
: which is almost 100%,? the ^Z killing the process, or ^C killing the
: machine?
:
:^C killing the machine.
:
:Drew

How are we doing on IA32? I've successfully run 9 buildworld -j 5's
so far with a SMP build of -current. I'm going to run a bunch more and
then I'll switch to testing signals (a buildworld only generates 4 or 5
signals over the entire build so it isn't a good test for signal-related
issues).

-Matt
Matthew Dillon [EMAIL PROTECTED]
Re: KSE signal problems still
Try this:

In tdsignal() (kern_sig.c), take a lock on sched_lock and release it
again, just around the call to forward_signal():

forward_signal(c4445540) at forward_signal+0x1a
tdsignal(c4445540,2,2) at tdsignal+0x182
psignal(c443d558,2) at psignal+0x3c8

Hopefully this will not be called with sched_lock already locked.
If we panic because we already own it, it gets more difficult..

On Tue, 2 Jul 2002, Andrew Gallatin wrote:

> Julian Elischer writes:
>
> > On Tue, 2 Jul 2002, Andrew Gallatin wrote:
> >
> > > Julian Elischer writes:
> > >
> > > > On Tue, 2 Jul 2002, Andrew Gallatin wrote:
> > > >
> > > > > An easy way to induce a panic w/a post KSE -current is to ^C gdb
> > > > > as it starts on an SMP machine:
> > > >
> > > > A possibly related breakage is:
> > > > type ^Z while doing make buildworld (or something similar).
> > > > When you type 'fg' there is a high chance the build will abort..
> > >
> > > This is nearly 100% for me. But only on MP boxes. On my uniprocessor
> > > alpha, things work just fine. Oh.. hmm.. I'm not sure if I have
> > > witless compiled in there..
> >
> > Which is almost 100%? The ^Z killing the process, or ^C killing the
> > machine?
>
> ^C killing the machine.
>
> Drew
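
[Editor's sketch] The "if we panic because we already own it" case above is the usual behaviour of a non-recursive lock when its owner tries to take it again. A userland analogue, assuming a POSIX error-checking mutex in place of the kernel spin mutex (the helper name `relock_error` is made up), reports the second attempt as an error rather than panicking:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/*
 * Lock an error-checking mutex twice from the same thread and return the
 * error code of the second attempt (or an earlier setup error).
 */
static int relock_error(void)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t m;
    int rc;

    rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (rc == 0)
        rc = pthread_mutex_init(&m, &attr);
    pthread_mutexattr_destroy(&attr);
    if (rc != 0)
        return rc;

    rc = pthread_mutex_lock(&m);       /* first acquisition succeeds */
    if (rc == 0) {
        rc = pthread_mutex_lock(&m);   /* owner relocks: reported as error */
        pthread_mutex_unlock(&m);
    }
    pthread_mutex_destroy(&m);
    return rc;
}
```

In this analogue the second attempt yields EDEADLK, which is the kind of signal the experiment above is looking for: if wrapping forward_signal() in a lock/unlock pair trips on an already-held lock, some caller is entering with the lock held.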
Re: KSE signal problems still
Matthew Dillon writes:

> :...
> :
> : This is nearly 100% for me. But only on MP boxes. On my uniprocessor
> : alpha, things work just fine. Oh.. hmm.. I'm not sure if I have
> : witless compiled in there..
> :
> : which is almost 100%,? the ^Z killing the process, or ^C killing the
> : machine?
> :
> :^C killing the machine.
> :
> :Drew
>
> How are we doing on IA32? I've successfully run 9 buildworld -j 5's
> so far with a SMP build of -current. I'm going to run a bunch more and
> then I'll switch to testing signals (a buildworld only generates 4 or 5
> signals over the entire build so it isn't a good test for signal-related
> issues).

The above refers to IA32. My (UP, w/o witness) alpha seems solid. No
panics so far. It's the SMP IA32 box that keeps falling on its face with
a signal..

Drew
Re: KSE signal problems still
Julian Elischer writes:

> Try this:
>
> In tdsignal() (kern_sig.c), take a lock on sched_lock and release it
> again, just around the call to forward_signal():
>
> forward_signal(c4445540) at forward_signal+0x1a
> tdsignal(c4445540,2,2) at tdsignal+0x182
> psignal(c443d558,2) at psignal+0x3c8
>
> Hopefully this will not be called with sched_lock already locked.

Following your suggestion, the appended patch appears to work. However,
it does seem a bit silly, as we end up dropping and reacquiring the
sched lock quite a few times:

    mtx_unlock_spin(&sched_lock);
    if (td->td_state == TDS_RUNQ ||
        td->td_state == TDS_RUNNING) {
        signotify(td->td_proc);  /* grabs & releases sched_lock */
#ifdef SMP
        if (td->td_state == TDS_RUNNING && td != curthread) {
            mtx_lock_spin(&sched_lock);
            forward_signal(td);
            mtx_unlock_spin(&sched_lock);
        }
#endif
    }
    goto out;

Wouldn't it be cleaner if there was a signotify_locked() that assumed
you had the sched_lock held (and was called by signotify)?

Drew

Index: kern_sig.c
===
RCS file: /home/ncvs/src/sys/kern/kern_sig.c,v
retrieving revision 1.171
diff -u -r1.171 kern_sig.c
--- kern_sig.c 29 Jun 2002 17:26:18 - 1.171
+++ kern_sig.c 3 Jul 2002 01:48:35 -
@@ -1543,8 +1543,11 @@
        td->td_state == TDS_RUNNING) {
      signotify(td->td_proc);
 #ifdef SMP
-     if (td->td_state == TDS_RUNNING && td != curthread)
+     if (td->td_state == TDS_RUNNING && td != curthread) {
+       mtx_lock_spin(&sched_lock);
        forward_signal(td);
+       mtx_unlock_spin(&sched_lock);
+     }
 #endif
    }
    goto out;
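
[Editor's sketch] The signotify_locked() idea floated above is the common "locked variant plus locking wrapper" pattern. The userland sketch below only illustrates that pattern; every `*_sim` name and both `notify_*` helpers are hypothetical, a pthread mutex stands in for sched_lock, and a counter records how many times the lock is taken on each path.

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t sched_lock_sim = PTHREAD_MUTEX_INITIALIZER;
static int lock_round_trips;  /* acquisitions since the last reset */
static int pending_flag;      /* stand-in for "signal pending" state */
static int ipis;              /* stand-in for forwarded notifications */

static void sim_lock(void)
{
    pthread_mutex_lock(&sched_lock_sim);
    lock_round_trips++;
}

static void sim_unlock(void)
{
    pthread_mutex_unlock(&sched_lock_sim);
}

/* Locked variant: the caller must already hold sched_lock_sim. */
static void signotify_locked_sim(void)
{
    pending_flag = 1;
}

/* Wrapper for callers that do not hold the lock. */
static void signotify_sim(void)
{
    sim_lock();
    signotify_locked_sim();
    sim_unlock();
}

/* Stand-in for forward_signal(): the caller must hold the lock. */
static void forward_signal_sim(void)
{
    ipis++;
}

/* Shape of the stopgap patch: drop the lock, then re-take it per step. */
static int notify_piecemeal(void)
{
    lock_round_trips = 0;
    sim_lock();               /* earlier work under the lock */
    sim_unlock();
    signotify_sim();          /* takes and drops the lock internally */
    sim_lock();
    forward_signal_sim();
    sim_unlock();
    return lock_round_trips;
}

/* Shape of the suggestion: one hold covering both steps. */
static int notify_single_hold(void)
{
    lock_round_trips = 0;
    sim_lock();
    signotify_locked_sim();
    forward_signal_sim();
    sim_unlock();
    return lock_round_trips;
}
```

With the locked variant available, the single-hold path takes the lock once where the piecemeal path takes it three times, and the state check and the notification happen under the same hold.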
Re: KSE signal problems still
AHH I assumed it was alpha...

On Tue, 2 Jul 2002, Andrew Gallatin wrote:

> Matthew Dillon writes:
>
> > :...
> > :
> > : This is nearly 100% for me. But only on MP boxes. On my uniprocessor
> > : alpha, things work just fine. Oh.. hmm.. I'm not sure if I have
> > : witless compiled in there..
> > :
> > : which is almost 100%,? the ^Z killing the process, or ^C killing the
> > : machine?
> > :
> > :^C killing the machine.
> > :
> > :Drew
> >
> > How are we doing on IA32? I've successfully run 9 buildworld -j 5's
> > so far with a SMP build of -current. I'm going to run a bunch more and
> > then I'll switch to testing signals (a buildworld only generates 4 or 5
> > signals over the entire build so it isn't a good test for
> > signal-related issues).
>
> The above refers to IA32. My (UP, w/o witness) alpha seems solid. No
> panics so far. It's the SMP IA32 box that keeps falling on its face with
> a signal..
>
> Drew
Re: KSE signal problems still
We seem pretty solid on ia32.

^Z and then fg will sometimes kill the process instead of foregrounding
it though. (I aborted several buildworlds that way accidentally.)

Andrew's panic seems SMP specific though.. You may check if there is
something different between ia32 and alpha on whether it holds sched_lock
at this point:

panic: mutex sched lock not owned at ../../../kern/subr_smp.c:126
cpuid = 1; lapic.id = 0100
Debugger(panic)
Stopped at Debugger+0x46: xchgl %ebx,in_Debugger.0
db> where
No such command
db> tr
Debugger(c02dbf5a) at Debugger+0x46
panic(c02db1a8,c02db318,c02df736,7e,c4445540) at panic+0xd6
_mtx_assert(c0315440,1,c02df736,7e) at _mtx_assert+0xa8
forward_signal(c4445540) at forward_signal+0x1a
tdsignal(c4445540,2,2) at tdsignal+0x182
psignal(c443d558,2) at psignal+0x3c8
pgsignal(c441ad00,2,1,c441ad1c,0) at pgsignal+0x63
ttyinput(3,c41e8e30,c41e8e00,0,c0347903) at ttyinput+0x316
ptcwrite(c4307a00,d7d5ec88,7f0011,1,d7d5ebc4) at ptcwrite+0x17f
spec_write(d7d5ebf0,d7d5ec3c,c0204cc8,d7d5ebf0,7f0011) at spec_write+0x5a
spec_vnoperate(d7d5ebf0) at spec_vnoperate+0x13
vn_write(c41ded5c,d7d5ec88,c440cd80,0,c409e780) at vn_write+0x1c8
dofilewrite(c409e780,c41ded5c,5,8088000,1) at dofilewrite+0xaf
write(c409e780,d7d5ed14,3,b,282) at write+0x39
syscall(2f,2f,2f,1,8073410) at syscall+0x23c
syscall_with_err_pushed() at syscall_with_err_pushed+0x1b
--- syscall (4, FreeBSD ELF, write), eip = 0x281fb3a3, esp = 0xbfbff37c, ebp = 0xbfbff3e8 ---

I'm trying to test Jeff's latest patch but got sidetracked by hardware..

On Tue, 2 Jul 2002, Matthew Dillon wrote:

> :...
> :
> : This is nearly 100% for me. But only on MP boxes. On my uniprocessor
> : alpha, things work just fine. Oh.. hmm.. I'm not sure if I have
> : witless compiled in there..
> :
> : which is almost 100%,? the ^Z killing the process, or ^C killing the
> : machine?
> :
> :^C killing the machine.
> :
> :Drew
>
> How are we doing on IA32? I've successfully run 9 buildworld -j 5's
> so far with a SMP build of -current. I'm going to run a bunch more and
> then I'll switch to testing signals (a buildworld only generates 4 or 5
> signals over the entire build so it isn't a good test for signal-related
> issues).
>
> -Matt
> Matthew Dillon [EMAIL PROTECTED]
Re: KSE signal problems still
On Tue, 2 Jul 2002, Andrew Gallatin wrote:

> Julian Elischer writes:
> > try this:
> > in tdsignal, (kern_sig.c)
> > take a lock on schedlock and release it again, just around the call to
> > forward_signal()
> >
> > > forward_signal(c4445540) at forward_signal+0x1a
> > > tdsignal(c4445540,2,2) at tdsignal+0x182
> > > psignal(c443d558,2) at psignal+0x3c8
> >
> > hopefully this will not be called with the schedlock already locked
>
> Following your suggestion, the appended patch appears to work.
> However, it does seem a bit silly, as we end up dropping
> and reacquiring the sched lock quite a few times:

That's why I just asked you to test the concept..
If I know that just acquiring it here is ok, (I presume you tried doing
some work like this) that tells me that this code isn't called from some
odd place, with the sched lock already set. (that and code inspection of
course..) Now we know it works we can try optimise it..
I'm going home now for dinner, so if you feel like checking this
or something more optimal in, be my guest :-)

>         mtx_unlock_spin(&sched_lock);
>         if (td->td_state == TDS_RUNQ ||
>             td->td_state == TDS_RUNNING) {
>                 signotify(td->td_proc); /* grabs & releases sched_lock */
> #ifdef SMP
>                 if (td->td_state == TDS_RUNNING && td != curthread) {
>                         mtx_lock_spin(&sched_lock);
>                         forward_signal(td);
>                         mtx_unlock_spin(&sched_lock);
>                 }
> #endif
>         }
>         goto out;
>
> Wouldn't it be cleaner if there was a signotify_locked() that assumed
> you had the sched_lock held (and was called by signotify)?
>
> Drew
>
> Index: kern_sig.c
> ===================================================================
> RCS file: /home/ncvs/src/sys/kern/kern_sig.c,v
> retrieving revision 1.171
> diff -u -r1.171 kern_sig.c
> --- kern_sig.c  29 Jun 2002 17:26:18 - 1.171
> +++ kern_sig.c  3 Jul 2002 01:48:35 -
> @@ -1543,8 +1543,11 @@
>                      td->td_state == TDS_RUNNING) {
>                          signotify(td->td_proc);
>  #ifdef SMP
> -                        if (td->td_state == TDS_RUNNING && td != curthread)
> +                        if (td->td_state == TDS_RUNNING && td != curthread) {
> +                                mtx_lock_spin(&sched_lock);
>                                  forward_signal(td);
> +                                mtx_unlock_spin(&sched_lock);
> +                        }
>  #endif
>                  }
>                  goto out;
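The signotify_locked() idea raised above can be illustrated outside the kernel. What follows is only a sketch under loud assumptions: it is a single-threaded userland model, not FreeBSD code. Every name in it (demo_lock, demo_thread, demo_signotify, demo_signotify_locked, demo_forward_signal, deliver_unoptimized, deliver_single_lock) is hypothetical, and the "lock" is a flag with a counter rather than a spin mutex. It only shows the shape of the pattern: a _locked variant that expects the caller to hold the lock, a thin wrapper that takes the lock itself, and how a caller can then cover both steps with one acquisition instead of two.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for sched_lock: a flag plus an acquisition counter. */
struct demo_lock {
    bool held;
    int acquisitions;
};

/* Hypothetical stand-in for the few thread fields the example needs. */
struct demo_thread {
    bool running;     /* models td_state == TDS_RUNNING */
    bool is_current;  /* models td == curthread */
    bool notified;
    bool forwarded;
};

static void demo_lock_acquire(struct demo_lock *l)
{
    assert(!l->held);   /* non-recursive, like a spin mutex */
    l->held = true;
    l->acquisitions++;
}

static void demo_lock_release(struct demo_lock *l)
{
    assert(l->held);
    l->held = false;
}

/* "_locked" variant: the caller must already hold the lock. */
static void demo_signotify_locked(struct demo_lock *l, struct demo_thread *td)
{
    assert(l->held);
    td->notified = true;
}

/* Convenience wrapper: takes and drops the lock around the locked variant. */
static void demo_signotify(struct demo_lock *l, struct demo_thread *td)
{
    demo_lock_acquire(l);
    demo_signotify_locked(l, td);
    demo_lock_release(l);
}

/* Models a function that requires the lock on entry. */
static void demo_forward_signal(struct demo_lock *l, struct demo_thread *td)
{
    assert(l->held);
    td->forwarded = true;
}

/* Shape of the posted patch: the lock is taken twice. */
static void deliver_unoptimized(struct demo_lock *l, struct demo_thread *td)
{
    demo_signotify(l, td);
    if (td->running && !td->is_current) {
        demo_lock_acquire(l);
        demo_forward_signal(l, td);
        demo_lock_release(l);
    }
}

/* Shape of the suggested cleanup: one acquisition covers both steps. */
static void deliver_single_lock(struct demo_lock *l, struct demo_thread *td)
{
    demo_lock_acquire(l);
    demo_signotify_locked(l, td);
    if (td->running && !td->is_current)
        demo_forward_signal(l, td);
    demo_lock_release(l);
}
```

In this model, delivering to a running, non-current thread costs two acquisitions with deliver_unoptimized() and one with deliver_single_lock(). Whether the real tdsignal() can be restructured this way depends on its other callers and lock ordering, which this sketch does not model.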
Re: KSE signal problems still
ignore this Matt.. it was on ia32.

On Tue, 2 Jul 2002, Julian Elischer wrote:

> we seem pretty solid on ia32.
> ^Z and then fg will sometimes kill the process instead of foregrounding
> it though. (I aborted several buildworlds that way accidentally)
>
> Andrew's panic seems SMP specific though..
> you may check if there is something different between ia32 and alpha
> on whether it holds schedlock at this point:
Re: KSE signal problems still
Julian Elischer writes:
> > However, it does seem a bit silly, as we end up dropping
> > and reacquiring the sched lock quite a few times:
>
> That's why I just asked you to test the concept..
> If I know that just acquiring it here is ok, (I presume you tried doing
> some work like this) that tells me that this code isn't called from some
> odd place, with the sched lock already set. (that and code inspection of
> course..) Now we know it works we can try optimise it..
> I'm going home now for dinner, so if you feel like checking this
> or something more optimal in, be my guest :-)

OK, I've checked in the unoptimized fix. Please do optimize it when
you get a chance.

Thanks,

Drew
Re: KSE signal problems still
On 03-Jul-2002 Andrew Gallatin wrote:

> Julian Elischer writes:
> > > However, it does seem a bit silly, as we end up dropping
> > > and reacquiring the sched lock quite a few times:
> >
> > That's why I just asked you to test the concept..
> > If I know that just acquiring it here is ok, (I presume you tried doing
> > some work like this) that tells me that this code isn't called from some
> > odd place, with the sched lock already set. (that and code inspection of
> > course..) Now we know it works we can try optimise it..
> > I'm going home now for dinner, so if you feel like checking this
> > or something more optimal in, be my guest :-)
>
> OK, I've checked in the unoptimized fix. Please do optimize it when
> you get a chance.

Erm, I thought I changed signotify() to require sched_lock and made
the second half of psignal() (the whole case statement) lock sched_lock.
Did you change that? (To Julian)

--
John Baldwin [EMAIL PROTECTED]  http://www.FreeBSD.org/~jhb/
Power Users Use the Power to Serve! - http://www.FreeBSD.org/
Re: KSE signal problems still
On Wed, 3 Jul 2002, John Baldwin wrote:

> Erm, I thought I changed signotify() to require sched_lock and made
> the second half of psignal() (the whole case statement) lock sched_lock.
> Did you change that? (To Julian)

psignal as a whole hasn't existed in the KSE tree since December.
I must have missed it in the complicated merge that came from that in P4.
I just checked it in like this for now to stop the panics until I can
work out what the equivalent change to yours is..
(feel free to check out the new psignal/tdsignal combination.)
Re: KSE signal problems still
I can get a panic when ^C'ing buildworld on an SMP build of -current:

-Matt

test3# j
test3# panic: mutex sched lock not owned at /FreeBSD/FreeBSD-current/src/sys/kern/subr_smp.c:126
cpuid = 1; lapic.id =
Debugger(panic)
Stopped at Debugger+0x46: xchgl %ebx,in_Debugger.0
db> trace
Debugger(c02ec4ba) at Debugger+0x46
panic(c02eb5e8,c02eb758,c02efe80,7e,c6a5a9c0) at panic+0xd6
_mtx_assert(c0325a20,1,c02efe80,7e) at _mtx_assert+0xa8
forward_signal(c6a5a9c0) at forward_signal+0x1a
tdsignal(c6a5a9c0,2,0) at tdsignal+0x182
psignal(c665f804,2) at psignal+0x3c8
pgsignal(c6bbe480,2,1,c6bbe49c,0) at pgsignal+0x63
ttyinput(3,c6413230,c6413200,0,e0e71b03) at ttyinput+0x316
ptcwrite(c6648600,e0e71c88,7f0011,1,e0e71bc4) at ptcwrite+0x17f
spec_write(e0e71bf0,e0e71c3c,c020f0a0,e0e71bf0,7f0011) at spec_write+0x5a
spec_vnoperate(e0e71bf0) at spec_vnoperate+0x13
vn_write(c645aec4,e0e71c88,c6641380,0,c622b3c0) at vn_write+0x1c8
dofilewrite(c622b3c0,c645aec4,7,807f000,1) at dofilewrite+0xaf
write(c622b3c0,e0e71d14,3,9,282) at write+0x39
syscall(2f,2f,2f,8074600,8074644) at syscall+0x23c
syscall_with_err_pushed() at syscall_with_err_pushed+0x1b
--- syscall (4, FreeBSD ELF, write), eip = 0x281fc3a3, esp = 0xbfbff36c, ebp = 0xbfbff3d8 ---
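The trace above shows _mtx_assert() being called directly from forward_signal(), which suggests the panic comes from an "owned" check at the top of that function rather than from anything later in it. As a rough illustration of that kind of check, here is a minimal single-threaded model in C. It is a sketch only: the names (owner_lock, owner_lock_acquire, owner_lock_release, owner_lock_owned, model_forward_signal) are hypothetical, the lock is just an owner field, and the function returns false at the point where the real kernel would panic.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of a lock that records who holds it.
 * This is NOT the kernel's struct mtx; it only tracks an owner id.
 */
struct owner_lock {
    const char *name;
    int owner;          /* 0 = unowned, otherwise the holder's id */
};

static void owner_lock_acquire(struct owner_lock *l, int who)
{
    assert(who != 0);
    assert(l->owner == 0);   /* this model has no contention handling */
    l->owner = who;
}

static void owner_lock_release(struct owner_lock *l, int who)
{
    assert(l->owner == who);
    l->owner = 0;
}

/* Reports whether 'who' currently holds the lock. */
static bool owner_lock_owned(const struct owner_lock *l, int who)
{
    return who != 0 && l->owner == who;
}

/*
 * Models a function that requires the lock on entry.
 * Returns false where an ownership assertion would fire.
 */
static bool model_forward_signal(const struct owner_lock *sched, int who)
{
    if (!owner_lock_owned(sched, who))
        return false;    /* "mutex sched lock not owned" case */
    /* ... the work that needs the lock would go here ... */
    return true;
}
```

Calling model_forward_signal() without first acquiring the lock reproduces the failing case in miniature; wrapping the call in acquire and release, as the committed fix does around forward_signal(), makes the check pass.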
Re: KSE signal problems still
Andrew Gallatin fixed the problem in kern_sig.c, check it out:

gallatin    2002/07/02 19:55:48 PDT

  Modified files:
    sys/kern             kern_sig.c
  Log:
  Hold the sched lock across call to forward_signal() in tdsignal()
  to keep SMP systems from panic'ing when ^C'ing an app

  suggested by julian

  Revision  Changes    Path
  1.172     +4 -1      src/sys/kern/kern_sig.c

----- Original Message -----
From: Matthew Dillon [EMAIL PROTECTED]
To: Julian Elischer [EMAIL PROTECTED]
Cc: Andrew Gallatin [EMAIL PROTECTED]; [EMAIL PROTECTED]
Sent: Wednesday, July 03, 2002 1:36 PM
Subject: Re: KSE signal problems still

I can get a panic when ^C'ing buildworld on an SMP build of -current:
Re: KSE signal problems still
I just fixed that.. get a new version of kern_sig.c

On Tue, 2 Jul 2002, Matthew Dillon wrote:

> I can get a panic when ^C'ing buildworld on an SMP build of -current:
Re: KSE signal problems still
:
:
:Andrew Gallatin fixed the problem in kern_sig.c, check it out:
:
:gallatin    2002/07/02 19:55:48 PDT
:

Will do tomorrow!

-Matt