Re: -current is _definitely_ not stable right now
John Baldwin wrote:
> On 28-May-01 Doug Barton wrote:
>> Gang,
>>
>> On the avi front, typing 'aviplay' with or without an argument is
>> guaranteed to instantly wedge the box. I attached a log of running
>> aviplay through truss, but I have no way to know if it stopped at or
>> before the offending instruction.
>>
>> As for the general wonkiness of the system, I have finally gotten a
>> dump. The backtrace is below, let me know if there is anything else
>> I can do to help debug.
>
> Please try http://www.FreeBSD.org/~jhb/patches/ldt.patch.

This worked excellently for me! I patched the kernel and rebuilt, then
tested aviplay... success. Then I cvsup'ed, built/installed world and
kernel, and started stress testing. I'm currently running two builds of
X 4, one over NFS and one local, 'make cleandir' in /usr/src, AND
avifile (ok, it's a little choppy, but still runs).

I'd say it's probably safe to go back in the water again. Next stop,
re-enabling softupdates. :)

BTW, I'm probably wrong about this, but looking at the patch it seems
odd to me that one of these is ifndef and the other is ifdef:

@@ -422,15 +433,21 @@
 		kmem_free(kernel_map, (vm_offset_t)old_ldt_base,
 			old_ldt_len * sizeof(union descriptor));
 		FREE(new_ldt, M_SUBPROC);
+#ifndef SMP
+		mtx_lock_spin(&sched_lock);
+#endif
 	} else {
 		pcb->pcb_ldt = pcb_ldt = new_ldt;
+#ifdef SMP
 		mtx_unlock_spin(&sched_lock);
+#endif
 	}

Just curious,

Doug

(Thanks BTW)
-- 
    I need someone really bad. Are you really bad?

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message
Re: -current is _definitely_ not stable right now
Doug Barton wrote:
> This worked excellently for me! I patched the kernel and rebuilt,
> then tested aviplay... success. Then I cvsup'ed, built/installed
> world and kernel, and started stress testing. I'm currently running
> two builds of X 4, one over NFS and one local, 'make cleandir' in
> /usr/src, AND avifile (ok, it's a little choppy, but still runs).

I forgot to mention explicitly, this is all running in X. :)
Re: -current is _definitely_ not stable right now
On 30-May-01 Doug Barton wrote:
> John Baldwin wrote:
>> On 28-May-01 Doug Barton wrote:
>>> Gang,
>>>
>>> On the avi front, typing 'aviplay' with or without an argument is
>>> guaranteed to instantly wedge the box. I attached a log of running
>>> aviplay through truss, but I have no way to know if it stopped at
>>> or before the offending instruction.
>>>
>>> As for the general wonkiness of the system, I have finally gotten a
>>> dump. The backtrace is below, let me know if there is anything else
>>> I can do to help debug.
>>
>> Please try http://www.FreeBSD.org/~jhb/patches/ldt.patch.
>
> This worked excellently for me! I patched the kernel and rebuilt,
> then tested aviplay... success. Then I cvsup'ed, built/installed
> world and kernel, and started stress testing. I'm currently running
> two builds of X 4, one over NFS and one local, 'make cleandir' in
> /usr/src, AND avifile (ok, it's a little choppy, but still runs).
>
> I'd say it's probably safe to go back in the water again. Next stop,
> re-enabling softupdates. :)

Sounds good, I'll commit it in a second.

> BTW, I'm probably wrong about this, but looking at the patch it seems
> odd to me that one of these is ifndef and the other is ifdef:
>
> @@ -422,15 +433,21 @@
>  		kmem_free(kernel_map, (vm_offset_t)old_ldt_base,
>  			old_ldt_len * sizeof(union descriptor));
>  		FREE(new_ldt, M_SUBPROC);
> +#ifndef SMP
> +		mtx_lock_spin(&sched_lock);
> +#endif
>  	} else {
>  		pcb->pcb_ldt = pcb_ldt = new_ldt;
> +#ifdef SMP
>  		mtx_unlock_spin(&sched_lock);
> +#endif
>  	}
>
> Just curious,

We want to continue to hold the sched_lock before calling set_user_ldt
in the !SMP case, but we want to release it in the SMP case before
calling the smp rendezvous. We have to release the sched_lock in the
then clause before calling kmem_free, so each clause reaches its #if
block with the sched_lock in a different state.

-- 
John Baldwin [EMAIL PROTECTED] -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
"Power Users Use the Power to Serve!" - http://www.FreeBSD.org/
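
[Editor's note] The lock-state argument above can be modelled in a few
lines of user-space C. This is only a sketch of one reading of the
explanation, not the kernel code: a boolean stands in for "sched_lock
is held", a runtime flag replaces the SMP preprocessor symbol, and it
assumes both clauses are entered with the lock held. Every demo_* name
is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of the two clauses in the quoted hunk.
 * Returns whether the lock is held when the if/else is left.
 */
static bool demo_ends_locked(bool smp, bool old_ldt_freed)
{
    bool held = true;       /* assumption: lock is held on entry */

    if (old_ldt_freed) {
        held = false;       /* lock must be dropped before the free */
        /* ... the old LDT would be freed here ... */
        if (!smp)
            held = true;    /* models "#ifndef SMP mtx_lock_spin()" */
    } else {
        if (smp)
            held = false;   /* models "#ifdef SMP mtx_unlock_spin()" */
    }
    return held;
}

/* Both clauses should leave the lock held exactly when !smp. */
static void demo_check_all(void)
{
    for (int smp = 0; smp <= 1; smp++)
        for (int freed = 0; freed <= 1; freed++)
            assert(demo_ends_locked(smp, freed) == !smp);
}
```

In this model the then clause arrives at its #if block unlocked and the
else clause arrives locked, which is why one needs a conditional lock
and the other a conditional unlock to end up in the same state.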
Re: -current is _definitely_ not stable right now
On 28-May-01 Doug Barton wrote:
> I forgot something:
>
> IdlePTD 4734976
> initial pcb at 3b5f80
> panicstr: mutex sched lock recursed at /usr/src/sys/kern/kern_synch.c:858
> panic messages:

I would need a traceback from here. It looks like someone called
msleep or tsleep with sched lock held.

-- 
John Baldwin [EMAIL PROTECTED] -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.Baldwin.cx/~john/pgpkey.asc
"Power Users Use the Power to Serve!" - http://www.FreeBSD.org/
Re: -current is _definitely_ not stable right now
On Tue, 2001/05/29 at 09:39:42 -0700, John Baldwin wrote:
> On 28-May-01 Doug Barton wrote:
>> I forgot something:
>>
>> IdlePTD 4734976
>> initial pcb at 3b5f80
>> panicstr: mutex sched lock recursed at /usr/src/sys/kern/kern_synch.c:858
>> panic messages:
>
> I would need a traceback from here. It looks like someone called
> msleep or tsleep with sched lock held.

OK, I think I've found the problem, patch attached.

set_user_ldt is called from cpu_switch on i386, where the sched lock is
already held by the process that is just being scheduled away, and
curproc has already been changed, so this isn't treated like a recursed
mutex, but rather like the new process (dead-)locking against the old
one.

The solution taken in the attached patch is to create a
set_user_ldt_nolock. This way, we have a more or less consistent
environment (of the new process) there. The (pcb != PCPU_GET(curpcb))
check is in the outer locking set_user_ldt wrapper (it seems to be
needed only in the smp rendezvous case and is a can't-happen when
called from cpu_switch).

This works for me; Doug, could you please test it too?
I'd be thankful for any review.

	- thomas

Index: i386/swtch.s
===================================================================
RCS file: /home/ncvs/src/sys/i386/i386/swtch.s,v
retrieving revision 1.114
diff -u -r1.114 swtch.s
--- i386/swtch.s	2001/05/20 16:51:08	1.114
+++ i386/swtch.s	2001/05/29 22:09:14
@@ -248,7 +248,7 @@
 	movl	%eax,PCPU(CURRENTLDT)
 	jmp	2f
 1:	pushl	%edx
-	call	set_user_ldt
+	call	set_user_ldt_nolock
 	popl	%edx
 2:

Index: i386/sys_machdep.c
===================================================================
RCS file: /home/ncvs/src/sys/i386/i386/sys_machdep.c,v
retrieving revision 1.57
diff -u -r1.57 sys_machdep.c
--- i386/sys_machdep.c	2001/05/15 23:22:20	1.57
+++ i386/sys_machdep.c	2001/05/29 22:24:04
@@ -239,17 +239,16 @@
 /*
  * Update the GDT entry pointing to the LDT to point to the LDT of the
- * current process.
+ * current process.  Assumes that sched_lock is held.  This is needed
+ * in cpu_switch because sched_lock is held by the process that has
+ * just been scheduled away and we would deadlock if we would try to
+ * acquire sched_lock.
  */
 void
-set_user_ldt(struct pcb *pcb)
+set_user_ldt_nolock(struct pcb *pcb)
 {
 	struct pcb_ldt *pcb_ldt;
 
-	if (pcb != PCPU_GET(curpcb))
-		return;
-
-	mtx_lock_spin(&sched_lock);
 	pcb_ldt = pcb->pcb_ldt;
 #ifdef SMP
 	gdt[PCPU_GET(cpuid) * NGDT + GUSERLDT_SEL].sd = pcb_ldt->ldt_sd;
@@ -258,6 +257,17 @@
 #endif
 	lldt(GSEL(GUSERLDT_SEL, SEL_KPL));
 	PCPU_SET(currentldt, GSEL(GUSERLDT_SEL, SEL_KPL));
+}
+
+/* Locking wrapper of the above */
+void
+set_user_ldt(struct pcb *pcb)
+{
+	if (pcb != PCPU_GET(curpcb))
+		return;
+
+	mtx_lock_spin(&sched_lock);
+	set_user_ldt_nolock(pcb);
 	mtx_unlock_spin(&sched_lock);
 }

Index: include/pcb_ext.h
===================================================================
RCS file: /home/ncvs/src/sys/i386/include/pcb_ext.h,v
retrieving revision 1.6
diff -u -r1.6 pcb_ext.h
--- include/pcb_ext.h	2001/05/10 17:03:03	1.6
+++ include/pcb_ext.h	2001/05/29 22:06:37
@@ -55,6 +55,7 @@
 int i386_extend_pcb __P((struct proc *));
 void set_user_ldt __P((struct pcb *));
+void set_user_ldt_nolock __P((struct pcb *));
 struct pcb_ldt *user_ldt_alloc __P((struct pcb *, int));
 void user_ldt_free __P((struct pcb *));
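
[Editor's note] The split into an unlocked worker and a locking wrapper
can be tried out in user space. The following is a minimal sketch under
stated assumptions, not the patch itself: a C11 atomic_flag stands in
for the sched_lock spin mutex, an int stands in for the LDT state, and
the wrapper uses a try-lock so that the "would deadlock" case returns
false instead of spinning forever. All demo_* names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_flag demo_sched_lock = ATOMIC_FLAG_INIT; /* stand-in lock */
static int demo_current_ldt;          /* state guarded by the lock */

/* Returns true if the lock was free and is now held by the caller. */
static bool demo_try_lock(void)
{
    return !atomic_flag_test_and_set(&demo_sched_lock);
}

static void demo_unlock(void)
{
    atomic_flag_clear(&demo_sched_lock);
}

/* Worker: the caller must already hold demo_sched_lock. */
static void demo_set_ldt_nolock(int ldt)
{
    demo_current_ldt = ldt;
}

/*
 * Locking wrapper for callers that do not hold the lock. A real spin
 * mutex would spin where this returns false, i.e. it would deadlock.
 */
static bool demo_set_ldt(int ldt)
{
    if (!demo_try_lock())
        return false;
    demo_set_ldt_nolock(ldt);
    demo_unlock();
    return true;
}

/*
 * Models the context-switch path: the lock is already held, so only
 * the _nolock worker is safe. Returns whether the wrapper would have
 * succeeded had it been called here instead.
 */
static bool demo_switch_path(int ldt)
{
    bool got_lock = demo_try_lock();
    assert(got_lock);
    (void)got_lock;                   /* quiet NDEBUG builds */

    bool wrapper_ok = demo_set_ldt(ldt);
    demo_set_ldt_nolock(ldt);
    demo_unlock();
    return wrapper_ok;
}
```

Calling demo_set_ldt() from ordinary code succeeds, while inside
demo_switch_path() it reports failure because the lock is already held;
the worker still updates the state. That mirrors why the switch path
calls the _nolock variant while other callers keep using the wrapper.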
RE: -current is _definitely_ not stable right now
On 28-May-01 Doug Barton wrote:
> Gang,
>
> On the avi front, typing 'aviplay' with or without an argument is
> guaranteed to instantly wedge the box. I attached a log of running
> aviplay through truss, but I have no way to know if it stopped at or
> before the offending instruction.
>
> As for the general wonkiness of the system, I have finally gotten a
> dump. The backtrace is below, let me know if there is anything else
> I can do to help debug.

Please try http://www.FreeBSD.org/~jhb/patches/ldt.patch.

> Doug

-- 
John Baldwin [EMAIL PROTECTED] -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
"Power Users Use the Power to Serve!" - http://www.FreeBSD.org/
Re: -current is _definitely_ not stable right now
I forgot something:

IdlePTD 4734976
initial pcb at 3b5f80
panicstr: mutex sched lock recursed at /usr/src/sys/kern/kern_synch.c:858
panic messages:
---
panic: blockable sleep lock (sx) allproc @ /usr/src/sys/kern/kern_proc.c:146

syncing disks... 90 90
panic: mutex sched lock recursed at /usr/src/sys/kern/kern_synch.c:858

A quick look at that file indicates that rwatson is probably off the
hook, since he hadn't touched it.

-- 
    I need someone really bad. Are you really bad?
Re: -current is _definitely_ not stable right now
I haven't made any mutex commits -- my commits were credential-related.
At least two bugs have popped up and been resolved since the pcred
removal commits, including:

1) Bug in execve() such that saved uids/gids were not being done in
   some situations.
2) Bug in crfree() such that there was a reference leak for uidinfo
   structures.

I wouldn't be surprised if a couple more turned up.

Robert N M Watson      FreeBSD Core Team, TrustedBSD Project
[EMAIL PROTECTED]      NAI Labs, Safeport Network Services

On Sun, 27 May 2001, Doug Barton wrote:

> Gang,
>
> I cvs'ed and built world/kernel shortly after jhb's all clear on
> Thursday, and things went fairly well. I did the same again after
> rwatson's mutex commits on Friday and things have gone downhill from
> there. Just about any heavy system activity locks the system up. That
> includes things like building large ports (for example, avifile),
> buildworld, and trying to actually run aviplay.
>
> On the avi front, typing 'aviplay' with or without an argument is
> guaranteed to instantly wedge the box. I attached a log of running
> aviplay through truss, but I have no way to know if it stopped at or
> before the offending instruction.
>
> As for the general wonkiness of the system, I have finally gotten a
> dump. The backtrace is below, let me know if there is anything else
> I can do to help debug.
>
> Doug
>
> (kgdb) where
> #0  dumpsys () at /usr/src/sys/kern/kern_shutdown.c:478
> #1  0xc01cb318 in boot (howto=260)
>     at /usr/src/sys/kern/kern_shutdown.c:321
> #2  0xc01cb745 in panic (fmt=0xc0330ce4 "mutex %s recursed at %s:%d")
>     at /usr/src/sys/kern/kern_shutdown.c:600
> #3  0xc01c3c9c in _mtx_assert (m=0xc03f44a0, what=9,
>     file=0xc0332360 "/usr/src/sys/kern/kern_synch.c", line=858)
>     at /usr/src/sys/kern/kern_mutex.c:571
> #4  0xc01d4b9d in mi_switch () at /usr/src/sys/kern/kern_synch.c:858
> #5  0xc01cb01c in boot (howto=256)
>     at /usr/src/sys/kern/kern_shutdown.c:262
> #6  0xc01cb745 in panic (
>     fmt=0xc0334760 "blockable sleep lock (%s) %s @ %s:%d")
>     at /usr/src/sys/kern/kern_shutdown.c:600
> #7  0xc01e60a0 in witness_lock (lock=0xc03f0f60, flags=0,
>     file=0xc0331123 "/usr/src/sys/kern/kern_proc.c", line=146)
>     at /usr/src/sys/kern/subr_witness.c:489
> #8  0xc01d2285 in _sx_slock (sx=0xc03f0f60,
>     file=0xc0331123 "/usr/src/sys/kern/kern_proc.c", line=146)
>     at /usr/src/sys/kern/kern_sx.c:114
> #9  0xc01c4e2c in pfind (pid=434) at /usr/src/sys/kern/kern_proc.c:146
> #10 0xc01ea3c6 in selwakeup (sip=0xc0e3d404)
>     at /usr/src/sys/kern/sys_generic.c:1175
> #11 0xc01f5c5f in ptcwakeup (tp=0xc0e3d420, flag=1)
>     at /usr/src/sys/kern/tty_pty.c:317
> #12 0xc01f5c36 in ptsstart (tp=0xc0e3d420)
>     at /usr/src/sys/kern/tty_pty.c:306
> #13 0xc01f3074 in ttstart (tp=0xc0e3d420)
>     at /usr/src/sys/kern/tty.c:1409
> #14 0xc01f4685 in tputchar (c=107, tp=0xc0e3d420)
>     at /usr/src/sys/kern/tty.c:2458
> #15 0xc01e20cb in putchar (c=107, arg=0xcd115de8)
>     at /usr/src/sys/kern/subr_prf.c:304
> #16 0xc01e234a in kvprintf (
>     fmt=0xc034f881 "ernel trap %d with interrupts disabled\n",
>     func=0xc01e207c <putchar>, arg=0xcd115de8, radix=10,
>     ap=0xcd115e00 "\f") at /usr/src/sys/kern/subr_prf.c:487
> #17 0xc01e1ff8 in printf (
>     fmt=0xc034f880 "kernel trap %d with interrupts disabled\n")
>     at /usr/src/sys/kern/subr_prf.c:260
> #18 0xc02f6955 in trap (frame={tf_fs = -854523880,
>     tf_es = -1071775728, tf_ds = -855048176, tf_edi = 4,
>     tf_esi = -1058806500, tf_ebp = -854499712, tf_isp = -854499744,
>     tf_ebx = -855029664, tf_edx = -559038242, tf_ecx = 2,
>     tf_eax = -559038244, tf_trapno = 12, tf_err = 0,
>     tf_eip = -1071892410, tf_cs = 8, tf_eflags = 65670,
>     tf_esp = -1052624640, tf_ss = -1058806528})
>     at /usr/src/sys/i386/i386/trap.c:253
> #19 0xc01c3846 in _mtx_lock_sleep (m=0xc0e3e51c, opts=0,
>     file=0xc0331500 "/usr/src/sys/kern/kern_resource.c", line=793)
>     at /usr/src/sys/kern/kern_mutex.c:380
> #20 0xc01ca0cb in uihold (uip=0xc0e3e500)
>     at /usr/src/sys/kern/kern_resource.c:793
> #21 0xc01c86f9 in crdup (cr=0xc1423900)
>     at /usr/src/sys/kern/kern_prot.c:1349
> #22 0xc021cf8c in access (p=0xcd094860, uap=0xcd115f80)
>     at /usr/src/sys/kern/vfs_syscalls.c:1712
> #23 0xc02f841d in syscall (frame={tf_fs = 47, tf_es = 47, tf_ds = 47,
>     tf_edi = 134665044, tf_esi = 134676528, tf_ebp = -1077940088,
>     tf_isp = -854499372, tf_ebx = 134661184, tf_edx = 134665044,
>     tf_ecx = 134661218, tf_eax = 33, tf_trapno = 12, tf_err = 2,
>     tf_eip = 134555356, tf_cs = 31, tf_eflags = 643,
>     tf_esp = -1077940132, tf_ss = 47})
>     at /usr/src/sys/i386/i386/trap.c:1172
> #24 0xc02e957d in syscall_with_err_pushed ()
> #25 0x804a131 in ?? ()
> #26 0x804caa1 in ?? ()
> #27 0x804e57c in ?? ()
> #28 0x804dd54 in ?? ()
> #29 0x804e57c in ?? ()
> #30 0x804dd54 in ?? ()
> #31 0x804e57c in ?? ()
> #32 0x804dd54 in ?? ()
> #33 0x804e57c in ?? ()
> #34 0x804dd54 in ?? ()
> #35 0x804c880 in ?? ()
> #36 0x804fd4a in ?? ()
> #37 0x8048131 in ?? ()