Re: -current is _definitely_ not stable right now
On 30-May-01 Doug Barton wrote:
> John Baldwin wrote:
>>
>> On 28-May-01 Doug Barton wrote:
>> > Gang,
>> >
>> >       On the avi front, typing 'aviplay' with or without an argument is
>> > guaranteed to instantly wedge the box. I attached a log of running aviplay
>> > through truss, but I have no way to know if it stopped at or before the
>> > offending instruction. As for the general wonkiness of the system, I have
>> > finally gotten a dump. The backtrace is below, let me know if there is
>> > anything else I can do to help debug.
>>
>> Please try http://www.FreeBSD.org/~jhb/patches/ldt.patch.
>
>       This worked excellently for me! I patched the kernel and rebuilt, then
> tested aviplay... success. Then I cvsup'ed, built/installed world and
> kernel, and started stress testing. I'm currently running two builds of X
> 4, one over NFS and one local, 'make cleandir' in /usr/src, AND avifile
> (ok, it's a little choppy, but still runs). I'd say it's probably safe to
> go back in the water again. Next stop, re-enabling softupdates. :)

Sounds good, I'll commit it in a second.

>       BTW, I'm probably wrong about this but looking at the patch it seems odd
> to me that one of these is ifndef and the other is ifdef:
>
> @@ -422,15 +433,21 @@
>                 kmem_free(kernel_map, (vm_offset_t)old_ldt_base,
>                         old_ldt_len * sizeof(union descriptor));
>                 FREE(new_ldt, M_SUBPROC);
> +#ifndef SMP
> +               mtx_lock_spin(&sched_lock);
> +#endif
>         } else {
>                 pcb->pcb_ldt = pcb_ldt = new_ldt;
> +#ifdef SMP
>                 mtx_unlock_spin(&sched_lock);
> +#endif
>         }
>
> Just curious,

We want to continue to hold the sched_lock before calling set_user_ldt in the
!SMP case, but we want to release it in the SMP case before calling the smp
rendezvous. We have to release the sched_lock in the then clause before calling
kmem_free, so each clause finishes with the sched_lock in a different state.
-- John Baldwin <[EMAIL PROTECTED]> -- http://www.FreeBSD.org/~jhb/ PGP Key: http://www.baldwin.cx/~john/pgpkey.asc "Power Users Use the Power to Serve!" - http://www.FreeBSD.org/ To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message
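The lock states described in the reply above can be traced with a small userland model. This is only a sketch under stated assumptions: toy_spinlock, toy_lock, toy_unlock and ldt_exit_state are hypothetical names, the compile-time #ifdef SMP is modelled as a runtime flag, and the unlock before kmem_free is taken from the explanation rather than from the quoted hunk. It is not the kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for sched_lock: it only records whether it is held. */
struct toy_spinlock {
	bool held;
};

static void
toy_lock(struct toy_spinlock *l)
{
	assert(!l->held);	/* a spin mutex must not be taken twice */
	l->held = true;
}

static void
toy_unlock(struct toy_spinlock *l)
{
	assert(l->held);
	l->held = false;
}

/*
 * Walk the if/else from the quoted hunk and report whether the lock is
 * still held at the closing brace.
 *   smp         - models a kernel built with SMP defined
 *   discard_new - models the "then" clause, where the freshly allocated
 *                 LDT is thrown away again
 */
static bool
ldt_exit_state(bool smp, bool discard_new)
{
	struct toy_spinlock sched = { false };

	toy_lock(&sched);		/* assumed: taken earlier in the function */
	if (discard_new) {
		toy_unlock(&sched);	/* assumed: dropped before kmem_free() */
		/* kmem_free() and FREE() would run here, unlocked */
		if (!smp)
			toy_lock(&sched);	/* #ifndef SMP: retake the lock */
	} else {
		/* pcb->pcb_ldt = pcb_ldt = new_ldt; */
		if (smp)
			toy_unlock(&sched);	/* #ifdef SMP: drop before rendezvous */
	}
	return sched.held;
}
```

In this model both clauses leave the lock held when smp is false and released when smp is true, which is why one guard has to be ifndef and the other ifdef.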
Re: -current is _definitely_ not stable right now
Doug Barton wrote:

>       This worked excellently for me! I patched the kernel and rebuilt, then
> tested aviplay... success. Then I cvsup'ed, built/installed world and
> kernel, and started stress testing. I'm currently running two builds of X
> 4, one over NFS and one local, 'make cleandir' in /usr/src, AND avifile
> (ok, it's a little choppy, but still runs).

        I forgot to mention explicitly, this is all running in X. :)
Re: -current is _definitely_ not stable right now
John Baldwin wrote:
>
> On 28-May-01 Doug Barton wrote:
> > Gang,
> >
> >       On the avi front, typing 'aviplay' with or without an argument is
> > guaranteed to instantly wedge the box. I attached a log of running aviplay
> > through truss, but I have no way to know if it stopped at or before the
> > offending instruction. As for the general wonkiness of the system, I have
> > finally gotten a dump. The backtrace is below, let me know if there is
> > anything else I can do to help debug.
>
> Please try http://www.FreeBSD.org/~jhb/patches/ldt.patch.

        This worked excellently for me! I patched the kernel and rebuilt, then
tested aviplay... success. Then I cvsup'ed, built/installed world and
kernel, and started stress testing. I'm currently running two builds of X
4, one over NFS and one local, 'make cleandir' in /usr/src, AND avifile
(ok, it's a little choppy, but still runs). I'd say it's probably safe to
go back in the water again. Next stop, re-enabling softupdates. :)

        BTW, I'm probably wrong about this but looking at the patch it seems odd
to me that one of these is ifndef and the other is ifdef:

@@ -422,15 +433,21 @@
                kmem_free(kernel_map, (vm_offset_t)old_ldt_base,
                        old_ldt_len * sizeof(union descriptor));
                FREE(new_ldt, M_SUBPROC);
+#ifndef SMP
+               mtx_lock_spin(&sched_lock);
+#endif
        } else {
                pcb->pcb_ldt = pcb_ldt = new_ldt;
+#ifdef SMP
                mtx_unlock_spin(&sched_lock);
+#endif
        }

Just curious,

Doug
(Thanks BTW)
--
    I need someone really bad. Are you really bad?
RE: -current is _definitely_ not stable right now
On 28-May-01 Doug Barton wrote:
> Gang,
>
>       On the avi front, typing 'aviplay' with or without an argument is
> guaranteed to instantly wedge the box. I attached a log of running aviplay
> through truss, but I have no way to know if it stopped at or before the
> offending instruction. As for the general wonkiness of the system, I have
> finally gotten a dump. The backtrace is below, let me know if there is
> anything else I can do to help debug.

Please try http://www.FreeBSD.org/~jhb/patches/ldt.patch.

> Doug

--

John Baldwin <[EMAIL PROTECTED]> -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
"Power Users Use the Power to Serve!" - http://www.FreeBSD.org/
Re: -current is _definitely_ not stable right now
On Tue, 2001/05/29 at 09:39:42 -0700, John Baldwin wrote:
>
> On 28-May-01 Doug Barton wrote:
> >       I forgot something:
> >
> > IdlePTD 4734976
> > initial pcb at 3b5f80
> > panicstr: mutex sched lock recursed at /usr/src/sys/kern/kern_synch.c:858
> > panic messages:
>
> I would need a traceback from here. It looks like someone called msleep or
> tsleep with sched lock held.

OK, I think I've found the problem, patch attached.

set_user_ldt is called from cpu_switch on i386, where the sched lock is
already held by the process that is just being scheduled away, and
curproc has already been changed, so this isn't treated like a recursed
mutex, but rather like the new process (dead-)locking against the old
one.

The solution taken in the attached patch is to create a
set_user_ldt_nolock. This way, we have a more or less consistent
environment (of the new process) there. The (pcb != PCPU_GET(curpcb))
check is in the outer locking set_user_ldt wrapper (it seems only to be
needed in the smp rendezvous case and is a "can't happen" when called
from cpu_switch).

This works for me; Doug, could you please test it too?
I'd be thankful for any review.

        - thomas

Index: i386/swtch.s
===================================================================
RCS file: /home/ncvs/src/sys/i386/i386/swtch.s,v
retrieving revision 1.114
diff -u -r1.114 swtch.s
--- i386/swtch.s	2001/05/20 16:51:08	1.114
+++ i386/swtch.s	2001/05/29 22:09:14
@@ -248,7 +248,7 @@
 	movl	%eax,PCPU(CURRENTLDT)
 	jmp	2f
 1:	pushl	%edx
-	call	set_user_ldt
+	call	set_user_ldt_nolock
 	popl	%edx
 2:
 
Index: i386/sys_machdep.c
===================================================================
RCS file: /home/ncvs/src/sys/i386/i386/sys_machdep.c,v
retrieving revision 1.57
diff -u -r1.57 sys_machdep.c
--- i386/sys_machdep.c	2001/05/15 23:22:20	1.57
+++ i386/sys_machdep.c	2001/05/29 22:24:04
@@ -239,17 +239,16 @@
 
 /*
  * Update the GDT entry pointing to the LDT to point to the LDT of the
- * current process.
+ * current process. Assumes that sched_lock is held. This is needed
+ * in cpu_switch because sched_lock is held by the process that has
+ * just been scheduled away and we would deadlock if we would try to
+ * acquire sched_lock.
  */
 void
-set_user_ldt(struct pcb *pcb)
+set_user_ldt_nolock(struct pcb *pcb)
 {
 	struct pcb_ldt *pcb_ldt;
 
-	if (pcb != PCPU_GET(curpcb))
-		return;
-
-	mtx_lock_spin(&sched_lock);
 	pcb_ldt = pcb->pcb_ldt;
 #ifdef SMP
 	gdt[PCPU_GET(cpuid) * NGDT + GUSERLDT_SEL].sd = pcb_ldt->ldt_sd;
@@ -258,6 +257,17 @@
 #endif
 	lldt(GSEL(GUSERLDT_SEL, SEL_KPL));
 	PCPU_SET(currentldt, GSEL(GUSERLDT_SEL, SEL_KPL));
+}
+
+/* Locking wrapper of the above */
+void
+set_user_ldt(struct pcb *pcb)
+{
+	if (pcb != PCPU_GET(curpcb))
+		return;
+
+	mtx_lock_spin(&sched_lock);
+	set_user_ldt_nolock(pcb);
 	mtx_unlock_spin(&sched_lock);
 }
 
Index: include/pcb_ext.h
===================================================================
RCS file: /home/ncvs/src/sys/i386/include/pcb_ext.h,v
retrieving revision 1.6
diff -u -r1.6 pcb_ext.h
--- include/pcb_ext.h	2001/05/10 17:03:03	1.6
+++ include/pcb_ext.h	2001/05/29 22:06:37
@@ -55,6 +55,7 @@
 
 int i386_extend_pcb __P((struct proc *));
 void set_user_ldt __P((struct pcb *));
+void set_user_ldt_nolock __P((struct pcb *));
 struct pcb_ldt *user_ldt_alloc __P((struct pcb *, int));
 void user_ldt_free __P((struct pcb *));
 
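The split between a _nolock worker and a locking wrapper in the patch above can be tried out in userland. The following is a rough sketch only, not the kernel code: every toy_* name is hypothetical, a C11 atomic_flag stands in for sched_lock, and the wrapper uses a try-lock that reports failure at the point where a real spin mutex would spin forever.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for sched_lock and the per-CPU LDT selector. */
static atomic_flag toy_sched_lock = ATOMIC_FLAG_INIT;
static int toy_current_ldt;

/* Returns true if the lock was free and is now ours. */
static bool
toy_trylock(void)
{
	return !atomic_flag_test_and_set(&toy_sched_lock);
}

static void
toy_unlock(void)
{
	atomic_flag_clear(&toy_sched_lock);
}

/* Worker: the caller must already hold toy_sched_lock. */
static void
toy_set_ldt_nolock(int ldt)
{
	toy_current_ldt = ldt;
}

/* Locking wrapper of the above; false means "would have deadlocked". */
static bool
toy_set_ldt(int ldt)
{
	if (!toy_trylock())
		return false;
	toy_set_ldt_nolock(ldt);
	toy_unlock();
	return true;
}

/* Models the old cpu_switch path: lock held, then the wrapper is called. */
static bool
toy_switch_with_wrapper(int ldt)
{
	bool got, ok;

	got = toy_trylock();	/* lock taken on behalf of the outgoing process */
	assert(got);
	ok = toy_set_ldt(ldt);	/* tries to take the same lock again */
	toy_unlock();
	return ok;
}

/* Models the patched cpu_switch path: lock held, worker called directly. */
static bool
toy_switch_with_nolock(int ldt)
{
	bool got;

	got = toy_trylock();
	assert(got);
	toy_set_ldt_nolock(ldt);
	toy_unlock();
	return true;
}
```

Calling toy_switch_with_wrapper() fails without changing toy_current_ldt, while toy_switch_with_nolock() succeeds. Callers that do not hold the lock, such as the smp rendezvous case mentioned above, keep using the wrapper.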
Re: -current is _definitely_ not stable right now
On 28-May-01 Doug Barton wrote:
>       I forgot something:
>
> IdlePTD 4734976
> initial pcb at 3b5f80
> panicstr: mutex sched lock recursed at /usr/src/sys/kern/kern_synch.c:858
> panic messages:

I would need a traceback from here. It looks like someone called msleep or
tsleep with sched lock held.

--

John Baldwin <[EMAIL PROTECTED]> -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.Baldwin.cx/~john/pgpkey.asc
"Power Users Use the Power to Serve!" - http://www.FreeBSD.org/
Re: -current is _definitely_ not stable right now
I haven't made any mutex commits -- my commits were credential-related.
At least two bugs have popped up and been resolved since the pcred removal
commits, including:

1) Bug in execve() such that saved uids/gids were not being done in some
   situations.

2) Bug in crfree() such that there was a reference leak for uidinfo
   structures.

I wouldn't be surprised if a couple more turned up.

Robert N M Watson             FreeBSD Core Team, TrustedBSD Project
[EMAIL PROTECTED]      NAI Labs, Safeport Network Services

On Sun, 27 May 2001, Doug Barton wrote:

> Gang,
>
>       I cvs'ed and built world/kernel shortly after jhb's "all clear" on
> thursday, and things went fairly well. I did the same again after rwatson's
> mutex commits on friday and things have gone downhill from there. Just
> about any heavy system activity locks the system up. That includes things
> like building large ports (for example, avifile), buildworld, and trying to
> actually run aviplay.
>
>       On the avi front, typing 'aviplay' with or without an argument is
> guaranteed to instantly wedge the box. I attached a log of running aviplay
> through truss, but I have no way to know if it stopped at or before the
> offending instruction. As for the general wonkiness of the system, I have
> finally gotten a dump. The backtrace is below, let me know if there is
> anything else I can do to help debug.
>
> Doug
>
>
> (kgdb) where
> #0  dumpsys () at /usr/src/sys/kern/kern_shutdown.c:478
> #1  0xc01cb318 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:321
> #2  0xc01cb745 in panic (fmt=0xc0330ce4 "mutex %s recursed at %s:%d")
>     at /usr/src/sys/kern/kern_shutdown.c:600
> #3  0xc01c3c9c in _mtx_assert (m=0xc03f44a0, what=9,
>     file=0xc0332360 "/usr/src/sys/kern/kern_synch.c", line=858)
>     at /usr/src/sys/kern/kern_mutex.c:571
> #4  0xc01d4b9d in mi_switch () at /usr/src/sys/kern/kern_synch.c:858
> #5  0xc01cb01c in boot (howto=256) at /usr/src/sys/kern/kern_shutdown.c:262
> #6  0xc01cb745 in panic (fmt=0xc0334760 "blockable sleep lock (%s) %s @ %s:%d")
>     at /usr/src/sys/kern/kern_shutdown.c:600
> #7  0xc01e60a0 in witness_lock (lock=0xc03f0f60, flags=0,
>     file=0xc0331123 "/usr/src/sys/kern/kern_proc.c", line=146)
>     at /usr/src/sys/kern/subr_witness.c:489
> #8  0xc01d2285 in _sx_slock (sx=0xc03f0f60,
>     file=0xc0331123 "/usr/src/sys/kern/kern_proc.c", line=146)
>     at /usr/src/sys/kern/kern_sx.c:114
> #9  0xc01c4e2c in pfind (pid=434) at /usr/src/sys/kern/kern_proc.c:146
> #10 0xc01ea3c6 in selwakeup (sip=0xc0e3d404)
>     at /usr/src/sys/kern/sys_generic.c:1175
> #11 0xc01f5c5f in ptcwakeup (tp=0xc0e3d420, flag=1)
>     at /usr/src/sys/kern/tty_pty.c:317
> #12 0xc01f5c36 in ptsstart (tp=0xc0e3d420)
>     at /usr/src/sys/kern/tty_pty.c:306
> #13 0xc01f3074 in ttstart (tp=0xc0e3d420) at /usr/src/sys/kern/tty.c:1409
> #14 0xc01f4685 in tputchar (c=107, tp=0xc0e3d420)
>     at /usr/src/sys/kern/tty.c:2458
> #15 0xc01e20cb in putchar (c=107, arg=0xcd115de8)
>     at /usr/src/sys/kern/subr_prf.c:304
> #16 0xc01e234a in kvprintf (
>     fmt=0xc034f881 "ernel trap %d with interrupts disabled\n",
>     func=0xc01e207c <putchar>, arg=0xcd115de8, radix=10,
>     ap=0xcd115e00 "\f")
>     at /usr/src/sys/kern/subr_prf.c:487
> #17 0xc01e1ff8 in printf (
>     fmt=0xc034f880 "kernel trap %d with interrupts disabled\n")
>     at /usr/src/sys/kern/subr_prf.c:260
> #18 0xc02f6955 in trap (frame={tf_fs = -854523880, tf_es = -1071775728,
>       tf_ds = -855048176, tf_edi = 4, tf_esi = -1058806500,
>       tf_ebp = -854499712, tf_isp = -854499744, tf_ebx = -855029664,
>       tf_edx = -559038242, tf_ecx = 2, tf_eax = -559038244,
>       tf_trapno = 12, tf_err = 0, tf_eip = -1071892410, tf_cs = 8,
>       tf_eflags = 65670, tf_esp = -1052624640, tf_ss = -1058806528})
>     at /usr/src/sys/i386/i386/trap.c:253
> #19 0xc01c3846 in _mtx_lock_sleep (m=0xc0e3e51c, opts=0,
>     file=0xc0331500 "/usr/src/sys/kern/kern_resource.c", line=793)
>     at /usr/src/sys/kern/kern_mutex.c:380
> #20 0xc01ca0cb in uihold (uip=0xc0e3e500)
>     at /usr/src/sys/kern/kern_resource.c:793
> #21 0xc01c86f9 in crdup (cr=0xc1423900)
>     at /usr/src/sys/kern/kern_prot.c:1349
> #22 0xc021cf8c in access (p=0xcd094860, uap=0xcd115f80)
>     at /usr/src/sys/kern/vfs_syscalls.c:1712
> #23 0xc02f841d in syscall (frame={tf_fs = 47, tf_es = 47, tf_ds = 47,
>       tf_edi = 134665044, tf_esi = 134676528, tf_ebp = -1077940088,
>       tf_isp = -854499372, tf_ebx = 134661184, tf_edx = 134665044,
>       tf_ecx = 134661218, tf_eax = 33, tf_trapno = 12, tf_err = 2,
>       tf_eip = 134555356, tf_cs = 31, tf_eflags = 643,
>       tf_esp = -1077940132, tf_ss = 47})
>     at /usr/src/sys/i386/i386/trap.c:1172
> #24 0xc02e957d in syscall_with_err_pushed ()
> #25 0x804a131 in ?? ()
> #26 0x804caa1 in ?? ()
> #27 0x804e57c in ?? ()
> #28 0x804dd54 in ?? ()
> #29 0x804e57c in ?? ()
> #30 0x804dd54 in ?? ()
> #31 0x804e57c in ?? ()
> #32 0x804dd54 in ?? ()
> #33 0x804e57c in ??
Re: -current is _definitely_ not stable right now
        I forgot something:

IdlePTD 4734976
initial pcb at 3b5f80
panicstr: mutex sched lock recursed at /usr/src/sys/kern/kern_synch.c:858
panic messages:
---
panic: blockable sleep lock (sx) allproc @ /usr/src/sys/kern/kern_proc.c:146

syncing disks... 90 90 panic: mutex sched lock recursed at /usr/src/sys/kern/kern_synch.c:858

A quick look at that file indicates that rwatson is probably off the hook,
since he hadn't touched it.

--
    I need someone really bad. Are you really bad?