Re: -current is _definitely_ not stable right now

2001-05-30 Thread John Baldwin


On 30-May-01 Doug Barton wrote:
> John Baldwin wrote:
>> 
>> On 28-May-01 Doug Barton wrote:
>> > Gang,
>> >
>> >   On the avi front, typing 'aviplay' with or without an argument is
>> > guaranteed to instantly wedge the box. I attached a log of running aviplay
>> > through truss, but I have no way to know if it stopped at or before the
>> > offending instruction. As for the general wonkiness of the system, I have
>> > finally gotten a dump. The backtrace is below, let me know if there is
>> > anything else I can do to help debug.
>> 
>> Please try http://www.FreeBSD.org/~jhb/patches/ldt.patch.
> 
>   This worked excellently for me! I patched the kernel and rebuilt, then
> tested aviplay... success. Then I cvsup'ed, built/installed world and
> kernel, and started stress testing. I'm currently running two builds of X
> 4, one over NFS and one local, 'make cleandir' in /usr/src, AND avifile
> (ok, it's a little choppy, but still runs). I'd say it's probably safe to
> go back in the water again. Next stop, re-enabling softupdates. :)

Sounds good, I'll commit it in a second.

>   BTW, I'm probably wrong about this but looking at the patch it seems odd
> to me that one of these is ifndef and the other is ifdef:
> 
> @@ -422,15 +433,21 @@
> kmem_free(kernel_map, (vm_offset_t)old_ldt_base,
> old_ldt_len * sizeof(union descriptor));
> FREE(new_ldt, M_SUBPROC);
> +#ifndef SMP
> +   mtx_lock_spin(&sched_lock);
> +#endif
> } else {
> pcb->pcb_ldt = pcb_ldt = new_ldt;
> +#ifdef SMP
> mtx_unlock_spin(&sched_lock);
> +#endif
> }
> 
> Just curious,

We want to continue to hold the sched_lock before calling set_user_ldt in the
!SMP case, but we want to release it in the SMP case before calling the smp
rendezvous.  We have to release the sched_lock in the then clause before
calling kmem_free, so each clause finishes with the sched_lock in a different
state.
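
A user-space sketch of the reasoning above, offered as an illustration
only and not taken from the kernel source. The names ldt_tail, lock_spin,
unlock_spin and sched_lock_held are hypothetical, the "smp" argument
stands in for "#ifdef SMP", and the sketch assumes both clauses are
entered with the lock held, which is how the explanation reads:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for sched_lock: tracks only held / not held. */
static bool sched_lock_held;

static void lock_spin(void)   { assert(!sched_lock_held); sched_lock_held = true; }
static void unlock_spin(void) { assert(sched_lock_held);  sched_lock_held = false; }

/*
 * Models the tail of the quoted hunk.
 *   smp         - true where the kernel would be built with SMP
 *   discard_new - true for the "then" clause (new LDT is freed),
 *                 false for the "else" clause (new LDT is installed)
 * Returns whether the lock is held when the if/else finishes.
 */
static bool ldt_tail(bool smp, bool discard_new)
{
    sched_lock_held = true;      /* assumed precondition: lock held on entry */

    if (discard_new) {
        unlock_spin();           /* dropped before the kmem_free() step */
        /* kmem_free(...) and FREE(...) would run here, unlocked */
        if (!smp)
            lock_spin();         /* #ifndef SMP: retake before set_user_ldt */
    } else {
        /* pcb->pcb_ldt = new_ldt would run here, still locked */
        if (smp)
            unlock_spin();       /* #ifdef SMP: drop before the rendezvous */
    }
    return sched_lock_held;
}
```

Under these assumptions the opposite ifdefs make the two clauses agree:
the lock ends up held in both clauses for !SMP and released in both
clauses for SMP.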

-- 

John Baldwin <[EMAIL PROTECTED]> -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
"Power Users Use the Power to Serve!"  -  http://www.FreeBSD.org/

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: -current is _definitely_ not stable right now

2001-05-30 Thread Doug Barton

Doug Barton wrote:

> This worked excellently for me! I patched the kernel and rebuilt, then
> tested aviplay... success. Then I cvsup'ed, built/installed world and
> kernel, and started stress testing. I'm currently running two builds of X
> 4, one over NFS and one local, 'make cleandir' in /usr/src, AND avifile
> (ok, it's a little choppy, but still runs).


 I forgot to mention explicitly, this is all running in X. :)




Re: -current is _definitely_ not stable right now

2001-05-30 Thread Doug Barton

John Baldwin wrote:
> 
> On 28-May-01 Doug Barton wrote:
> > Gang,
> >
> >   On the avi front, typing 'aviplay' with or without an argument is
> > guaranteed to instantly wedge the box. I attached a log of running aviplay
> > through truss, but I have no way to know if it stopped at or before the
> > offending instruction. As for the general wonkiness of the system, I have
> > finally gotten a dump. The backtrace is below, let me know if there is
> > anything else I can do to help debug.
> 
> Please try http://www.FreeBSD.org/~jhb/patches/ldt.patch.

This worked excellently for me! I patched the kernel and rebuilt, then
tested aviplay... success. Then I cvsup'ed, built/installed world and
kernel, and started stress testing. I'm currently running two builds of X
4, one over NFS and one local, 'make cleandir' in /usr/src, AND avifile
(ok, it's a little choppy, but still runs). I'd say it's probably safe to
go back in the water again. Next stop, re-enabling softupdates. :)

BTW, I'm probably wrong about this but looking at the patch it seems odd
to me that one of these is ifndef and the other is ifdef:

@@ -422,15 +433,21 @@
kmem_free(kernel_map, (vm_offset_t)old_ldt_base,
old_ldt_len * sizeof(union descriptor));
FREE(new_ldt, M_SUBPROC);
+#ifndef SMP
+   mtx_lock_spin(&sched_lock);
+#endif
} else {
pcb->pcb_ldt = pcb_ldt = new_ldt;
+#ifdef SMP
mtx_unlock_spin(&sched_lock);
+#endif
}

Just curious,

Doug (Thanks BTW)
-- 
I need someone really bad. Are you really bad?




RE: -current is _definitely_ not stable right now

2001-05-29 Thread John Baldwin


On 28-May-01 Doug Barton wrote:
> Gang,
> 
>   On the avi front, typing 'aviplay' with or without an argument is
> guaranteed to instantly wedge the box. I attached a log of running aviplay
> through truss, but I have no way to know if it stopped at or before the
> offending instruction. As for the general wonkiness of the system, I have
> finally gotten a dump. The backtrace is below, let me know if there is
> anything else I can do to help debug. 

Please try http://www.FreeBSD.org/~jhb/patches/ldt.patch.

> Doug

-- 

John Baldwin <[EMAIL PROTECTED]> -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
"Power Users Use the Power to Serve!"  -  http://www.FreeBSD.org/




Re: -current is _definitely_ not stable right now

2001-05-29 Thread Thomas Moestl

On Tue, 2001/05/29 at 09:39:42 -0700, John Baldwin wrote:
> 
> On 28-May-01 Doug Barton wrote:
> > I forgot something:
> > 
> > IdlePTD 4734976
> > initial pcb at 3b5f80
> > panicstr: mutex sched lock recursed at /usr/src/sys/kern/kern_synch.c:858
> > panic messages:
> 
> I would need a traceback from here.  It looks like someone called msleep or
> tsleep with sched lock held.

OK, I think I've found the problem, patch attached. set_user_ldt is
called from cpu_switch on i386, where the sched lock is already held
by the process that is just being scheduled away, and curproc has
already been changed, so this isn't treated like a recursed mutex, but
rather like the new process (dead-) locking against the old one.

The solution taken in the attached patch is to create a
set_user_ldt_nolock function. This way, we have a more or less consistent
environment (of the new process) there.
The (pcb != PCPU_GET(curpcb)) check is in the outer locking
set_user_ldt wrapper (it seems only to be needed in the smp rendezvous
case and is a "can't happen" when called from cpu_switch).

This works for me; Doug, could you please test it too? I'd be thankful
for any review.
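
A toy user-space model of the locking-wrapper split described above,
given as a sketch and not as the kernel code. toy_lock, toy_try_lock,
load_ldt, load_ldt_nolock and ldt_loads are hypothetical names, and a
failed try-lock stands in for the point where a real spin mutex would
spin forever:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical non-recursive lock: owner is NULL when the lock is free. */
struct toy_lock { const void *owner; };

/* Returns false where a real spin mutex would keep spinning. */
static bool toy_try_lock(struct toy_lock *l, const void *curproc)
{
    if (l->owner != NULL)
        return false;
    l->owner = curproc;
    return true;
}

static void toy_unlock(struct toy_lock *l) { l->owner = NULL; }

static int ldt_loads;            /* counts simulated LDT reloads */

/* Does the work; the caller must already hold the lock. */
static void load_ldt_nolock(struct toy_lock *l)
{
    assert(l->owner != NULL);
    ldt_loads++;
}

/* Locking wrapper: takes the lock, does the work, releases the lock. */
static bool load_ldt(struct toy_lock *l, const void *curproc)
{
    if (!toy_try_lock(l, curproc))
        return false;            /* the real code would deadlock here */
    load_ldt_nolock(l);
    toy_unlock(l);
    return true;
}
```

In this model, a caller that does not hold the lock uses load_ldt, while
a switch path that runs with the lock still owned by the outgoing process
has to call load_ldt_nolock directly, because load_ldt would fail to
acquire the lock.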

- thomas


Index: i386/swtch.s
===================================================================
RCS file: /home/ncvs/src/sys/i386/i386/swtch.s,v
retrieving revision 1.114
diff -u -r1.114 swtch.s
--- i386/swtch.s    2001/05/20 16:51:08 1.114
+++ i386/swtch.s    2001/05/29 22:09:14
@@ -248,7 +248,7 @@
        movl    %eax,PCPU(CURRENTLDT)
        jmp     2f
 1:     pushl   %edx
-       call    set_user_ldt
+       call    set_user_ldt_nolock
        popl    %edx
 2:
 
Index: i386/sys_machdep.c
===================================================================
RCS file: /home/ncvs/src/sys/i386/i386/sys_machdep.c,v
retrieving revision 1.57
diff -u -r1.57 sys_machdep.c
--- i386/sys_machdep.c  2001/05/15 23:22:20 1.57
+++ i386/sys_machdep.c  2001/05/29 22:24:04
@@ -239,17 +239,16 @@
 
 /*
  * Update the GDT entry pointing to the LDT to point to the LDT of the
- * current process.
+ * current process. Assumes that sched_lock is held. This is needed
+ * in cpu_switch because sched_lock is held by the process that has
+ * just been scheduled away and we would deadlock if we would try to
+ * acquire sched_lock.
  */   
 void
-set_user_ldt(struct pcb *pcb)
+set_user_ldt_nolock(struct pcb *pcb)
 {
        struct pcb_ldt *pcb_ldt;
 
-       if (pcb != PCPU_GET(curpcb))
-               return;
-
-       mtx_lock_spin(&sched_lock);
        pcb_ldt = pcb->pcb_ldt;
 #ifdef SMP
        gdt[PCPU_GET(cpuid) * NGDT + GUSERLDT_SEL].sd = pcb_ldt->ldt_sd;
@@ -258,6 +257,17 @@
 #endif
        lldt(GSEL(GUSERLDT_SEL, SEL_KPL));
        PCPU_SET(currentldt, GSEL(GUSERLDT_SEL, SEL_KPL));
+}
+
+/* Locking wrapper of the above */
+void
+set_user_ldt(struct pcb *pcb)
+{
+       if (pcb != PCPU_GET(curpcb))
+               return;
+
+       mtx_lock_spin(&sched_lock);
+       set_user_ldt_nolock(pcb);
        mtx_unlock_spin(&sched_lock);
 }
 
Index: include/pcb_ext.h
===================================================================
RCS file: /home/ncvs/src/sys/i386/include/pcb_ext.h,v
retrieving revision 1.6
diff -u -r1.6 pcb_ext.h
--- include/pcb_ext.h   2001/05/10 17:03:03 1.6
+++ include/pcb_ext.h   2001/05/29 22:06:37
@@ -55,6 +55,7 @@
 
 int i386_extend_pcb __P((struct proc *));
 void set_user_ldt __P((struct pcb *));
+void set_user_ldt_nolock __P((struct pcb *));
 struct pcb_ldt *user_ldt_alloc __P((struct pcb *, int));
 void user_ldt_free __P((struct pcb *));
 



Re: -current is _definitely_ not stable right now

2001-05-29 Thread John Baldwin


On 28-May-01 Doug Barton wrote:
> I forgot something:
> 
> IdlePTD 4734976
> initial pcb at 3b5f80
> panicstr: mutex sched lock recursed at /usr/src/sys/kern/kern_synch.c:858
> panic messages:

I would need a traceback from here.  It looks like someone called msleep or
tsleep with sched lock held.
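
To illustrate the hypothesis above, here is a toy single-threaded model,
not kernel code. toy_mtx, owned_not_recursed and toy_switch are
hypothetical names; the sketch assumes only that the switch path takes
the lock itself and requires it to be held exactly once:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical recursive lock: 0 = free, 1 = owned, >1 = recursed. */
struct toy_mtx { int depth; };

static void toy_lock(struct toy_mtx *m)   { m->depth++; }
static void toy_unlock(struct toy_mtx *m) { assert(m->depth > 0); m->depth--; }

/* True only when the lock is held exactly once. */
static bool owned_not_recursed(const struct toy_mtx *m)
{
    return m->depth == 1;
}

/*
 * Stand-in for a sleep/switch path: it takes the lock itself and then
 * checks that the lock is not recursed. Returns false where the real
 * assertion would panic.
 */
static bool toy_switch(struct toy_mtx *m)
{
    toy_lock(m);
    bool ok = owned_not_recursed(m);
    toy_unlock(m);
    return ok;
}
```

In this model a caller that already holds the lock makes the check fail,
which is the kind of "recursed" report seen in the panic string.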

-- 

John Baldwin <[EMAIL PROTECTED]> -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.Baldwin.cx/~john/pgpkey.asc
"Power Users Use the Power to Serve!"  -  http://www.FreeBSD.org/




Re: -current is _definitely_ not stable right now

2001-05-28 Thread Robert Watson


I haven't made any mutex commits -- my commits were credential-related.
At least two bugs have popped up and been resolved since the pcred removal
commits, including:

1) Bug in execve() such that saved uids/gids were not being set in some
   situations.
2) Bug in crfree() such that there was a reference leak for uidinfo
   structures.

I wouldn't be surprised if a couple more turned up.

Robert N M Watson FreeBSD Core Team, TrustedBSD Project
[EMAIL PROTECTED]  NAI Labs, Safeport Network Services

On Sun, 27 May 2001, Doug Barton wrote:

> Gang,
> 
>   I cvs'ed and built world/kernel shortly after jhb's "all clear" on
> thursday, and things went fairly well. I did the same again after rwatson's
> mutex commits on friday and things have gone downhill from there. Just
> about any heavy system activity locks the system up. That includes things
> like building large ports (for example, avifile), buildworld, and trying to
> actually run aviplay. 
> 
>   On the avi front, typing 'aviplay' with or without an argument is
> guaranteed to instantly wedge the box. I attached a log of running aviplay
> through truss, but I have no way to know if it stopped at or before the
> offending instruction. As for the general wonkiness of the system, I have
> finally gotten a dump. The backtrace is below, let me know if there is
> anything else I can do to help debug. 
> 
> Doug
> 
> (kgdb) where
> #0  dumpsys () at /usr/src/sys/kern/kern_shutdown.c:478
> #1  0xc01cb318 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:321
> #2  0xc01cb745 in panic (fmt=0xc0330ce4 "mutex %s recursed at %s:%d")
> at /usr/src/sys/kern/kern_shutdown.c:600
> #3  0xc01c3c9c in _mtx_assert (m=0xc03f44a0, what=9, 
> file=0xc0332360 "/usr/src/sys/kern/kern_synch.c", line=858)
> at /usr/src/sys/kern/kern_mutex.c:571
> #4  0xc01d4b9d in mi_switch () at /usr/src/sys/kern/kern_synch.c:858
> #5  0xc01cb01c in boot (howto=256) at /usr/src/sys/kern/kern_shutdown.c:262
> #6  0xc01cb745 in panic (fmt=0xc0334760 "blockable sleep lock (%s) %s @
> %s:%d")
> at /usr/src/sys/kern/kern_shutdown.c:600
> #7  0xc01e60a0 in witness_lock (lock=0xc03f0f60, flags=0, 
> file=0xc0331123 "/usr/src/sys/kern/kern_proc.c", line=146)
> at /usr/src/sys/kern/subr_witness.c:489
> #8  0xc01d2285 in _sx_slock (sx=0xc03f0f60, file=0xc0331123
> "/usr/src/sys/kern/kern_proc.c", 
> line=146) at /usr/src/sys/kern/kern_sx.c:114
> #9  0xc01c4e2c in pfind (pid=434) at /usr/src/sys/kern/kern_proc.c:146
> #10 0xc01ea3c6 in selwakeup (sip=0xc0e3d404) at
> /usr/src/sys/kern/sys_generic.c:1175
> #11 0xc01f5c5f in ptcwakeup (tp=0xc0e3d420, flag=1) at
> /usr/src/sys/kern/tty_pty.c:317
> #12 0xc01f5c36 in ptsstart (tp=0xc0e3d420) at
> /usr/src/sys/kern/tty_pty.c:306
> #13 0xc01f3074 in ttstart (tp=0xc0e3d420) at /usr/src/sys/kern/tty.c:1409
> #14 0xc01f4685 in tputchar (c=107, tp=0xc0e3d420) at
> /usr/src/sys/kern/tty.c:2458
> #15 0xc01e20cb in putchar (c=107, arg=0xcd115de8) at
> /usr/src/sys/kern/subr_prf.c:304
> #16 0xc01e234a in kvprintf (fmt=0xc034f881 "ernel trap %d with interrupts
> disabled\n", 
> func=0xc01e207c , arg=0xcd115de8, radix=10, ap=0xcd115e00
> "\f")
> at /usr/src/sys/kern/subr_prf.c:487
> #17 0xc01e1ff8 in printf (fmt=0xc034f880 "kernel trap %d with interrupts
> disabled\n")
> at /usr/src/sys/kern/subr_prf.c:260
> #18 0xc02f6955 in trap (frame={tf_fs = -854523880, tf_es = -1071775728,
> tf_ds = -855048176, 
>   tf_edi = 4, tf_esi = -1058806500, tf_ebp = -854499712, tf_isp =
> -854499744, 
>   tf_ebx = -855029664, tf_edx = -559038242, tf_ecx = 2, tf_eax =
> -559038244, 
>   tf_trapno = 12, tf_err = 0, tf_eip = -1071892410, tf_cs = 8,
> tf_eflags = 65670, 
>   tf_esp = -1052624640, tf_ss = -1058806528}) at
> /usr/src/sys/i386/i386/trap.c:253
> #19 0xc01c3846 in _mtx_lock_sleep (m=0xc0e3e51c, opts=0, 
> file=0xc0331500 "/usr/src/sys/kern/kern_resource.c", line=793)
> at /usr/src/sys/kern/kern_mutex.c:380
> #20 0xc01ca0cb in uihold (uip=0xc0e3e500) at
> /usr/src/sys/kern/kern_resource.c:793
> #21 0xc01c86f9 in crdup (cr=0xc1423900) at
> /usr/src/sys/kern/kern_prot.c:1349
> #22 0xc021cf8c in access (p=0xcd094860, uap=0xcd115f80)
> at /usr/src/sys/kern/vfs_syscalls.c:1712
> #23 0xc02f841d in syscall (frame={tf_fs = 47, tf_es = 47, tf_ds = 47,
> tf_edi = 134665044, 
>   tf_esi = 134676528, tf_ebp = -1077940088, tf_isp = -854499372, tf_ebx
> = 134661184, 
>   tf_edx = 134665044, tf_ecx = 134661218, tf_eax = 33, tf_trapno = 12,
> tf_err = 2, 
>   tf_eip = 134555356, tf_cs = 31, tf_eflags = 643, tf_esp =
> -1077940132, tf_ss = 47})
> at /usr/src/sys/i386/i386/trap.c:1172
> #24 0xc02e957d in syscall_with_err_pushed ()
> #25 0x804a131 in ?? ()
> #26 0x804caa1 in ?? ()
> #27 0x804e57c in ?? ()
> #28 0x804dd54 in ?? ()
> #29 0x804e57c in ?? ()
> #30 0x804dd54 in ?? ()
> #31 0x804e57c in ?? ()
> #32 0x804dd54 in ?? ()
> #33 0x804e57c in ?? 

Re: -current is _definitely_ not stable right now

2001-05-28 Thread Doug Barton

I forgot something:

IdlePTD 4734976
initial pcb at 3b5f80
panicstr: mutex sched lock recursed at /usr/src/sys/kern/kern_synch.c:858
panic messages:
---
panic: blockable sleep lock (sx) allproc @
/usr/src/sys/kern/kern_proc.c:146

syncing disks... 90 90 panic: mutex sched lock recursed at
/usr/src/sys/kern/kern_synch.c:858

A quick look at that file indicates that rwatson is probably off the hook,
since he hadn't touched it. 


-- 
I need someone really bad. Are you really bad?
