Re: -current is _definitely_ not stable right now

2001-05-30 Thread Doug Barton

John Baldwin wrote:
 
 On 28-May-01 Doug Barton wrote:
  Gang,
 
On the avi front, typing 'aviplay' with or without an argument is
  guaranteed to instantly wedge the box. I attached a log of running aviplay
  through truss, but I have no way to know if it stopped at or before the
  offending instruction. As for the general wonkiness of the system, I have
  finally gotten a dump. The backtrace is below, let me know if there is
  anything else I can do to help debug.
 
 Please try http://www.FreeBSD.org/~jhb/patches/ldt.patch.

This worked excellently for me! I patched the kernel and rebuilt, then
tested aviplay... success. Then I cvsup'ed, built/installed world and
kernel, and started stress testing. I'm currently running two builds of X
4, one over NFS and one local, 'make cleandir' in /usr/src, AND avifile
(ok, it's a little choppy, but still runs). I'd say it's probably safe to
go back in the water again. Next stop, re-enabling softupdates. :)

BTW, I'm probably wrong about this but looking at the patch it seems odd
to me that one of these is ifndef and the other is ifdef:

@@ -422,15 +433,21 @@
         kmem_free(kernel_map, (vm_offset_t)old_ldt_base,
             old_ldt_len * sizeof(union descriptor));
         FREE(new_ldt, M_SUBPROC);
+#ifndef SMP
+        mtx_lock_spin(&sched_lock);
+#endif
     } else {
         pcb->pcb_ldt = pcb_ldt = new_ldt;
+#ifdef SMP
         mtx_unlock_spin(&sched_lock);
+#endif
     }

Just curious,

Doug (Thanks BTW)
-- 
I need someone really bad. Are you really bad?

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: -current is _definitely_ not stable right now

2001-05-30 Thread Doug Barton

Doug Barton wrote:

 This worked excellently for me! I patched the kernel and rebuilt, then
 tested aviplay... success. Then I cvsup'ed, built/installed world and
 kernel, and started stress testing. I'm currently running two builds of X
 4, one over NFS and one local, 'make cleandir' in /usr/src, AND avifile
 (ok, it's a little choppy, but still runs).


 I forgot to mention explicitly, this is all running in X. :)




Re: -current is _definitely_ not stable right now

2001-05-30 Thread John Baldwin


On 30-May-01 Doug Barton wrote:
 John Baldwin wrote:
 
 On 28-May-01 Doug Barton wrote:
  Gang,
 
On the avi front, typing 'aviplay' with or without an argument is
  guaranteed to instantly wedge the box. I attached a log of running aviplay
  through truss, but I have no way to know if it stopped at or before the
  offending instruction. As for the general wonkiness of the system, I have
  finally gotten a dump. The backtrace is below, let me know if there is
  anything else I can do to help debug.
 
 Please try http://www.FreeBSD.org/~jhb/patches/ldt.patch.
 
   This worked excellently for me! I patched the kernel and rebuilt, then
 tested aviplay... success. Then I cvsup'ed, built/installed world and
 kernel, and started stress testing. I'm currently running two builds of X
 4, one over NFS and one local, 'make cleandir' in /usr/src, AND avifile
 (ok, it's a little choppy, but still runs). I'd say it's probably safe to
 go back in the water again. Next stop, re-enabling softupdates. :)

Sounds good, I'll commit it in a second.

   BTW, I'm probably wrong about this but looking at the patch it seems odd
 to me that one of these is ifndef and the other is ifdef:
 
 @@ -422,15 +433,21 @@
          kmem_free(kernel_map, (vm_offset_t)old_ldt_base,
              old_ldt_len * sizeof(union descriptor));
          FREE(new_ldt, M_SUBPROC);
 +#ifndef SMP
 +        mtx_lock_spin(&sched_lock);
 +#endif
      } else {
          pcb->pcb_ldt = pcb_ldt = new_ldt;
 +#ifdef SMP
          mtx_unlock_spin(&sched_lock);
 +#endif
      }
 
 Just curious,

We want to continue to hold the sched_lock before calling set_user_ldt in the
!SMP case, but we want to release it in the SMP case before calling the smp
rendezvous.  We have to release the sched_lock in the then clause before
calling kmem_free, so each clause finishes with the sched_lock in a different
state.
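
To make the two exit states concrete, here is a minimal user-space sketch. It is
not the kernel code: struct toy_lock, ldt_swap_tail() and the frees_old_ldt and
smp parameters are hypothetical stand-ins, and the SMP/!SMP choice is a runtime
flag here rather than an #ifdef. It assumes only what is said above: the lock
is held on entry, the then clause drops it before kmem_free, and the code that
follows expects it held on !SMP and released on SMP.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a non-recursive spin lock such as sched_lock. */
struct toy_lock {
    bool held;
};

static void toy_lock_acquire(struct toy_lock *l)
{
    assert(!l->held);           /* recursion is a bug for this kind of lock */
    l->held = true;
}

static void toy_lock_release(struct toy_lock *l)
{
    assert(l->held);
    l->held = false;
}

/*
 * Models the end of the if/else from the quoted hunk.  The caller enters
 * with the lock held.  Returns whether the lock is still held on exit.
 */
static bool ldt_swap_tail(struct toy_lock *l, bool frees_old_ldt, bool smp)
{
    assert(l->held);
    if (frees_old_ldt) {
        toy_lock_release(l);    /* dropped before the kmem_free step */
        /* ... the old LDT memory would be freed here ... */
        if (!smp)
            toy_lock_acquire(l);    /* the "#ifndef SMP" lock */
    } else {
        /* ... the new LDT would be installed here ... */
        if (smp)
            toy_lock_release(l);    /* the "#ifdef SMP" unlock */
    }
    return l->held;
}
```

In this model both branches leave the lock held when smp is false and released
when smp is true, which is why one branch needs a lock and the other an unlock.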

-- 

John Baldwin [EMAIL PROTECTED] -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
Power Users Use the Power to Serve!  -  http://www.FreeBSD.org/




Re: -current is _definitely_ not stable right now

2001-05-29 Thread John Baldwin


On 28-May-01 Doug Barton wrote:
 I forgot something:
 
 IdlePTD 4734976
 initial pcb at 3b5f80
 panicstr: mutex sched lock recursed at /usr/src/sys/kern/kern_synch.c:858
 panic messages:

I would need a traceback from here.  It looks like someone called msleep or
tsleep with sched lock held.

-- 

John Baldwin [EMAIL PROTECTED] -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.Baldwin.cx/~john/pgpkey.asc
Power Users Use the Power to Serve!  -  http://www.FreeBSD.org/




Re: -current is _definitely_ not stable right now

2001-05-29 Thread Thomas Moestl

On Tue, 2001/05/29 at 09:39:42 -0700, John Baldwin wrote:
 
 On 28-May-01 Doug Barton wrote:
  I forgot something:
  
  IdlePTD 4734976
  initial pcb at 3b5f80
  panicstr: mutex sched lock recursed at /usr/src/sys/kern/kern_synch.c:858
  panic messages:
 
 I would need a traceback from here.  It looks like someone called msleep or
 tsleep with sched lock held.

OK, I think I've found the problem, patch attached. set_user_ldt is
called from cpu_switch on i386, where the sched lock is already held
by the process that is just being scheduled away, and curproc has
already been changed, so this isn't treated like a recursed mutex, but
rather like the new process (dead-) locking against the old one.

The solution taken in the attached patch is to create a
set_user_ldt_nolock. This way, we have a more or less consistent
environment (of the new process) there.
The (pcb != PCPU_GET(curpcb)) check is in the outer locking
set_user_ldt wrapper (it seems only to be needed in the smp rendezvous
case and is a can't happen when called from cpu_switch).
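
The split described above (an unlocked worker plus a locking wrapper) can be
sketched in user space as follows. This is only an illustration under stated
assumptions: demo_lock, cur_ctx, current_ldt, set_ldt(), set_ldt_nolock() and
toy_switch() are hypothetical names, the lock is a plain flag instead of a spin
mutex, and the switch path is simplified to take and drop the lock itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical non-recursive lock standing in for sched_lock. */
struct demo_lock {
    bool held;
};

static struct demo_lock sched = { false };
static int cur_ctx = 1;         /* stand-in for the per-CPU curpcb */
static int current_ldt = 0;     /* stand-in for the loaded LDT selector */

/* Worker: assumes the caller already holds the lock. */
static void set_ldt_nolock(int ldt)
{
    assert(sched.held);
    current_ldt = ldt;
}

/* Locking wrapper: checks the context, then locks around the worker. */
static bool set_ldt(int ctx, int ldt)
{
    if (ctx != cur_ctx)
        return false;           /* not the current context: nothing to do */
    assert(!sched.held);        /* taking the lock twice would deadlock */
    sched.held = true;
    set_ldt_nolock(ldt);
    sched.held = false;
    return true;
}

/* Switch path: the lock is already held, so it must call the worker. */
static void toy_switch(int new_ctx, int ldt)
{
    sched.held = true;          /* simplified: held across the switch */
    cur_ctx = new_ctx;
    set_ldt_nolock(ldt);        /* calling set_ldt() here would trip the assert */
    sched.held = false;
}
```

Callers that do not hold the lock use set_ldt(); the switch path uses
set_ldt_nolock() directly, so the lock is never acquired a second time.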

This works for me; Doug, could you please test it too? I'd be thankful
for any review.

- thomas


Index: i386/swtch.s
===================================================================
RCS file: /home/ncvs/src/sys/i386/i386/swtch.s,v
retrieving revision 1.114
diff -u -r1.114 swtch.s
--- i386/swtch.s        2001/05/20 16:51:08     1.114
+++ i386/swtch.s        2001/05/29 22:09:14
@@ -248,7 +248,7 @@
        movl    %eax,PCPU(CURRENTLDT)
        jmp     2f
 1:     pushl   %edx
-       call    set_user_ldt
+       call    set_user_ldt_nolock
        popl    %edx
 2:
 
Index: i386/sys_machdep.c
===================================================================
RCS file: /home/ncvs/src/sys/i386/i386/sys_machdep.c,v
retrieving revision 1.57
diff -u -r1.57 sys_machdep.c
--- i386/sys_machdep.c  2001/05/15 23:22:20 1.57
+++ i386/sys_machdep.c  2001/05/29 22:24:04
@@ -239,17 +239,16 @@
 
 /*
  * Update the GDT entry pointing to the LDT to point to the LDT of the
- * current process.
+ * current process. Assumes that sched_lock is held. This is needed
+ * in cpu_switch because sched_lock is held by the process that has
+ * just been scheduled away and we would deadlock if we would try to
+ * acquire sched_lock.
  */   
 void
-set_user_ldt(struct pcb *pcb)
+set_user_ldt_nolock(struct pcb *pcb)
 {
    struct pcb_ldt *pcb_ldt;
 
-   if (pcb != PCPU_GET(curpcb))
-       return;
-
-   mtx_lock_spin(&sched_lock);
    pcb_ldt = pcb->pcb_ldt;
 #ifdef SMP
    gdt[PCPU_GET(cpuid) * NGDT + GUSERLDT_SEL].sd = pcb_ldt->ldt_sd;
@@ -258,6 +257,17 @@
 #endif
    lldt(GSEL(GUSERLDT_SEL, SEL_KPL));
    PCPU_SET(currentldt, GSEL(GUSERLDT_SEL, SEL_KPL));
+}
+
+/* Locking wrapper of the above */
+void
+set_user_ldt(struct pcb *pcb)
+{
+   if (pcb != PCPU_GET(curpcb))
+       return;
+
+   mtx_lock_spin(&sched_lock);
+   set_user_ldt_nolock(pcb);
    mtx_unlock_spin(&sched_lock);
 }
 
Index: include/pcb_ext.h
===================================================================
RCS file: /home/ncvs/src/sys/i386/include/pcb_ext.h,v
retrieving revision 1.6
diff -u -r1.6 pcb_ext.h
--- include/pcb_ext.h   2001/05/10 17:03:03 1.6
+++ include/pcb_ext.h   2001/05/29 22:06:37
@@ -55,6 +55,7 @@
 
 int i386_extend_pcb __P((struct proc *));
 void set_user_ldt __P((struct pcb *));
+void set_user_ldt_nolock __P((struct pcb *));
 struct pcb_ldt *user_ldt_alloc __P((struct pcb *, int));
 void user_ldt_free __P((struct pcb *));
 



RE: -current is _definitely_ not stable right now

2001-05-29 Thread John Baldwin


On 28-May-01 Doug Barton wrote:
 Gang,
 
   On the avi front, typing 'aviplay' with or without an argument is
 guaranteed to instantly wedge the box. I attached a log of running aviplay
 through truss, but I have no way to know if it stopped at or before the
 offending instruction. As for the general wonkiness of the system, I have
 finally gotten a dump. The backtrace is below, let me know if there is
 anything else I can do to help debug. 

Please try http://www.FreeBSD.org/~jhb/patches/ldt.patch.

 Doug

-- 

John Baldwin [EMAIL PROTECTED] -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
Power Users Use the Power to Serve!  -  http://www.FreeBSD.org/




Re: -current is _definitely_ not stable right now

2001-05-28 Thread Doug Barton

I forgot something:

IdlePTD 4734976
initial pcb at 3b5f80
panicstr: mutex sched lock recursed at /usr/src/sys/kern/kern_synch.c:858
panic messages:
---
panic: blockable sleep lock (sx) allproc @
/usr/src/sys/kern/kern_proc.c:146

syncing disks... 90 90 panic: mutex sched lock recursed at
/usr/src/sys/kern/kern_synch.c:858

A quick look at that file indicates that rwatson is probably off the hook,
since he hadn't touched it. 


-- 
I need someone really bad. Are you really bad?




Re: -current is _definitely_ not stable right now

2001-05-28 Thread Robert Watson


I haven't made any mutex commits -- my commits were credential-related.
At least two bugs have popped up and been resolved since the pcred removal
commits, including:

1) Bug in execve() such that saved uids/gids were not being done in some
   situations.
2) Bug in crfree() such that there was a reference leak for uidinfo
   structures.

I wouldn't be surprised if a couple more turned up.

Robert N M Watson FreeBSD Core Team, TrustedBSD Project
[EMAIL PROTECTED]  NAI Labs, Safeport Network Services

On Sun, 27 May 2001, Doug Barton wrote:

 Gang,
 
   I cvs'ed and built world/kernel shortly after jhb's all clear on
 thursday, and things went fairly well. I did the same again after rwatson's
 mutex commits on friday and things have gone downhill from there. Just
 about any heavy system activity locks the system up. That includes things
 like building large ports (for example, avifile), buildworld, and trying to
 actually run aviplay. 
 
   On the avi front, typing 'aviplay' with or without an argument is
 guaranteed to instantly wedge the box. I attached a log of running aviplay
 through truss, but I have no way to know if it stopped at or before the
 offending instruction. As for the general wonkiness of the system, I have
 finally gotten a dump. The backtrace is below, let me know if there is
 anything else I can do to help debug. 
 
 Doug
 
 (kgdb) where
 #0  dumpsys () at /usr/src/sys/kern/kern_shutdown.c:478
 #1  0xc01cb318 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:321
 #2  0xc01cb745 in panic (fmt=0xc0330ce4 mutex %s recursed at %s:%d)
 at /usr/src/sys/kern/kern_shutdown.c:600
 #3  0xc01c3c9c in _mtx_assert (m=0xc03f44a0, what=9, 
 file=0xc0332360 /usr/src/sys/kern/kern_synch.c, line=858)
 at /usr/src/sys/kern/kern_mutex.c:571
 #4  0xc01d4b9d in mi_switch () at /usr/src/sys/kern/kern_synch.c:858
 #5  0xc01cb01c in boot (howto=256) at /usr/src/sys/kern/kern_shutdown.c:262
 #6  0xc01cb745 in panic (fmt=0xc0334760 blockable sleep lock (%s) %s @
 %s:%d)
 at /usr/src/sys/kern/kern_shutdown.c:600
 #7  0xc01e60a0 in witness_lock (lock=0xc03f0f60, flags=0, 
 file=0xc0331123 /usr/src/sys/kern/kern_proc.c, line=146)
 at /usr/src/sys/kern/subr_witness.c:489
 #8  0xc01d2285 in _sx_slock (sx=0xc03f0f60, file=0xc0331123
 /usr/src/sys/kern/kern_proc.c, 
 line=146) at /usr/src/sys/kern/kern_sx.c:114
 #9  0xc01c4e2c in pfind (pid=434) at /usr/src/sys/kern/kern_proc.c:146
 #10 0xc01ea3c6 in selwakeup (sip=0xc0e3d404) at
 /usr/src/sys/kern/sys_generic.c:1175
 #11 0xc01f5c5f in ptcwakeup (tp=0xc0e3d420, flag=1) at
 /usr/src/sys/kern/tty_pty.c:317
 #12 0xc01f5c36 in ptsstart (tp=0xc0e3d420) at
 /usr/src/sys/kern/tty_pty.c:306
 #13 0xc01f3074 in ttstart (tp=0xc0e3d420) at /usr/src/sys/kern/tty.c:1409
 #14 0xc01f4685 in tputchar (c=107, tp=0xc0e3d420) at
 /usr/src/sys/kern/tty.c:2458
 #15 0xc01e20cb in putchar (c=107, arg=0xcd115de8) at
 /usr/src/sys/kern/subr_prf.c:304
 #16 0xc01e234a in kvprintf (fmt=0xc034f881 ernel trap %d with interrupts
 disabled\n, 
 func=0xc01e207c putchar, arg=0xcd115de8, radix=10, ap=0xcd115e00
 \f)
 at /usr/src/sys/kern/subr_prf.c:487
 #17 0xc01e1ff8 in printf (fmt=0xc034f880 kernel trap %d with interrupts
 disabled\n)
 at /usr/src/sys/kern/subr_prf.c:260
 #18 0xc02f6955 in trap (frame={tf_fs = -854523880, tf_es = -1071775728,
 tf_ds = -855048176, 
   tf_edi = 4, tf_esi = -1058806500, tf_ebp = -854499712, tf_isp =
 -854499744, 
   tf_ebx = -855029664, tf_edx = -559038242, tf_ecx = 2, tf_eax =
 -559038244, 
   tf_trapno = 12, tf_err = 0, tf_eip = -1071892410, tf_cs = 8,
 tf_eflags = 65670, 
   tf_esp = -1052624640, tf_ss = -1058806528}) at
 /usr/src/sys/i386/i386/trap.c:253
 #19 0xc01c3846 in _mtx_lock_sleep (m=0xc0e3e51c, opts=0, 
 file=0xc0331500 /usr/src/sys/kern/kern_resource.c, line=793)
 at /usr/src/sys/kern/kern_mutex.c:380
 #20 0xc01ca0cb in uihold (uip=0xc0e3e500) at
 /usr/src/sys/kern/kern_resource.c:793
 #21 0xc01c86f9 in crdup (cr=0xc1423900) at
 /usr/src/sys/kern/kern_prot.c:1349
 #22 0xc021cf8c in access (p=0xcd094860, uap=0xcd115f80)
 at /usr/src/sys/kern/vfs_syscalls.c:1712
 #23 0xc02f841d in syscall (frame={tf_fs = 47, tf_es = 47, tf_ds = 47,
 tf_edi = 134665044, 
   tf_esi = 134676528, tf_ebp = -1077940088, tf_isp = -854499372, tf_ebx
 = 134661184, 
   tf_edx = 134665044, tf_ecx = 134661218, tf_eax = 33, tf_trapno = 12,
 tf_err = 2, 
   tf_eip = 134555356, tf_cs = 31, tf_eflags = 643, tf_esp =
 -1077940132, tf_ss = 47})
 at /usr/src/sys/i386/i386/trap.c:1172
 #24 0xc02e957d in syscall_with_err_pushed ()
 #25 0x804a131 in ?? ()
 #26 0x804caa1 in ?? ()
 #27 0x804e57c in ?? ()
 #28 0x804dd54 in ?? ()
 #29 0x804e57c in ?? ()
 #30 0x804dd54 in ?? ()
 #31 0x804e57c in ?? ()
 #32 0x804dd54 in ?? ()
 #33 0x804e57c in ?? ()
 #34 0x804dd54 in ?? ()
 #35 0x804c880 in ?? ()
 #36 0x804fd4a in ?? ()
 #37 0x8048131 in ?? ()

