Re: lockmgr panic on shutdown

2003-11-01 Thread Doug White
More info.

I think this is actually related to kan's reversion of
src/sys/kern/vfs_default.c.  I'm trying rev 1.88 of that file.

In the meantime, here is the panic message and backtrace. I have a
crashdump if desired.

panic: lockmgr: thread 0xc493be40, not exclusive lock holder 0xc071f320 unlocking

#0  doadump () at /usr/src/sys/kern/kern_shutdown.c:240
#1  0xc0510884 in boot (howto=256) at /usr/src/sys/kern/kern_shutdown.c:372
#2  0xc0510be7 in panic () at /usr/src/sys/kern/kern_shutdown.c:550
#3  0xc05043ea in lockmgr (lkp=0xc49af428, flags=6, interlkp=0x140,
    td=0xc493be40) at /usr/src/sys/kern/kern_lock.c:414
#4  0xc055e91f in vop_stdunlock (ap=0x0) at /usr/src/sys/kern/vfs_default.c:299
#5  0xc055e768 in vop_defaultop (ap=0x0) at /usr/src/sys/kern/vfs_default.c:161
#6  0xc061ed88 in ufs_vnoperate (ap=0x0)
    at /usr/src/sys/ufs/ufs/ufs_vnops.c:2793
#7  0xc06173d0 in ufs_inactive (ap=0x0) at vnode_if.h:1044
#8  0xc061ed88 in ufs_vnoperate (ap=0x0)
    at /usr/src/sys/ufs/ufs/ufs_vnops.c:2793
#9  0xc05686d7 in vput (vp=0xc49af36c) at vnode_if.h:953
#10 0xc0610245 in ffs_sync (mp=0xc47b2800, waitfor=2, cred=0xc1d07e80,
    td=0xc071f320) at /usr/src/sys/ufs/ffs/ffs_vfsops.c:1169
#11 0xc056ade4 in sync (td=0xc071f320, uap=0x0)
    at /usr/src/sys/kern/vfs_syscalls.c:142
#12 0xc05103e0 in boot (howto=8) at /usr/src/sys/kern/kern_shutdown.c:281
#13 0xc0510046 in reboot (td=0x0, uap=0x0)
    at /usr/src/sys/kern/kern_shutdown.c:178
#14 0xc066cf32 in syscall (frame=
      {tf_fs = 47, tf_es = 47, tf_ds = 47, tf_edi = 0, tf_esi = 0,
      tf_ebp = -1077937104, tf_isp = -574005900, tf_ebx = 2, tf_edx = -1,
      tf_ecx = 3, tf_eax = 55, tf_trapno = 12, tf_err = 2,
      tf_eip = 134516011, tf_cs = 31, tf_eflags = 582,
      tf_esp = -1077937172, tf_ss = 47}) at /usr/src/sys/i386/i386/trap.c:1012
#15 0xc065d91d in Xint0x80_syscall () at {standard input}:144


On Sat, 1 Nov 2003, Doug White wrote:

> I can confirm the lockmgr panic on shutdown reported by someone else
> earlier (whose message I mistakenly deleted).
>
> It looks like swapper is trying to undo a lock from pagedaemon and runs
> into trouble. This is probably related to the Giant pushdown of
> vm_pageout() that alc did last week.
>
> I'm building with INVARIANTS to see if that will catch more info.  Will
> report back soon.



-- 
Doug White|  FreeBSD: The Power to Serve
[EMAIL PROTECTED]  |  www.FreeBSD.org
___
[EMAIL PROTECTED] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to [EMAIL PROTECTED]


RE: lockmgr panic on shutdown

2003-11-01 Thread peter . edwards


> I can confirm the lockmgr panic on shutdown reported by someone else
> earlier (whose message I mistakenly deleted).
>
> It looks like swapper is trying to undo a lock from pagedaemon and runs
> into trouble. This is probably related to the Giant pushdown of
> vm_pageout() that alc did last week.
>
> I'm building with INVARIANTS to see if that will catch more info.  Will
> report back soon.

Just happened to me too. I think I see the problem:

When boot() calls sync(), it passes thread0 as the thread argument.
This gets propagated up to ffs_sync, which:

  calls vget(), which takes a thread argument.
  does some stuff
  calls vput(), which does _not_ take a thread argument

The vget() is passed thread0, as passed from boot.
The vput() gets the current thread, which is the process calling boot.

The unlocking in vput is asserting that the same thread that acquired
the lock is releasing it, which seems reasonable.

The obvious solution might be to change line 1161 of ffs_vfsops to
pass vget() curthread rather than td. I assume there's a good
reason why thread0 is passed from boot(), but I can't see why
that's of any use to the vnode locking.

i.e.:
Index: ffs_vfsops.c
===================================================================
RCS file: /usr/cvs/FreeBSD-CVS/src/sys/ufs/ffs/ffs_vfsops.c,v
retrieving revision 1.221
diff -u -r1.221 ffs_vfsops.c
--- ffs_vfsops.c	1 Nov 2003 05:51:54 -	1.221
+++ ffs_vfsops.c	2 Nov 2003 03:06:42 -
@@ -1158,7 +1158,7 @@
 			continue;
 		}
 		mtx_unlock(&mntvnode_mtx);
-		if ((error = vget(vp, lockreq, td)) != 0) {
+		if ((error = vget(vp, lockreq, curthread)) != 0) {
 			mtx_lock(&mntvnode_mtx);
 			if (error == ENOENT)
 				goto loop;


How come the parameters to vget and vput are lopsided like this?

This might have something to do with the commit
of revision 1.218 of ffs_vfsops.c, but I'm not sure.




RE: lockmgr panic on shutdown

2003-11-01 Thread Doug White
For giggles I'm rolling vfs_default.c back to 1.87, since it's along
the backtrace path.

I suspect I'll need to back up the whole thing to before the commit for
the struct mount locking until jeff & kan can straighten things out.




RE: lockmgr panic on shutdown

2003-11-01 Thread Doug White
On Sat, 1 Nov 2003, Doug White wrote:

> For giggles I'm rolling vfs_default.c back to 1.87, since it's along
> the backtrace path.

This didn't work, so -CURRENT is fully broken.

I'd suggest staying on sources from before 4PM PST on 10/30 if you
don't want to crash on shutdown.







RE: lockmgr panic on shutdown

2003-11-01 Thread peter . edwards


>> For giggles I'm rolling vfs_default.c back to 1.87, since it's along
>> the backtrace path.
>
> This didn't work, so -CURRENT is fully broken.
>
> I'd suggest staying on sources from before 4PM PST on 10/30 if you
> don't want to crash on shutdown.


The patch worked for me. (Well, a slightly modified one: I passed 0 for
the thread argument to vget; it recognises that as special.)

Included here is the patch to both the ffs and default sync operations.
I didn't exercise the default one, but the ffs case is certainly behaving
itself.




Index: kern/vfs_default.c
===================================================================
RCS file: /usr/cvs/FreeBSD-CVS/src/sys/kern/vfs_default.c,v
retrieving revision 1.89
diff -u -r1.89 vfs_default.c
--- kern/vfs_default.c	1 Nov 2003 05:51:54 -	1.89
+++ kern/vfs_default.c	2 Nov 2003 03:36:03 -
@@ -898,7 +898,7 @@
 		}
 		mtx_unlock(&mntvnode_mtx);
 
-		if ((error = vget(vp, lockreq, td)) != 0) {
+		if ((error = vget(vp, lockreq, 0)) != 0) {
 			mtx_lock(&mntvnode_mtx);
 			if (error == ENOENT)
 				goto loop;
Index: ufs/ffs/ffs_vfsops.c
===================================================================
RCS file: /usr/cvs/FreeBSD-CVS/src/sys/ufs/ffs/ffs_vfsops.c,v
retrieving revision 1.221
diff -u -r1.221 ffs_vfsops.c
--- ufs/ffs/ffs_vfsops.c	1 Nov 2003 05:51:54 -	1.221
+++ ufs/ffs/ffs_vfsops.c	2 Nov 2003 03:22:13 -
@@ -1158,7 +1158,7 @@
 			continue;
 		}
 		mtx_unlock(&mntvnode_mtx);
-		if ((error = vget(vp, lockreq, td)) != 0) {
+		if ((error = vget(vp, lockreq, 0)) != 0) {
 			mtx_lock(&mntvnode_mtx);
 			if (error == ENOENT)
 				goto loop;


RE: lockmgr panic on shutdown

2003-11-01 Thread Doug White
On Sun, 2 Nov 2003 [EMAIL PROTECTED] wrote:

> The patch worked for me. (Well, a slightly modified one: I passed 0 for
> the thread argument to vget; it recognises that as special.)

kan came up with a different patch that replaces the vput in
ffs_vfsops.c:ffs_sync with a vrele. That should be committed shortly.
Since he's been working in that area, I'll defer to him :)




RE: lockmgr panic on shutdown

2003-11-01 Thread Bruce Evans
On Sun, 2 Nov 2003 [EMAIL PROTECTED] wrote:

> The obvious solution might be to change line 1161 of ffs_vfsops to
> pass vget() curthread rather than td. I assume there's a good
> reason why thread0 is passed from boot(), but I can't see why
> that's of any use to the vnode locking.

Passing thread0 in boot() is a quick (and not even wrong) fix for
the problem that there is no valid current process^Wthread in the
panic case.  Long ago in Net/2 (still in Lite2 for at least the
i386 version), sync() in boot() was passed the completely bogus
parameters ((struct sigcontext *)0) (instead of (p, uap, retval)).
This worked to the extent that sync()'s proc pointer was not passed
further or not dereferenced.  Now there are lots of locks, and since
thread0 is never the correct lock holder, things work at most to
the extent that sync()'s proc pointer is not passed further.
curthread is never null in -current, so upgrading to the version that
passes it (i386/i386/machdep.c 1.111 (actually passes curproc)) would
probably help in the non-panic case without increasing bugs for the
panic case.  However, passing curthread is still wrong for the panic
case due to the following complications:
- panics may occur during context switches or in other critical regions
  when curthread is not quite current.
- under SMP, curthread is per-CPU, so having it non-null doesn't really
  help.  Locks may be held by curproc's running on other CPUs, and in
  panic() it is difficult to handle the other CPUs correctly -- if you
  stop them then they won't be able to release their locks, and if you
  let them run they may run into you.  Hopefully in the case of a
  normal shutdown all the other CPUs release their locks and stop before
  the sync().

Bruce