5.1-CURRENT hangs on disk i/o? sysctl_old_user() non-sleepable locks

2003-06-16 Thread Chris Shenton
(I don't know if this has any relation to the problems I reported
yesterday with qmail-send consuming 100% cpu after 5.0 to 5.1 upgrade.)

After booting 5.1-CURRENT the system runs fine for a while.  Then
later most disk i/o related actions seem to hang.  E.g., system works
but when cron kicks off a glimpseindex in the middle of the night, the
system is useless by the morning.  If I login on the console as me, it
takes my username and password then hangs (trying to run
/usr/local/bin/bash?). If I do this as root, I do get a shell
(/bin/csh).  After a point, asking for top will hang, even as root.
Even a reboot hung this morning with nothing in the logs.

The system has become almost unusable because of this, requiring
frequent reboots or hardware resets.

Sometimes when I do something as simple as ps I see this ominous
message on the console:

  sysctl_old_user() with the following non-sleepablelocks held:
  exclusive sleep mutex process lock r = 0 (0xc50bc9e0) locked @ /usr/src/sys/kern/kern_proc.c:258

which gets into /var/log/messages as:

  Jun 16 08:33:48 PECTOPAH kernel: exclusive sleep mutex process lock r = 0 (0xc50c7618) locked @ /usr/src/sys/kern/kern_proc.c:258

There are a bunch of these.

That file is version:

  $FreeBSD: src/sys/kern/kern_proc.c,v 1.189 2003/06/14 06:20:25 alc Exp $

and the line is the PROC_LOCK() portion of:

  struct proc *
  pfind(pid)
          register pid_t pid;
  {
          register struct proc *p;

          sx_slock(&allproc_lock);
          LIST_FOREACH(p, PIDHASH(pid), p_hash)
                  if (p->p_pid == pid) {
                          PROC_LOCK(p);
                          break;
                  }
          sx_sunlock(&allproc_lock);
          return (p);
  }

Any thoughts? Thanks.
___
[EMAIL PROTECTED] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: 5.1-CURRENT hangs on disk i/o? sysctl_old_user() non-sleepable locks

2003-06-16 Thread Don Lewis
On 16 Jun, Chris Shenton wrote:
> (I don't know if this has any relation to the problems I reported
> yesterday with qmail-send consuming 100% cpu after 5.0 to 5.1 upgrade.)

I doubt it.  I checked in a fix for this problem today so you should get
the fix when you next cvsup.

> After booting 5.1-CURRENT the system runs fine for a while.  Then
> later most disk i/o related actions seem to hang.  E.g., system works
> but when cron kicks off a glimpseindex in the middle of the night, the
> system is useless by the morning.  If I login on the console as me, it
> takes my username and password then hangs (trying to run
> /usr/local/bin/bash?). If I do this as root, I do get a shell
> (/bin/csh).  After a point, asking for top will hang, even as root.
> Even a reboot hung this morning with nothing in the logs.

Can you break into ddb and do a ps to find out what state all the
processes are in?  You might want to try adding the DEBUG_VFS_LOCKS
option to your kernel config to see if that turns up anything.  There
is also a ddb command to list the locked vnodes: show lockedvnods.
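
The debugging steps suggested here can be sketched roughly as follows. This is only an outline under stated assumptions: the message names DEBUG_VFS_LOCKS, ps, and show lockedvnods, while the `options DDB` line and the Ctrl+Alt+Esc key sequence (the usual way into the debugger on a syscons console) are additions not taken from the message and may not match every setup.

```
# Kernel config additions; rebuild and install the kernel afterwards.
options 	DDB			# assumption: in-kernel debugger, needed for the db> prompt
options 	DEBUG_VFS_LOCKS		# named in the message: extra vnode lock checks

# On the console, once the hang starts (assumption: syscons, Ctrl+Alt+Esc):
db> ps
db> show lockedvnods
```

The ps output shows what each process is waiting on, and show lockedvnods lists the vnodes that are still locked, which together may indicate where the disk i/o is stuck.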

Are you using nullfs or unionfs, which are a bit fragile?

> The system has become almost unusable because of this, requiring
> frequent reboots or hardware resets.
>
> Sometimes when I do something as simple as ps I see this ominous
> message on the console:
>
>   sysctl_old_user() with the following non-sleepablelocks held:
>   exclusive sleep mutex process lock r = 0 (0xc50bc9e0) locked @ /usr/src/sys/kern/kern_proc.c:258
>
> which gets into /var/log/messages as:
>
>   Jun 16 08:33:48 PECTOPAH kernel: exclusive sleep mutex process lock r = 0 (0xc50c7618) locked @ /usr/src/sys/kern/kern_proc.c:258
>
> There are a bunch of these.

I've been seeing this for about the last week, I think.  It seems to be
harmless and nothing bad has happened to my -current box.