Re: backgroud fsck is still locking up system (fwd)

2002-12-17 Thread Kirk McKusick
Date: Mon, 9 Dec 2002 11:19:13 -0800
From: Brooks Davis [EMAIL PROTECTED]
To: Kirk McKusick [EMAIL PROTECTED]
Cc: Brooks Davis [EMAIL PROTECTED], Nate Lawson [EMAIL PROTECTED],
   Archie Cobbs [EMAIL PROTECTED], [EMAIL PROTECTED]
Subject: Re: backgroud fsck is still locking up system (fwd)

On Fri, Dec 06, 2002 at 05:52:38PM -0800, Kirk McKusick wrote:
> Adding a two minute delay before starting background fsck
> sounds like a very good idea to me. Please send me your
> suggested change.

Here it is.  As written it doesn't add the delay, but you can change
etc/defaults/rc.conf to do that if desired.

-- Brooks

I have added your suggested change to -current (6.0). I decided to
set the default startup delay to sixty seconds as that seems to be
enough time to let the initial system startup settle down. If this
change proves to be popular, it can be considered for MFC'ing to 5.0.

Kirk McKusick

=-=-=-=-=-=

From: Kirk McKusick [EMAIL PROTECTED]
Date: Tue, 17 Dec 2002 23:21:31 -0800 (PST)
To: [EMAIL PROTECTED], [EMAIL PROTECTED]
Subject: cvs commit: src/etc rc src/etc/defaults rc.conf src/etc/rc.d
 bgfsck src/share/man/man5 rc.conf.5
X-FreeBSD-CVS-Branch: HEAD

mckusick    2002/12/17 23:21:31 PST

  Modified files:
    etc                  rc
    etc/defaults         rc.conf
    etc/rc.d             bgfsck
    share/man/man5       rc.conf.5
  Log:
  Delay an optional amount of time after booting before starting a
  background fsck. The delay defaults to sixty seconds to allow
  large applications such as the X server to start before disk I/O
  bandwidth is monopolized by fsck.

  Submitted by:   Brooks Davis [EMAIL PROTECTED]
  Sponsored by:   DARPA & NAI Labs.

  Revision  Changes    Path
  1.165     +1 -0      src/etc/defaults/rc.conf
  1.324     +8 -2      src/etc/rc
  1.3       +13 -2     src/etc/rc.d/bgfsck
  1.168     +5 -0      src/share/man/man5/rc.conf.5
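
For anyone who prefers the two-minute figure discussed earlier over the committed sixty-second default, a minimal sketch of the local override follows. It assumes the usual convention that settings in /etc/rc.conf take precedence over etc/defaults/rc.conf; both variable names come from the commit, and the value 120 is only an example.

```sh
# /etc/rc.conf (local overrides; leave /etc/defaults/rc.conf untouched)
background_fsck="YES"          # run fsck in the background where possible
background_fsck_delay="120"    # seconds to wait before starting it (example value)
```

Setting the delay to 0 should start the background check immediately, which is how the patch behaved as originally submitted.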

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: backgroud fsck is still locking up system (fwd)

2002-12-09 Thread Brooks Davis
On Fri, Dec 06, 2002 at 05:52:38PM -0800, Kirk McKusick wrote:
> Adding a two minute delay before starting background fsck
> sounds like a very good idea to me. Please send me your
> suggested change.

Here it is.  As written it doesn't add the delay, but you can change
etc/defaults/rc.conf to do that if desired.

-- Brooks

-- 
Any statement of the form X is the one, true Y is FALSE.
PGP fingerprint 655D 519C 26A7 82E7 2529  9BF0 5D8E 8BE9 F238 1AD4

Index: etc/rc
===================================================================
RCS file: /usr/cvs/src/etc/rc,v
retrieving revision 1.323
diff -u -p -r1.323 rc
--- etc/rc      26 Nov 2002 17:51:03 -0000      1.323
+++ etc/rc      4 Dec 2002 23:08:41 -0000
@@ -982,8 +982,14 @@ esac
 # Start background fsck checks if necessary
 case ${background_fsck} in
 [Yy][Ee][Ss])
-       echo 'Starting background filesystem checks'
-       nice -4 fsck -B -p 2>&1 | logger -p daemon.notice &
+       bgfsck_msg='Starting background file system checks'
+       if [ ${background_fsck_delay:=0} -gt 0 ]; then
+               bgfsck_msg="${bgfsck_msg} in ${background_fsck_delay} seconds"
+       fi
+       echo "${bgfsck_msg}."
+
+       (sleep ${background_fsck_delay}; nice -4 fsck -B -p) 2>&1 | \
+           logger -p daemon.notice &
        ;;
 esac
 
Index: etc/defaults/rc.conf
===================================================================
RCS file: /usr/cvs/src/etc/defaults/rc.conf,v
retrieving revision 1.164
diff -u -p -r1.164 rc.conf
--- etc/defaults/rc.conf        6 Dec 2002 05:23:37 -0000       1.164
+++ etc/defaults/rc.conf        6 Dec 2002 18:02:18 -0000
@@ -40,6 +40,7 @@ script_name_sep=" "     # Change if your sta
 rc_conf_files="/etc/rc.conf /etc/rc.conf.local"
 fsck_y_enable="NO"      # Set to YES to do fsck -y if the initial preen fails.
 background_fsck="YES"   # Attempt to run fsck in the background where possible.
+background_fsck_delay="0" # Time to wait (seconds) before starting the fsck.
 extra_netfs_types="NO"  # List of network extra filesystem types for delayed
                         # mount at startup (or NO).
 
Index: etc/rc.d/bgfsck
===================================================================
RCS file: /usr/cvs/src/etc/rc.d/bgfsck,v
retrieving revision 1.2
diff -u -p -r1.2 bgfsck
--- etc/rc.d/bgfsck     28 Jul 2002 03:38:10 -0000      1.2
+++ etc/rc.d/bgfsck     9 Oct 2002 23:31:45 -0000
@@ -11,9 +11,20 @@
 
 name="background-fsck"
 rcvar="background_fsck"
-start_precmd="echo 'Starting background file system checks.'"
-start_cmd="nice -4 fsck -B -p 2>&1 | logger -p daemon.notice &"
+start_cmd="bgfsck_start"
 stop_cmd=":"
+
+bgfsck_start ()
+{
+       bgfsck_msg='Starting background file system checks'
+       if [ ${background_fsck_delay:=0} -gt 0 ]; then
+               bgfsck_msg="${bgfsck_msg} in ${background_fsck_delay} seconds"
+       fi
+       echo "${bgfsck_msg}."
+
+       (sleep ${background_fsck_delay}; nice -4 fsck -B -p) 2>&1 | \
+           logger -p daemon.notice &
+}
 
 load_rc_config $name
 run_rc_command "$1"
Index: share/man/man5/rc.conf.5
===================================================================
RCS file: /usr/cvs/src/share/man/man5/rc.conf.5,v
retrieving revision 1.166
diff -u -p -r1.166 rc.conf.5
--- share/man/man5/rc.conf.5    29 Nov 2002 11:39:19 -0000      1.166
+++ share/man/man5/rc.conf.5    4 Dec 2002 23:11:53 -0000
@@ -734,6 +734,11 @@ If set to
 the system will attempt to run
 .Xr fsck 8
 in the background where possible.
+.It Va background_fsck_delay
+.Pq Vt int
+The amount of time in seconds to sleep before starting a background fsck.
+Setting this to a non-zero number may allow large applications such as
+the X server to start before disk I/O bandwidth is monopolized by fsck.
 .It Va extra_netfs_types
 .Pq Vt str
 If set to something other than
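
The shell logic in the etc/rc and etc/rc.d/bgfsck hunks can be tried outside the boot scripts. The following is a minimal sketch, not the committed code: `bgfsck_message` and `delayed_start` are illustrative names, and any harmless command can stand in for `nice -4 fsck -B -p`.

```sh
#!/bin/sh
# Sketch of the delayed-start pattern from the patch.
# bgfsck_message and delayed_start are illustrative names, not part of rc.

# Build the startup message; mention the delay only when it is positive.
bgfsck_message() {
    delay=${1:-0}
    msg='Starting background file system checks'
    if [ "$delay" -gt 0 ]; then
        msg="${msg} in ${delay} seconds"
    fi
    echo "${msg}."
}

# Run the given command after DELAY seconds in a background subshell,
# so the caller (the boot sequence, in the patch) is not held up.
delayed_start() {
    delay=$1
    shift
    (sleep "$delay"; "$@") 2>&1 &
}

bgfsck_message 0     # prints: Starting background file system checks.
bgfsck_message 60    # prints: Starting background file system checks in 60 seconds.
```

The patch pipes the subshell's output to `logger -p daemon.notice`; the sketch leaves that out so it has no dependency on syslog.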





Re: backgroud fsck is still locking up system (fwd)

2002-12-08 Thread Archie Cobbs
Julian Elischer wrote:
> > Well, I suspected that it might not work... but I would disagree that it
> > was *obvious* that it would not work. This was before mount had been
> > run, so / was supposedly mounted (?) read-only.
>
> I've seen ufs write back the superblock on unmounting a read-only
> filesystem (!). It was a few years ago but I wouldn't be surprised if it
> was still true..
>
> After you did it on the filesystem (ran growfs), what did you do next?
> The safe answer would be to pull the plug.

reboot

It seems counter-intuitive that a filesystem mounted read only
would be modified by the kernel. I'm sure there's some subtlety
I'm not aware of though..

-Archie

__
Archie Cobbs * Packet Design * http://www.packetdesign.com




Re: backgroud fsck is still locking up system (fwd)

2002-12-08 Thread Archie Cobbs
Bruce Evans wrote:
> > > Er, it should be obvious that growfs can't reasonably work on the mounted
> > > partitions.  growfs.1 doesn't exist, but growfs.8 already has the warning
> > > in a general form:
> > >
> > >     Currently growfs can only enlarge unmounted file systems.  Do not
> > >     try enlarging a mounted file system, your system may panic and you will
> > >     not be able to use the file system any longer...
> >
> > Well, I suspected that it might not work... but I would disagree that it
> > was *obvious* that it would not work. This was before mount had been
> > run, so / was supposedly mounted (?) read-only.
>
> Perhaps the unobvious point is that fsck could work.  If the mount is r/w,
> then neither growfs nor fsck can even open the partition r/w.  fsck somehow
> works in the case of a r/o root, but growfs apparently doesn't.  I think
> fsck depends on no other processes making (significant) vfs syscalls
> on the same partition while it is running (even r/o ones might be harmful
> if they caused reads of metadata which might be inconsistent).  Then when
> fsck has finished it calls mount(... MNT_RELOAD...) to sync the metadata.
> growfs doesn't do this, and even if it did it is not clear that it does
> all the necessary syncing (growfs may change more or different metadata).
> However, I think it does most of the necessary things.

FYI, I submitted a bug/enhancement request to summarize this..

  http://www.freebsd.org/cgi/query-pr.cgi?pr=46110

-Archie

P.S. Why does submitting a bug now generate an email response from
 (and who the heck is) ThinkHost Support ??

__
Archie Cobbs * Packet Design * http://www.packetdesign.com




Re: backgroud fsck is still locking up system (fwd)

2002-12-08 Thread Kirk McKusick
Date: Sat, 7 Dec 2002 11:07:23 -0800 (PST)
From: Nate Lawson [EMAIL PROTECTED]
To: Archie Cobbs [EMAIL PROTECTED]
cc: [EMAIL PROTECTED], [EMAIL PROTECTED]
Subject: Re: backgroud fsck is still locking up system (fwd)
X-ASK-Info: Whitelist match

On Fri, 6 Dec 2002, Archie Cobbs wrote:
> Julian Elischer wrote:
> > I put a copy of / in /usr
> > then from the fixit, I mounted /usr as / and ran growfs from there..
> > the trick is to not do it while / is mounted.
>
> / wasn't mounted yet when I ran growfs:
>
> > > I ran growfs after booting single user mode but before mounting
> > > any disks.. perhaps that caused it to not work.
>
> But it was the root partition and I was running in single user mode.
> If that's a problem then the growfs man page should say so, or maybe
> it should be more clear about what is meant by "mounted".

growfs won't work with any mounted fs (even ro) because it needs to
quiesce kernel file ops and you can't do that from usermode (yet).  I
wonder if there might be some clever way to abuse snapshots to have this
same effect (i.e. keep an open handle to the underlying fs cdev for growfs
to use and then mount a snapshot of the fs over its own mountpoint for
procs to use.)

> In any case, running it from the fixit floppy didn't work either
> (got a core dump), but that may be because it was already screwed up.
>
> So at minimum, there's a documentation bug (IMHO).

I assume the superblock changes between 4 and 5 changed the ability to use
4.x growfs on 5.x ufs partitions.  Also, does growfs need to be updated
for ufs2?

-Nate

I have made the structural changes to growfs to make it work for
UFS2, however, I have not done more than cursory testing. I would
appreciate it if someone could try running it on various UFS2
filesystems to see if it works properly.

Kirk McKusick




Re: backgroud fsck is still locking up system (fwd)

2002-12-08 Thread Kirk McKusick
In theory the MNT_RELOAD command should reload all the filesystem
metadata properly though this feature has not been tested with
growfs. If anyone has the time to try it out and report back any
problems, that would be appreciated.

Kirk McKusick

=-=-=-=-=

From: Archie Cobbs [EMAIL PROTECTED]
Subject: Re: backgroud fsck is still locking up system (fwd)
In-Reply-To: [EMAIL PROTECTED]
To: Bruce Evans [EMAIL PROTECTED]
Date: Sun, 8 Dec 2002 17:03:43 -0800 (PST)
CC: Archie Cobbs [EMAIL PROTECTED],
   Kirk McKusick [EMAIL PROTECTED],
   Julian Elischer [EMAIL PROTECTED], [EMAIL PROTECTED],
   Thomas-Henning von Kamptz [EMAIL PROTECTED]
X-ASK-Info: Whitelist match

Bruce Evans wrote:
> > > Er, it should be obvious that growfs can't reasonably work on the mounted
> > > partitions.  growfs.1 doesn't exist, but growfs.8 already has the warning
> > > in a general form:
> > >
> > >     Currently growfs can only enlarge unmounted file systems.  Do not
> > >     try enlarging a mounted file system, your system may panic and you will
> > >     not be able to use the file system any longer...
> >
> > Well, I suspected that it might not work... but I would disagree that it
> > was *obvious* that it would not work. This was before mount had been
> > run, so / was supposedly mounted (?) read-only.
>
> Perhaps the unobvious point is that fsck could work.  If the mount is r/w,
> then neither growfs nor fsck can even open the partition r/w.  fsck somehow
> works in the case of a r/o root, but growfs apparently doesn't.  I think
> fsck depends on no other processes making (significant) vfs syscalls
> on the same partition while it is running (even r/o ones might be harmful
> if they caused reads of metadata which might be inconsistent).  Then when
> fsck has finished it calls mount(... MNT_RELOAD...) to sync the metadata.
> growfs doesn't do this, and even if it did it is not clear that it does
> all the necessary syncing (growfs may change more or different metadata).
> However, I think it does most of the necessary things.

FYI, I submitted a bug/enhancement request to summarize this..

  http://www.freebsd.org/cgi/query-pr.cgi?pr=46110

-Archie

P.S. Why does submitting a bug now generate an email response from
 (and who the heck is) ThinkHost Support ??

__
Archie Cobbs * Packet Design * http://www.packetdesign.com




Re: backgroud fsck is still locking up system (fwd)

2002-12-07 Thread Bruce Evans
On Fri, 6 Dec 2002, Archie Cobbs wrote:

> So in summary my recommendation is to add a big warning to the
> growfs(1) man page that it should not be run on the root partition,
> even if you have booted single-user mode and haven't mounted / yet.
> I.e., to grow a root partition, you must boot from a different partition.

Er, it should be obvious that growfs can't reasonably work on the mounted
partitions.  growfs.1 doesn't exist, but growfs.8 already has the warning
in a general form:

    Currently growfs can only enlarge unmounted file systems.  Do not
    try enlarging a mounted file system, your system may panic and you will
    not be able to use the file system any longer...

Bruce





Re: backgroud fsck is still locking up system (fwd)

2002-12-07 Thread Robert Watson

On Sun, 8 Dec 2002, Bruce Evans wrote:

> On Fri, 6 Dec 2002, Archie Cobbs wrote:
>
> > So in summary my recommendation is to add a big warning to the
> > growfs(1) man page that it should not be run on the root partition,
> > even if you have booted single-user mode and haven't mounted / yet.
> > I.e., to grow a root partition, you must boot from a different partition.
>
> Er, it should be obvious that growfs can't reasonably work on the mounted
> partitions.  growfs.1 doesn't exist, but growfs.8 already has the warning
> in a general form:
>
>     Currently growfs can only enlarge unmounted file systems.  Do not
>     try enlarging a mounted file system, your system may panic and you will
>     not be able to use the file system any longer...

Hmm.  I guess one of the interesting questions is: what happened to the
safety belts?  I would have thought that GEOM would prevent opening the
partition writable while it was mounted...

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Network Associates Laboratories






Re: backgroud fsck is still locking up system (fwd)

2002-12-07 Thread Ian Dowse
In message [EMAIL PROTECTED], Kirk McKusick writes:
> Adding a two minute delay before starting background fsck
> sounds like a very good idea to me. Please send me your
> suggested change.

BTW, I've been using a fsck_ffs modification for a while now that
does something like the disabled kernel I/O slowdown, but from
userland. It seems to help quite a lot in leaving some disk bandwidth
for other processes. Waiting a while before starting the fsck seems
like a good idea anyway though. Patch below (I think I posted an
earlier version of this before).

Ian

Index: fsutil.c
===================================================================
RCS file: /dump/FreeBSD-CVS/src/sbin/fsck_ffs/fsutil.c,v
retrieving revision 1.19
diff -u -r1.19 fsutil.c
--- fsutil.c    27 Nov 2002 02:18:57 -0000      1.19
+++ fsutil.c    4 Dec 2002 02:16:28 -0000
@@ -40,6 +40,7 @@
 #endif /* not lint */
 
 #include <sys/param.h>
+#include <sys/time.h>
 #include <sys/types.h>
 #include <sys/sysctl.h>
 #include <sys/disklabel.h>
@@ -62,7 +63,13 @@
 
 #include "fsck.h"
 
+static void slowio_start(void);
+static void slowio_end(void);
+
 long   diskreads, totalreads;  /* Disk cache statistics */
+struct timeval slowio_starttime;
+int slowio_delay_usec = 10000; /* Initial IO delay for background fsck */
+int slowio_pollcnt;
 
 int
 ftypeok(union dinode *dp)
@@ -350,10 +357,15 @@
 
        offset = blk;
        offset *= dev_bsize;
+       if (bkgrdflag)
+               slowio_start();
        if (lseek(fd, offset, 0) < 0)
                rwerror("SEEK BLK", blk);
-       else if (read(fd, buf, (int)size) == size)
+       else if (read(fd, buf, (int)size) == size) {
+               if (bkgrdflag)
+                       slowio_end();
                return (0);
+       }
        rwerror("READ BLK", blk);
        if (lseek(fd, offset, 0) < 0)
                rwerror("SEEK BLK", blk);
@@ -463,6 +475,39 @@
        idesc.id_blkno = blkno;
        idesc.id_numfrags = frags;
        (void)pass4check(&idesc);
+}
+
+/* Slow down IO so as to leave some disk bandwidth for other processes */
+void
+slowio_start()
+{
+
+       /* Delay one in every 8 operations by 16 times the average IO delay */
+       slowio_pollcnt = (slowio_pollcnt + 1) & 7;
+       if (slowio_pollcnt == 0) {
+               usleep(slowio_delay_usec * 16);
+               gettimeofday(&slowio_starttime, NULL);
+       }
+}
+
+void
+slowio_end()
+{
+       struct timeval tv;
+       int delay_usec;
+
+       if (slowio_pollcnt != 0)
+               return;
+
+       /* Update the slowdown interval. */
+       gettimeofday(&tv, NULL);
+       delay_usec = (tv.tv_sec - slowio_starttime.tv_sec) * 1000000 +
+           (tv.tv_usec - slowio_starttime.tv_usec);
+       if (delay_usec < 64)
+               delay_usec = 64;
+       if (delay_usec > 1000000)
+               delay_usec = 1000000;
+       slowio_delay_usec = (slowio_delay_usec * 63 + delay_usec) >> 6;
 }
 
 /*
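
The throttling arithmetic in the patch can be tried in isolation. What follows is a rough sketch in shell rather than C, not the fsck_ffs code: `slowio_update` and `count_throttled` are hypothetical helper names, and the clamp bounds are passed as arguments, with 64 and 1000000 microseconds used here purely as demo values.

```sh
#!/bin/sh
# Illustrative re-creation of the slowio arithmetic; not the fsck_ffs code.

# slowio_update AVG SAMPLE LO HI
# Clamp SAMPLE (microseconds) to [LO, HI], then fold it into the running
# average with weight 1/64: new = (avg * 63 + sample) >> 6.
slowio_update() {
    avg=$1 sample=$2 lo=$3 hi=$4
    if [ "$sample" -lt "$lo" ]; then sample=$lo; fi
    if [ "$sample" -gt "$hi" ]; then sample=$hi; fi
    echo $(( ($avg * 63 + $sample) >> 6 ))
}

# count_throttled N
# Count how many of N reads would be delayed when the counter wraps
# with "& 7", i.e. one read in every eight.
count_throttled() {
    n=$1 pollcnt=0 throttled=0 i=0
    while [ "$i" -lt "$n" ]; do
        pollcnt=$(( ($pollcnt + 1) & 7 ))
        if [ "$pollcnt" -eq 0 ]; then
            throttled=$(( $throttled + 1 ))
        fi
        i=$(( $i + 1 ))
    done
    echo "$throttled"
}

slowio_update 10000 2000 64 1000000    # prints 9875
count_throttled 16                     # prints 2
```

Because only one sample in 64 parts of the average changes per update, a single slow read moves the delay very little; the delay tracks sustained changes in disk latency instead.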




Re: backgroud fsck is still locking up system (fwd)

2002-12-07 Thread Kirk McKusick
Thanks for reminding me about your userland change to background
fsck. I have tried it out and concur that it is the right approach
until we manage to get the general solution in the kernel. I
suggest that you propose it to release engineering and if approved
check it in.

Kirk McKusick

=-=-=-=-=-=

To: Kirk McKusick [EMAIL PROTECTED]
cc: Brooks Davis [EMAIL PROTECTED], Nate Lawson [EMAIL PROTECTED],
   Archie Cobbs [EMAIL PROTECTED], [EMAIL PROTECTED]
Subject: Re: backgroud fsck is still locking up system (fwd) 
In-Reply-To: Your message of Fri, 06 Dec 2002 17:52:38 PST.
 [EMAIL PROTECTED] 
Date: Sat, 07 Dec 2002 14:26:39 +
From: Ian Dowse [EMAIL PROTECTED]
X-ASK-Info: Whitelist match

In message [EMAIL PROTECTED], Kirk McKusick writes:
> Adding a two minute delay before starting background fsck
> sounds like a very good idea to me. Please send me your
> suggested change.

BTW, I've been using a fsck_ffs modification for a while now that
does something like the disabled kernel I/O slowdown, but from
userland. It seems to help quite a lot in leaving some disk bandwidth
for other processes. Waiting a while before starting the fsck seems
like a good idea anyway though. Patch below (I think I posted an
earlier version of this before).

Ian

Index: fsutil.c
===================================================================
RCS file: /dump/FreeBSD-CVS/src/sbin/fsck_ffs/fsutil.c,v
retrieving revision 1.19
diff -u -r1.19 fsutil.c
--- fsutil.c    27 Nov 2002 02:18:57 -0000      1.19
+++ fsutil.c    4 Dec 2002 02:16:28 -0000
@@ -40,6 +40,7 @@
 #endif /* not lint */
 
 #include <sys/param.h>
+#include <sys/time.h>
 #include <sys/types.h>
 #include <sys/sysctl.h>
 #include <sys/disklabel.h>
@@ -62,7 +63,13 @@
 
 #include "fsck.h"
 
+static void slowio_start(void);
+static void slowio_end(void);
+
 long   diskreads, totalreads;  /* Disk cache statistics */
+struct timeval slowio_starttime;
+int slowio_delay_usec = 10000; /* Initial IO delay for background fsck */
+int slowio_pollcnt;
 
 int
 ftypeok(union dinode *dp)
@@ -350,10 +357,15 @@
 
        offset = blk;
        offset *= dev_bsize;
+       if (bkgrdflag)
+               slowio_start();
        if (lseek(fd, offset, 0) < 0)
                rwerror("SEEK BLK", blk);
-       else if (read(fd, buf, (int)size) == size)
+       else if (read(fd, buf, (int)size) == size) {
+               if (bkgrdflag)
+                       slowio_end();
                return (0);
+       }
        rwerror("READ BLK", blk);
        if (lseek(fd, offset, 0) < 0)
                rwerror("SEEK BLK", blk);
@@ -463,6 +475,39 @@
        idesc.id_blkno = blkno;
        idesc.id_numfrags = frags;
        (void)pass4check(&idesc);
+}
+
+/* Slow down IO so as to leave some disk bandwidth for other processes */
+void
+slowio_start()
+{
+
+       /* Delay one in every 8 operations by 16 times the average IO delay */
+       slowio_pollcnt = (slowio_pollcnt + 1) & 7;
+       if (slowio_pollcnt == 0) {
+               usleep(slowio_delay_usec * 16);
+               gettimeofday(&slowio_starttime, NULL);
+       }
+}
+
+void
+slowio_end()
+{
+       struct timeval tv;
+       int delay_usec;
+
+       if (slowio_pollcnt != 0)
+               return;
+
+       /* Update the slowdown interval. */
+       gettimeofday(&tv, NULL);
+       delay_usec = (tv.tv_sec - slowio_starttime.tv_sec) * 1000000 +
+           (tv.tv_usec - slowio_starttime.tv_usec);
+       if (delay_usec < 64)
+               delay_usec = 64;
+       if (delay_usec > 1000000)
+               delay_usec = 1000000;
+       slowio_delay_usec = (slowio_delay_usec * 63 + delay_usec) >> 6;
 }
 
 /*




Re: backgroud fsck is still locking up system (fwd)

2002-12-07 Thread Nate Lawson
On Fri, 6 Dec 2002, Archie Cobbs wrote:
> Julian Elischer wrote:
> > I put a copy of / in /usr
> > then from the fixit, I mounted /usr as / and ran growfs from there..
> > the trick is to not do it while / is mounted.
>
> / wasn't mounted yet when I ran growfs:
>
> > > I ran growfs after booting single user mode but before mounting
> > > any disks.. perhaps that caused it to not work.
>
> But it was the root partition and I was running in single user mode.
> If that's a problem then the growfs man page should say so, or maybe
> it should be more clear about what is meant by "mounted".

growfs won't work with any mounted fs (even ro) because it needs to
quiesce kernel file ops and you can't do that from usermode (yet).  I
wonder if there might be some clever way to abuse snapshots to have this
same effect (i.e. keep an open handle to the underlying fs cdev for growfs
to use and then mount a snapshot of the fs over its own mountpoint for
procs to use.)

> In any case, running it from the fixit floppy didn't work either
> (got a core dump), but that may be because it was already screwed up.
>
> So at minimum, there's a documentation bug (IMHO).

I assume the superblock changes between 4 and 5 changed the ability to use
4.x growfs on 5.x ufs partitions.  Also, does growfs need to be updated
for ufs2?

-Nate
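
Since the point above is that growfs must not touch a file system that is mounted in any mode, a wrapper could check the mount table first. This is only a sketch under assumptions: `is_mounted` and `safe_growfs` are hypothetical names, the device names are made up, and the check relies on `mount` printing lines of the form "device on mountpoint (options)".

```sh
#!/bin/sh
# Hypothetical guard around growfs; is_mounted and safe_growfs are
# illustrative names, not part of FreeBSD.

# is_mounted DEVICE: read mount(8)-style lines ("/dev/x on /mnt (...)")
# on stdin and succeed if DEVICE is the first field of any line.
is_mounted() {
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }'
}

# safe_growfs DEVICE: refuse to touch a file system that is mounted,
# read-only or not; otherwise hand the device to growfs.
safe_growfs() {
    if mount | is_mounted "$1"; then
        echo "refusing to grow $1: it is mounted" >&2
        return 1
    fi
    growfs "$1"
}

# Demonstration on canned mount output (device names are made up).
sample='/dev/ad0s1a on / (ufs, local, read-only)'
if echo "$sample" | is_mounted /dev/ad0s1a; then
    echo "/dev/ad0s1a is mounted"
fi
```

A check like this would only have helped in the single-user case if the read-only root shows up in the `mount` listing at that point, which the thread does not confirm.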






Re: backgroud fsck is still locking up system (fwd)

2002-12-07 Thread Julian Elischer


On Sat, 7 Dec 2002, Archie Cobbs wrote:

> Bruce Evans wrote:
> > > So in summary my recommendation is to add a big warning to the
> > > growfs(1) man page that it should not be run on the root partition,
> > > even if you have booted single-user mode and haven't mounted / yet.
> > > I.e., to grow a root partition, you must boot from a different partition.
> >
> > Er, it should be obvious that growfs can't reasonably work on the mounted
> > partitions.  growfs.1 doesn't exist, but growfs.8 already has the warning
> > in a general form:
> >
> >     Currently growfs can only enlarge unmounted file systems.  Do not
> >     try enlarging a mounted file system, your system may panic and you will
> >     not be able to use the file system any longer...
>
> Well, I suspected that it might not work... but I would disagree that it
> was *obvious* that it would not work. This was before mount had been
> run, so / was supposedly mounted (?) read-only.

I've seen ufs write back the superblock on unmounting a read-only
filesystem (!). It was a few years ago but I wouldn't be surprised if it
was still true..

After you did it on the filesystem (ran growfs), what did you do next?
The safe answer would be to pull the plug.

>
> In any case, when you're talking about the danger of destroying a
> filesystem it probably wouldn't hurt to have a little extra clarity
> in the documentation.
>
> Or better yet, should the kernel prevent raw writes to the / partition?
> Guess that would prevent fsck from working though.
>
> -Archie
>
> __
> Archie Cobbs * Packet Design * http://www.packetdesign.com





Re: backgroud fsck is still locking up system (fwd)

2002-12-07 Thread Archie Cobbs
Bruce Evans wrote:
> > So in summary my recommendation is to add a big warning to the
> > growfs(1) man page that it should not be run on the root partition,
> > even if you have booted single-user mode and haven't mounted / yet.
> > I.e., to grow a root partition, you must boot from a different partition.
>
> Er, it should be obvious that growfs can't reasonably work on the mounted
> partitions.  growfs.1 doesn't exist, but growfs.8 already has the warning
> in a general form:
>
>     Currently growfs can only enlarge unmounted file systems.  Do not
>     try enlarging a mounted file system, your system may panic and you will
>     not be able to use the file system any longer...

Well, I suspected that it might not work... but I would disagree that it
was *obvious* that it would not work. This was before mount had been
run, so / was supposedly mounted (?) read-only.

In any case, when you're talking about the danger of destroying a
filesystem it probably wouldn't hurt to have a little extra clarity
in the documentation.

Or better yet, should the kernel prevent raw writes to the / partition?
Guess that would prevent fsck from working though.

-Archie

__
Archie Cobbs * Packet Design * http://www.packetdesign.com




Re: backgroud fsck is still locking up system (fwd)

2002-12-07 Thread Bruce Evans
On Sat, 7 Dec 2002, Robert Watson wrote:

> On Sun, 8 Dec 2002, Bruce Evans wrote:
> > Er, it should be obvious that growfs can't reasonably work on the mounted
> > partitions.  growfs.1 doesn't exist, but growfs.8 already has the warning
> > ...
>
> Hmm.  I guess one of the interesting questions is: what happened to the
> safety belts?  I would have thought that GEOM would prevent opening the
> partition writable while it was mounted...

The kernel doesn't and shouldn't prevent it for the r/o-mounted case
(since fsck needs to write to the partition of a mounted file system
for at least the case of the root file system mounted r/o), and
apparently growfs doesn't prevent it in this case either.  There are
lots of safety belts in the kernel for the r/w-mounted case.

Bruce





Re: backgroud fsck is still locking up system (fwd)

2002-12-07 Thread Bruce Evans
On Sat, 7 Dec 2002, Archie Cobbs wrote:

> Bruce Evans wrote:
> > Er, it should be obvious that growfs can't reasonably work on the mounted
> > partitions.  growfs.1 doesn't exist, but growfs.8 already has the warning
> > in a general form:
> >
> >     Currently growfs can only enlarge unmounted file systems.  Do not
> >     try enlarging a mounted file system, your system may panic and you will
> >     not be able to use the file system any longer...
>
> Well, I suspected that it might not work... but I would disagree that it
> was *obvious* that it would not work. This was before mount had been
> run, so / was supposedly mounted (?) read-only.

Perhaps the unobvious point is that fsck could work.  If the mount is r/w,
then neither growfs nor fsck can even open the partition r/w.  fsck somehow
works in the case of a r/o root, but growfs apparently doesn't.  I think
fsck depends on no other processes making (significant) vfs syscalls
on the same partition while it is running (even r/o ones might be harmful
if they caused reads of metadata which might be inconsistent).  Then when
fsck has finished it calls mount(... MNT_RELOAD...) to sync the metadata.
growfs doesn't do this, and even if it did it is not clear that it does
all the necessary syncing (growfs may change more or different metadata).
However, I think it does most of the necessary things.

Bruce





Re: backgroud fsck is still locking up system (fwd)

2002-12-06 Thread Nate Lawson
On Thu, 5 Dec 2002, Kirk McKusick wrote:
> Does the background fsck process continue to run, or does the whole
> system come to a halt? If the fsck process continues to run, what
> happens when it eventually finishes? Is the system still dead, or
> does it come back to life? If the system does not come back to life
> can you get me the output of `ps axl'? If not, can you break into
> the debugger and get a ps output? (You will need to have the DDB
> option specified in your config file).

Sorry for butting in.  I think Archie is referring to bg fsck gaining an
unfair share of cpu due to it running due to IO completions.  Last I
heard, we were waiting until after 5.0 to experiment with scheduler
changes to make it more fair.  I have not seen any hard locks or other
problems with bg fsck after your commit.

-Nate





Re: backgroud fsck is still locking up system (fwd)

2002-12-06 Thread Brooks Davis
On Fri, Dec 06, 2002 at 10:27:10AM -0800, Nate Lawson wrote:
> On Thu, 5 Dec 2002, Kirk McKusick wrote:
> > Does the background fsck process continue to run, or does the whole
> > system come to a halt? If the fsck process continues to run, what
> > happens when it eventually finishes? Is the system still dead, or
> > does it come back to life? If the system does not come back to life
> > can you get me the output of `ps axl'? If not, can you break into
> > the debugger and get a ps output? (You will need to have the DDB
> > option specified in your config file).
>
> Sorry for butting in.  I think Archie is referring to bg fsck gaining an
> unfair share of cpu due to it running due to IO completions.  Last I
> heard, we were waiting until after 5.0 to experiment with scheduler
> changes to make it more fair.  I have not seen any hard locks or other
> problems with bg fsck after your commit.

My experience is that, at least with my laptop (which has a very slow
disk), bg fsck works OK, but starting applications for the first time
while fsck is running is _very_ painful.  Even getty seems to have a
hard time.  I've found that adding a two minute delay before the fsck is
sufficient to allow the system to finish starting up and for me to load X
and my main applications, which lets me work while bg fsck is running.  I
posted a patch to add an optional delay in the rc scripts a while ago,
but Kirk was going to re-enable the priority stuff soon so I didn't
pursue it.  If there's interest, I'll regenerate it and repost it.

-- Brooks

-- 
Any statement of the form X is the one, true Y is FALSE.
PGP fingerprint 655D 519C 26A7 82E7 2529  9BF0 5D8E 8BE9 F238 1AD4





Re: backgroud fsck is still locking up system (fwd)

2002-12-06 Thread Archie Cobbs
Kirk McKusick wrote:
> Does the background fsck process continue to run, or does the whole
> system come to a halt? If the fsck process continues to run, what
> happens when it eventually finishes? Is the system still dead, or
> does it come back to life? If the system does not come back to life
> can you get me the output of `ps axl'? If not, can you break into
> the debugger and get a ps output? (You will need to have the DDB
> option specified in your config file).

OK, here is some more info..

I easily reproduced the problem again. So far it's 100% reproducible.
This time, to reproduce it I simply booted in single user mode, typed
"mount -a -t nonfs" and then pulled the plug.

After the reboot, the HDD light soon stops blinking altogether. I
waited for several minutes (which should have been long enough) and
it never came back to life, which is not surprising considering
there's no disk activity.

Breaking into the debugger still works. However, pressing the soft
power button no longer causes a graceful shutdown as it normally does.

To copy the 'ps' debugger output, I'd have to manually copy it all,
so here are just a few highlights:

Proc       State
----       -----
fsck_ufs   0004000 norm[SLPQ nbufbs c036e5b0][SLP]
fsck       0004002 norm[SLPQ   wait c124dce8][SLP]
syncer         204 norm[SLPQ nbufbs c036e5b0][SLP]
vnlru          204 norm[SLPQ vlruwt c12c0ce8][SLP]
bufdaemon      204 norm[SLPQ qsleep c036e5a4][SLP]
swapper        200 norm[SLPQ  sched c0315a20][SLP]

Softupdates is enabled on /usr and /var but not /.

This machine also acts as an NFS client for /home/archie.

-Archie

__
Archie Cobbs * Packet Design * http://www.packetdesign.com

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: backgroud fsck is still locking up system (fwd)

2002-12-06 Thread David Rhodus

On Friday, December 6, 2002, at 01:39 PM, Archie Cobbs wrote:


Kirk McKusick wrote:

Does the background fsck process continue to run, or does the whole
system come to a halt? If the fsck process continues to run, what
happens when it eventually finishes? Is the system still dead, or
does it come back to life? If the system does not come back to life
can you get me the output of `ps axl'? If not, can you break into
the debugger and get a ps output? (You will need to have the DDB
option specified in your config file).


OK, here is some more info..

I easily reproduced the problem again. So far it's 100% reproducible.
This time to reproduce it simply booted in single user mode, typed
mount -a -t nonfs and then pulled the plug.

After the reboot, the HDD light soon stops blinking altogether. I
waited for several minutes (which should have been long enough) and
it never came back to life, which is not surprising considering
there's no disk activity.

Breaking into the debugger still works. However, pressing the soft
power button no longer causes a graceful shutdown as it normally does.

To copy the 'ps' debugger output, I'd have to manually copy it all,
so here are just a few highlights:

Proc		State
		-
fsck_ufs	0004000 norm[SLPQ nbufbs c036e5b0][SLP]
fsck		0004002 norm[SLPQ   wait c124dce8][SLP]
syncer		204 norm[SLPQ nbufbs c036e5b0][SLP]
vnlru		204 norm[SLPQ vlruwt c12c0ce8][SLP]
bufdaemon	204 norm[SLPQ qsleep c036e5a4][SLP]
swapper		200 norm[SLPQ  sched c0315a20][SLP]

Softupdates is enabled on /usr and /var but not /.

This machine also acts as an NFS client for /home/archie.



Why does softupdates not get enabled on / , by default on the install?

-DR





Re: backgroud fsck is still locking up system (fwd)

2002-12-06 Thread Ruslan Ermilov
On Fri, Dec 06, 2002 at 01:52:11PM -0500, David Rhodus wrote:
 
 On Friday, December 6, 2002, at 01:39 PM, Archie Cobbs wrote:
 
 Kirk McKusick wrote:
 Does the background fsck process continue to run, or does the whole
 system come to a halt? If the fsck process continues to run, what
 happens when it eventually finishes? Is the system still dead, or
 does it come back to life? If the system does not come back to life
 can you get me the output of `ps axl'? If not, can you break into
 the debugger and get a ps output? (You will need to have the DDB
 option specified in your config file).
 
 OK, here is some more info..
 
 I easily reproduced the problem again. So far it's 100% reproducible.
 This time to reproduce it simply booted in single user mode, typed
 mount -a -t nonfs and then pulled the plug.
 
 After the reboot, the HDD light soon stops blinking altogether. I
 waited for several minutes (which should have been long enough) and
 it never came back to life, which is not surprising considering
 there's no disk activity.
 
 Breaking into the debugger still works. However, pressing the soft
 power button no longer causes a graceful shutdown as it normally does.
 
 To copy the 'ps' debugger output, I'd have to manually copy it all,
 so here are just a few highlights:
 
 Proc       State
 ----       -----
 fsck_ufs   0004000 norm[SLPQ nbufbs c036e5b0][SLP]
 fsck       0004002 norm[SLPQ   wait c124dce8][SLP]
 syncer         204 norm[SLPQ nbufbs c036e5b0][SLP]
 vnlru          204 norm[SLPQ vlruwt c12c0ce8][SLP]
 bufdaemon      204 norm[SLPQ qsleep c036e5a4][SLP]
 swapper        200 norm[SLPQ  sched c0315a20][SLP]
 
 Softupdates is enabled on /usr and /var but not /.
 
 This machine also acts as an NFS client for /home/archie.
 
 
 Why does softupdates not get enabled on / , by default on the install?
 
Read tuning(7).


Cheers,
-- 
Ruslan Ermilov  Sysadmin and DBA,
[EMAIL PROTECTED]   Sunbay Software AG,
[EMAIL PROTECTED]  FreeBSD committer,
+380.652.512.251Simferopol, Ukraine

http://www.FreeBSD.org  The Power To Serve
http://www.oracle.com   Enabling The Information Age





Re: backgroud fsck is still locking up system (fwd)

2002-12-06 Thread Archie Cobbs
David Rhodus wrote:
  Softupdates is enabled on /usr and /var but not /.
 
 Why does softupdates not get enabled on / , by default on the install?

I disabled softupdates on / back when having it enabled caused disk
full problems during 'make installworld,' and never re-enabled it.

FYI at this point my 50MB / partition is woefully inadequate. I can't
even 'make install kernel' without first removing all existing modules,
and even so / ends up 106% full.

Finally, one more bit of info: I have WITNESS enabled in this kernel
and get this message during boot:

/usr/src/sys/vm/uma_core.c:1330: could sleep with dc0 locked from 
/usr/src/sys/pci/if_dc.c:691

-Archie

__
Archie Cobbs * Packet Design * http://www.packetdesign.com




Re: backgroud fsck is still locking up system (fwd)

2002-12-06 Thread Archie Cobbs
Nate Lawson wrote:
  Does the background fsck process continue to run, or does the whole
  system come to a halt? If the fsck process continues to run, what 
  happens when it eventually finishes? Is the system still dead, or 
  does it come back to life? If the system does not come back to life
  can you get me the output of `ps axl'? If not, can you break into
  the debugger and get a ps output? (You will need to have the DDB
  option specified in your config file).
 
 Sorry for butting in.  I think Archie is referring to bg fsck gaining an
 unfair share of cpu due to it running due to IO completions.  Last I
 heard, we were waiting until after 5.0 to experiment with scheduler
 changes to make it more fair.  I have not seen any hard locks or other
 problems with bg fsck after your commit.

I'm actually seeing something different. The box becomes unresponsive
(except for virtual console changes and CTRL-ALT-ESC) but there's no
disk activity. It never recovers.

Reproduced it again just now. After pulling the plug and rebooting
I didn't touch the box.  It booted normally, started background
fsck, and the HDD light was blinking as expected. After about 10
seconds, rather suddenly the HDD light stopped blinking.  At this
point it was pretty dead.  Broke into the debugger and it showed a
similar 'ps' output to what I previously posted.

-Archie

__
Archie Cobbs * Packet Design * http://www.packetdesign.com




Re: backgroud fsck is still locking up system (fwd)

2002-12-06 Thread Dan Nelson
In the last episode (Dec 06), David Rhodus said:
 Why does softupdates not get enabled on / , by default on the
 install?

Softupdates updates on-disk structures in the background, and
background fsck cannot relink unreferenced files into lost+found, so
you run the risk of losing both the original and backup copies of
important files in case of a sudden reboot.  Imagine you edited
/etc/rc.conf, saved it, and 5 seconds later the system panic'ed.
Because the default metadata flush time is 28 seconds, there's a pretty
good chance that neither the new file nor the original is in /etc after
a reboot.  I got bitten by this three times before I learned my lesson.  I
have disabled softupdates on /, and cranked the softupdates delays down to
10/11/12 seconds to minimize the risk to my other filesystems. At least
there are /var/backups and /boot/kernel.old which let you recover the
really important files :)

-- 
Dan Nelson
[EMAIL PROTECTED]




Re: backgroud fsck is still locking up system (fwd)

2002-12-06 Thread Archie Cobbs
Dan Nelson wrote:
  Why does softupdates not get enabled on / , by default on the
  install?
 
 Softupdates updates on-disk structures in the background, and
 background fsck cannot relink unreferenced files into lost+found, so
 you run the risk of losing both the original and backup copies of
 important files in case of a sudden reboot.  Imagine you edited
 /etc/rc.conf, saved it, and 5 seconds later the system panic'ed.
 Because the default metadata flush time is 28 seconds, there's a pretty
 good chance that neither the new file or the original is in /etc after
 a reboot.  I got bit by this three times before I learned my lesson.  I

I don't understand this.. presumably vi updates the file contents by
opening and writing into the file; why would this cause the file's
directory entry to disappear?

On the other hand, if you do mv rc.conf.new rc.conf then you are
supposedly guaranteed that the file exists in some form; see rename(2).

In any case, you seem to be implying that with respect to modifying
files just before a system crash:

(a) Softupdates is more 'dangerous' than non-softupdates
(b) Background fsck is more 'dangerous' than normal fsck

Is this really true? I thought if anything the reverse of (a) would be true.

-Archie

__
Archie Cobbs * Packet Design * http://www.packetdesign.com




Re: backgroud fsck is still locking up system (fwd)

2002-12-06 Thread Julian Elischer


On Fri, 6 Dec 2002, Archie Cobbs wrote:

 Reproduced it again just now. After pulling the plug and rebooting
 I didn't touch the box.  It booted normally, started background
 fsck, and the HDD light was blinking as expected. After about 10
 seconds, rather suddenly the HDD light stopped blinking.  At this
 point it was pretty dead.  Broke into the debugger and it showed a
 similar 'ps' output to what I previously posted.

you need a serial console ...
:-)

 
 -Archie
 
 __
 Archie Cobbs * Packet Design * http://www.packetdesign.com
 
 





Re: backgroud fsck is still locking up system (fwd)

2002-12-06 Thread Julian Elischer


On Fri, 6 Dec 2002, Archie Cobbs wrote:

 David Rhodus wrote:
   Softupdates is enabled on /usr and /var but not /.
  
  Why does softupdates not get enabled on / , by default on the install?
 
 I disabled softupdates on / back when having it enabled caused disk
 full problems during 'make installworld,' and never re-enabled it.
 
 FYI at this point my 50MB / partition is woefully inadequate. I can't
 even 'make install kernel' without first removing all existing modules,
 and even so / ends up 106% full.
 

here's a hint..
most systems follow / with their swap region..

you can boot from fixit, or picoBSD floppy 
and use disklabel -e to extend the root partition
then you can use growfs to add the new space to your root fs.

Usually the 50MB that would make a big difference to / won't be really
missed from the swap, and you can always add more swap space using a
swapfile etc if it gets short.


 Finally, one more bit of info: I have WITNESS enabled in this kernel
 and get this message during boot:
 
 /usr/src/sys/vm/uma_core.c:1330: could sleep with dc0 locked from 
/usr/src/sys/pci/if_dc.c:691
 
 -Archie
 
 __
 Archie Cobbs * Packet Design * http://www.packetdesign.com
 
 





Re: backgroud fsck is still locking up system (fwd)

2002-12-06 Thread Nate Lawson
On Fri, 6 Dec 2002, Archie Cobbs wrote:
 To copy the 'ps' debugger output, I'd have to manually copy it all,
 so here are just a few highlights:
 
 Proc       State
 ----       -----
 fsck_ufs   0004000 norm[SLPQ nbufbs c036e5b0][SLP]
 fsck       0004002 norm[SLPQ   wait c124dce8][SLP]
 syncer         204 norm[SLPQ nbufbs c036e5b0][SLP]
 vnlru          204 norm[SLPQ vlruwt c12c0ce8][SLP]
 bufdaemon      204 norm[SLPQ qsleep c036e5a4][SLP]
 swapper        200 norm[SLPQ  sched c0315a20][SLP]

Output from tr for the pids for syncer and fsck_ufs?  show locks, show
lockedvnods?

-Nate





Re: backgroud fsck is still locking up system (fwd)

2002-12-06 Thread Kirk McKusick
From: Archie Cobbs [EMAIL PROTECTED]
Subject: Re: backgroud fsck is still locking up system (fwd)
In-Reply-To: [EMAIL PROTECTED]
To: Nate Lawson [EMAIL PROTECTED]
Date: Fri, 6 Dec 2002 10:57:13 -0800 (PST)
CC: Kirk McKusick [EMAIL PROTECTED],
   Archie Cobbs [EMAIL PROTECTED], [EMAIL PROTECTED]
X-ASK-Info: Whitelist match

Nate Lawson wrote:
  Does the background fsck process continue to run, or does the whole
  system come to a halt? If the fsck process continues to run, what 
  happens when it eventually finishes? Is the system still dead, or 
  does it come back to life? If the system does not come back to life
  can you get me the output of `ps axl'? If not, can you break into
  the debugger and get a ps output? (You will need to have the DDB
  option specified in your config file).
 
 Sorry for butting in.  I think Archie is referring to bg fsck gaining
 an unfair share of cpu due to it running due to IO completions. Last I
 heard, we were waiting until after 5.0 to experiment with scheduler
 changes to make it more fair.  I have not seen any hard locks or other
 problems with bg fsck after your commit.

I'm actually seeing something different. The box becomes unresponsive
(except for virtual console changes and CTRL-ALT-ESC) but there's no
disk activity. It never recovers.

Reproduced it again just now. After pulling the plug and rebooting
I didn't touch the box.  It booted normally, started background
fsck, and the HDD light was blinking as expected. After about 10
seconds, rather suddenly the HDD light stopped blinking.  At this
point it was pretty dead.  Broke into the debugger and it showed a
similar 'ps' output to what I previously posted.

-Archie

Your ps shows fsck_ufs and the syncer process both blocked on nbufbs.
That means the system has blocked them from running because it feels
that there are too many dirty buffers. What you are probably experiencing
is that you have a relatively small memory machine which has a rather
low threshold for blocking on dirty buffers. All the dirty buffers
in your system are held by the indirect blocks of the snapshot and
thus the bufdaemon cannot push them out. That task can only be done
by the syncer, which is also blocked. Could you please run the following
commands on your system and send me the results:

sysctl vfs.lodirtybuffers
sysctl vfs.hidirtybuffers
sysctl vfs.numdirtybuffers

both before and after the lockup. If you cannot run this command after
the lockup, the global variable names are:

lodirtybuffers
hidirtybuffers
numdirtybuffers

If my hypothesis is correct, that will let me tweak the thresholds on
dirty buffers to get a solution.

Kirk McKusick
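
The hypothesis above can be modeled roughly in userspace C. This is only
a sketch, not the kernel's logic: the parameter names mirror the three
sysctl variables, but the enum, the function name classify_dirty(), and
the exact comparisons are assumptions made for illustration.

```c
#include <assert.h>

/* Hypothetical classification of dirty-buffer pressure. */
enum dirty_state {
	DIRTY_OK,	/* at or below the low-water mark */
	DIRTY_FLUSHING,	/* between the marks: flushing is expected */
	DIRTY_BLOCKED	/* above the high-water mark: writers sleep */
};

/*
 * Compare a dirty-buffer count against the two thresholds.
 * Illustrative only; the real buffer cache code is more involved.
 */
enum dirty_state
classify_dirty(int numdirtybuffers, int lodirtybuffers, int hidirtybuffers)
{
	assert(lodirtybuffers <= hidirtybuffers);
	if (numdirtybuffers > hidirtybuffers)
		return (DIRTY_BLOCKED);
	if (numdirtybuffers > lodirtybuffers)
		return (DIRTY_FLUSHING);
	return (DIRTY_OK);
}
```

In this model, a count that stays above the high-water mark while the
only process able to flush is itself asleep would never drop again,
which matches the lockup being described.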




Re: backgroud fsck is still locking up system (fwd)

2002-12-06 Thread Sam Leffler
 Finally, one more bit of info: I have WITNESS enabled in this kernel and
get this message during boot:

 /usr/src/sys/vm/uma_core.c:1330: could sleep with dc0 locked from
/usr/src/sys/pci/if_dc.c:691


if_attach does a malloc with M_WAITOK.  If the attach happens inside a lock
in the driver's attach method (typical) then you'll get this complaint.
Fixing it, and some other similar stuff, requires some care since the code
assumes malloc will not fail.

I decided to leave it until after 5.0.

Sam





Re: backgroud fsck is still locking up system (fwd)

2002-12-06 Thread Kirk McKusick
The loss of files under soft updates is possible if your editor
fails to fsync the new file before unlinking the old file. The
`vi' editor always does an `fsync' after writing the new copy and
before removing the old copy. I have not checked with other editors
such as emacs to see if they properly use fsync. Note that there
is also a vulnerability without soft updates, it is just that the
window of vulnerability is shorter. So, editors should always do
fsync's, it is just more critical if you are using soft updates (or
journalling for that matter).

The main reason for not using soft updates on the root filesystem
was because of the delay between removing files and having the
space show up. The result was that world installs on the root
filesystem often failed if the root was nearly full (as is so
often the case). That problem has now been fixed in 5.0 with a
callback to soft updates if a filesystem full error is about to
be generated. When called back, soft updates expedites the freeing
of space so that the new allocation can succeed. So, the primary
reason for not using soft updates on the root is now fixed. If
however, mainline editors are not doing fsync's, then there is
still a good reason not to use soft updates on the root filesystem.

Kirk McKusick
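
The ordering described above (write the new copy, fsync it, and only
then let the old copy go) can be sketched with plain POSIX calls. This
is a minimal illustration, not how vi or any other editor is actually
implemented; the function name replace_file_safely() and the
caller-supplied temporary path are assumptions, and the sketch does not
fsync the containing directory.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Replace `path` with `contents` so that the new data is on disk
 * before the old file is dropped.  `tmp_path` is a scratch name in
 * the same directory.  Returns 0 on success, -1 on failure.
 */
int
replace_file_safely(const char *path, const char *tmp_path,
    const char *contents)
{
	size_t len, off = 0;
	ssize_t n;
	int fd;

	assert(path != NULL && tmp_path != NULL && contents != NULL);
	len = strlen(contents);

	fd = open(tmp_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0)
		return (-1);

	/* write() may be partial, so loop until everything is out. */
	while (off < len) {
		n = write(fd, contents + off, len - off);
		if (n <= 0)
			goto fail;
		off += (size_t)n;
	}

	/* Force the new copy to disk before the old one goes away. */
	if (fsync(fd) != 0)
		goto fail;
	if (close(fd) != 0) {
		unlink(tmp_path);
		return (-1);
	}

	/* rename(2) switches the name over to the new file. */
	if (rename(tmp_path, path) != 0) {
		unlink(tmp_path);
		return (-1);
	}
	return (0);

fail:
	close(fd);
	unlink(tmp_path);
	return (-1);
}
```

A caller might use it as replace_file_safely("rc.conf", "rc.conf.new",
text). Skipping the fsync() step is what widens the window of
vulnerability discussed above.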

=-=-=-=-=

From: Archie Cobbs [EMAIL PROTECTED]
Subject: Re: backgroud fsck is still locking up system (fwd)
In-Reply-To: [EMAIL PROTECTED]
To: Dan Nelson [EMAIL PROTECTED]
Date: Fri, 6 Dec 2002 11:28:52 -0800 (PST)
CC: [EMAIL PROTECTED], [EMAIL PROTECTED]
X-ASK-Info: Whitelist match

Dan Nelson wrote:
  Why does softupdates not get enabled on / , by default on the
  install?
 
 Softupdates updates on-disk structures in the background, and
 background fsck cannot relink unreferenced files into lost+found, so
 you run the risk of losing both the original and backup copies of
 important files in case of a sudden reboot.  Imagine you edited
 /etc/rc.conf, saved it, and 5 seconds later the system panic'ed.
 Because the default metadata flush time is 28 seconds, there's a pretty
 good chance that neither the new file or the original is in /etc after
 a reboot.  I got bit by this three times before I learned my lesson.  I

I don't understand this.. presumably vi updates the file contents by
opening and writing into the file; why would this cause the file's
directory entry to disappear?

On the other hand, if you do mv rc.conf.new rc.conf then you are
supposedly guaranteed that the file exists in some form; see rename(2).

In any case, you seem to be implying that with respect to modifying
files just before a system crash:

(a) Softupdates is more 'dangerous' than non-softupdates
(b) Background fsck is more 'dangerous' than normal fsck

Is this really true? I thought if anything the reverse of (a) would be true.

-Archie

__
Archie Cobbs * Packet Design * http://www.packetdesign.com




Re: backgroud fsck is still locking up system (fwd)

2002-12-06 Thread Archie Cobbs
Kirk McKusick wrote:
 by the syncer who is also blocked. Could you please run the following
 command on your system and send me the results:
 
   sysctl vfs.lodirtybuffers
   sysctl vfs.hidirtybuffers
   sysctl vfs.numdirtybuffers
 
 both before and after the lockup. If you cannot run this command after
 the lockup, the global variable names are:
 
   lodirtybuffers
   hidirtybuffers
   numdirtybuffers

Before (system running normally):

vfs.lodirtybuffers: 126
vfs.hidirtybuffers: 252
vfs.numdirtybuffers: 0

After:

vfs.lodirtybuffers: 126
vfs.hidirtybuffers: 252
vfs.numdirtybuffers: 445

-Archie

__
Archie Cobbs * Packet Design * http://www.packetdesign.com




Re: backgroud fsck is still locking up system (fwd)

2002-12-06 Thread Dan Nelson
In the last episode (Dec 06), Kirk McKusick said:
 The main reason for not using soft updates on the root filesystem was
 because of the delay between removing files and having the space show
 up. The result was that world installs on the root filesystem often
 failed if the root was nearly full (as is so often the case). That
 problem has now been fixed in 5.0 with a callback to soft updates if
 a filesystem full error is about to be generated. When called back,
 soft updates expedites the freeing of space so that the new
 allocation can succeed. So, the primary reason for not using soft
 updates on the root is now fixed. If however, mainline editors are
 not doing fsync's, then there is still a good reason not to use soft
 updates on the root filesystem.

/usr/bin/install does not fsync.  One of my three foot-shootings
involved installing a new /sbin/init and hitting the power switch.

-- 
Dan Nelson
[EMAIL PROTECTED]




Re: backgroud fsck is still locking up system (fwd)

2002-12-06 Thread Kirk McKusick
From: Archie Cobbs [EMAIL PROTECTED]
Subject: Re: backgroud fsck is still locking up system (fwd)
In-Reply-To: [EMAIL PROTECTED]
To: Kirk McKusick [EMAIL PROTECTED]
Date: Fri, 6 Dec 2002 13:01:20 -0800 (PST)
CC: Archie Cobbs [EMAIL PROTECTED], Nate Lawson [EMAIL PROTECTED],
   [EMAIL PROTECTED]
X-ASK-Info: Whitelist match

Kirk McKusick wrote:
 by the syncer who is also blocked. Could you please run the following
 command on your system and send me the results:
 
   sysctl vfs.lodirtybuffers
   sysctl vfs.hidirtybuffers
   sysctl vfs.numdirtybuffers
 
 both before and after the lockup. If you cannot run this command after
 the lockup, the global variable names are:
 
   lodirtybuffers
   hidirtybuffers
   numdirtybuffers

Before (system running normally):

vfs.lodirtybuffers: 126
vfs.hidirtybuffers: 252
vfs.numdirtybuffers: 0

After:

vfs.lodirtybuffers: 126
vfs.hidirtybuffers: 252
vfs.numdirtybuffers: 445

-Archie

__
Archie Cobbs * Packet Design * http://www.packetdesign.com

OK, it looks like my hypothesis that you have a small number of buffers
and are running out of them is correct. I enclose below a patch which
should check for the problem arising and help to mitigate it. I
would appreciate you dropping it into your kernel and seeing if
it solves your problem. The fix is not ideal; it is merely meant to
see if it solves this problem. If it does, I will figure out how to
do it properly. Thanks for your help.

Kirk McKusick

Index: sys/buf.h
===================================================================
RCS file: /usr/ncvs/src/sys/sys/buf.h,v
retrieving revision 1.138
diff -c -r1.138 buf.h
*** sys/buf.h	2002/08/30 04:04:37	1.138
--- sys/buf.h	2002/12/06 21:44:25
***************
*** 468,473 ****
--- 468,474 ----
  caddr_t	kern_vfs_bio_buffer_alloc(caddr_t v, long physmem_est);
  void	bufinit(void);
  void	bwillwrite(void);
+ int	checkdirtybufs(struct vnode *);
  int	buf_dirty_count_severe(void);
  void	bremfree(struct buf *);
  int	bread(struct vnode *, daddr_t, int, struct ucred *, struct buf **);
Index: kern/vfs_bio.c
===================================================================
RCS file: /usr/ncvs/src/sys/kern/vfs_bio.c,v
retrieving revision 1.342
diff -c -r1.342 vfs_bio.c
*** kern/vfs_bio.c	2002/11/23 19:10:30	1.342
--- kern/vfs_bio.c	2002/12/06 21:44:35
***************
*** 1114,1119 ****
--- 1114,1137 ----
  }
  
  /*
+  * Check to see if a vnode holds too many dirty buffers. If it does,
+  * flush it.
+  */
+ int
+ checkdirtybufs(struct vnode *vp)
+ {
+ 	struct buf *bp;
+ 	int dirtycnt = 0, error = 0;
+ 	struct thread *td = curthread;
+ 
+ 	TAILQ_FOREACH(bp, &vp->v_dirtyblkhd, b_vnbufs)
+ 		dirtycnt++;
+ 	if (dirtycnt > lodirtybuffers)
+ 		error = VOP_FSYNC(vp, td->td_ucred, MNT_NOWAIT, td);
+ 	return (error);
+ }
+ 
+ /*
   * Return true if we have too many dirty buffers.
   */
  int
Index: ufs/ffs/ffs_balloc.c
===================================================================
RCS file: /usr/ncvs/src/sys/ufs/ffs/ffs_balloc.c,v
retrieving revision 1.39
diff -c -r1.39 ffs_balloc.c
*** ufs/ffs/ffs_balloc.c	2002/10/22 01:14:25	1.39
--- ufs/ffs/ffs_balloc.c	2002/12/06 21:49:56
***************
*** 295,300 ****
--- 295,301 ----
  			if (bp->b_bufsize == fs->fs_bsize)
  				bp->b_flags |= B_CLUSTEROK;
  			bdwrite(bp);
+ 			checkdirtybufs(vp);
  		}
  	}
  	/*
***************
*** 335,340 ****
--- 336,342 ----
  			if (bp->b_bufsize == fs->fs_bsize)
  				bp->b_flags |= B_CLUSTEROK;
  			bdwrite(bp);
+ 			checkdirtybufs(vp);
  		}
  		*bpp = nbp;
  		return (0);
***************
*** 756,761 ****
--- 758,764 ----
  			if (bp->b_bufsize == fs->fs_bsize)
  				bp->b_flags |= B_CLUSTEROK;
  			bdwrite(bp);
+ 			checkdirtybufs(vp);
  		}
  	}
  	/*
***************
*** 796,801 ****
--- 799,805 ----
  			if (bp->b_bufsize == fs->fs_bsize)
  				bp->b_flags |= B_CLUSTEROK;
  			bdwrite(bp);
+ 			checkdirtybufs(vp);
  		}
  		*bpp = nbp;
  		return (0);
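
The idea behind checkdirtybufs() in the patch can be tried out in
userspace with a small model. Everything named model_* below is a
hypothetical stand-in, not kernel code: the list is a plain linked list
rather than the kernel's TAILQ, the flush is simulated by emptying the
list at once (the real call with MNT_NOWAIT only starts the writes), and
the comparison in the patch is read as "greater than".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's struct buf and struct vnode. */
struct model_buf {
	struct model_buf *next;		/* next dirty buffer on this vnode */
};

struct model_vnode {
	struct model_buf *dirty_head;	/* list of dirty buffers */
	int flush_requests;		/* number of flushes started */
};

/* Stand-in for the VOP_FSYNC() call: record it and empty the list. */
int
model_fsync_nowait(struct model_vnode *vp)
{
	vp->flush_requests++;
	vp->dirty_head = NULL;	/* simplification: pretend all were written */
	return (0);
}

/*
 * Count the vnode's dirty buffers and start a flush when the count
 * exceeds the low-water mark.
 */
int
model_checkdirtybufs(struct model_vnode *vp, int lodirtybuffers)
{
	struct model_buf *bp;
	int dirtycnt = 0, error = 0;

	assert(vp != NULL);
	for (bp = vp->dirty_head; bp != NULL; bp = bp->next)
		dirtycnt++;
	if (dirtycnt > lodirtybuffers)
		error = model_fsync_nowait(vp);
	return (error);
}
```

The point of the design is that a single vnode, such as the snapshot
file, gets flushed on its own before it can pin more dirty buffers than
the low-water mark allows.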




Re: backgroud fsck is still locking up system (fwd)

2002-12-06 Thread Archie Cobbs
Julian Elischer wrote:
 most systems follow / with their swap region..
 
 you can boot from fixit, or picoBSD floppy 
 and use disklabel -e to extend the root partition
 then you can use growfs to add the new space to your root fs.

Hmm.. I tried that and it didn't seem to work.

The disklabel change was successful, but growfs didn't seem to
expand the root partition any.. df(1) still shows it as 50M.

I ran growfs after booting single user mode but before mounting
any disks.. perhaps that caused it to not work.

Since that didn't work, I booted a 4.7-REL fixit floppy and tried
to run growfs from there, but then that growfs core dumped:

Program terminated with signal 11, Segmentation fault.
#0  0x804c089 in updclst (block=-874) at growfs.c:2335
2335            setbit(cg_clustersfree(acg), block);
(gdb) list
2330                    return;
2331            }
2332            /*
2333             * update cluster allocation map
2334             */
2335            setbit(cg_clustersfree(acg), block);
2336
(gdb) where
#0  0x804c089 in updclst (block=-874) at growfs.c:2335
#1  0x8049584 in updjcg (cylno=2, utime=1039185218, fsi=4, fso=3, Nflag=0)
    at growfs.c:862
#2  0x8048280 in growfs (fsi=4, fso=3, Nflag=0) at growfs.c:219
#3  0x804beb2 in main (argc=2, argv=0xbfbff7a4) at growfs.c:2213
#4  0x8048135 in _start ()

Notice block=-874 which indicates something is weird or corrupted.

So now I've got extra space in the partition which (apparently) is
not being used and I can't seem to get at it (see below).

Plus I have a sneaking suspicion that I've screwed up something,
but there's nothing in the growfs man page that indicates what I
did was wrong.

FYI, this is a test machine so it's OK if it gets hosed.

-Archie

__
Archie Cobbs * Packet Design * http://www.packetdesign.com

$ disklabel ad0s1
# /dev/ad0s1c:
type: ESDI
disk: ad0s1
label: 
flags:
bytes/sector: 512
sectors/track: 63
tracks/cylinder: 255
sectors/cylinder: 16065
cylinders: 1860
sectors/unit: 29896902
rpm: 3600
interleave: 1
trackskew: 0
cylinderskew: 0
headswitch: 0   # milliseconds
track-to-track seek: 0  # milliseconds
drivedata: 0 

8 partitions:
#        size   offset    fstype   [fsize bsize bps/cpg]
  a:   204800        0    4.2BSD     1024  8192 32768   # (Cyl.    0 - 12*)
  b:   164608   204800      swap                        # (Cyl.   12*- 22*)
  c: 29896902        0    unused        0     0         # (Cyl.    0 - 1860*)
  e:    40960   369408    4.2BSD     1024  8192    16   # (Cyl.   22*- 25*)
  f: 29486534   410368    4.2BSD     1024  8192    16   # (Cyl.   25*- 1860*)
$ df
Filesystem  1K-blocks    Used    Avail Capacity  Mounted on
/dev/ad0s1a     49583   36751     8866    81%    /
devfs               1       1        0   100%    /dev
/dev/ad0s1f  14289643 2794938 10351534    21%    /usr
/dev/ad0s1e     19815    3555    14675    20%    /var
procfs              4       4        0   100%    /proc
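
The mismatch between the label and df can be checked with a little
arithmetic. Reading the a: entry above as 204800 sectors of 512 bytes,
the partition spans 102400 1K blocks, while df reports only 49583 for /,
which fits the label having grown while the filesystem did not. The
helper below is a sketch; the name sectors_to_kblocks() is made up for
this example.

```c
#include <assert.h>

/* Convert a disklabel size in sectors to 1K blocks, the unit df uses. */
long long
sectors_to_kblocks(long long sectors, long long bytes_per_sector)
{
	assert(sectors >= 0 && bytes_per_sector > 0);
	return (sectors * bytes_per_sector / 1024);
}
```

For comparison, the e: partition is 40960 sectors, or 20480 1K blocks
raw, and df reports 19815 for /var, so a small difference is normal
filesystem overhead. A factor of two is not.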





Re: backgroud fsck is still locking up system (fwd)

2002-12-06 Thread Julian Elischer
I put a copy of / in /usr
then from the fixit, I mounted /usr as / and ran growfs from there..
the trick is to not do it while / is mounted.


On Fri, 6 Dec 2002, Archie Cobbs wrote:

 Julian Elischer wrote:
  most systems follow / with their swap region..
  
  you can boot from fixit, or picoBSD floppy 
  and use disklabel -e to extend the root partition
  then you can use growfs to add the new space to your root fs.
 
 Hmm.. I tried that and it didn't seem to work.
 
 The disklabel change was successful, but growfs didn't seem to
 expand the root partition any.. df(1) still shows it as 50M.
 
 I ran growfs after booting single user mode but before mounting
 any disks.. perhaps that caused it to not work.
 
 Since that didn't work, I booted a 4.7-REL fixit floppy and tried
 to run growfs from there, but then that growfs core dumped:






Re: backgroud fsck is still locking up system (fwd)

2002-12-06 Thread Archie Cobbs
Julian Elischer wrote:
 I put a copy of / in /usr
 then from the fixit, I mounted /usr as / and ran growfs from there..
 the trick is to not do it while / is mounted.

/ wasn't mounted yet when I ran growfs:

  I ran growfs after booting single user mode but before mounting
  any disks.. perhaps that caused it to not work.

But it was the root partition and I was running in single user mode.
If that's a problem then the growfs man page should say so, or maybe
it should be more clear about what is meant by mounted.

In any case, running it from the fixit floppy didn't work either
(got a core dump), but that may be because it was already screwed up.

So at minimum, there's a documentation bug (IMHO).

-Archie

__
Archie Cobbs * Packet Design * http://www.packetdesign.com




Re: backgroud fsck is still locking up system (fwd)

2002-12-06 Thread Archie Cobbs
Kirk McKusick wrote:
 OK, it looks like my hypothesis on having a small number of buffers 
 and running out of them is the problem. I enclose below a patch which
 should check for the problem arising and help to mitigate it. I
 would appreciate you dropping it into your kernel and seeing if
 it solves your problem. The fix is not ideal, but merely to see
 if it solves this problem. If it does, I will figure out how to
 do it properly. Thanks for your help.

Yep, that fixes it. Now I just get the usual sluggishness while the
background fsck runs (which is not too bad), but it eventually
finishes and then all is well.

Thanks,
-Archie

__
Archie Cobbs * Packet Design * http://www.packetdesign.com




Re: backgroud fsck is still locking up system (fwd)

2002-12-06 Thread Giorgos Keramidas
On Fri, 6 Dec 2002, Archie Cobbs wrote:
 David Rhodus wrote:
   Softupdates is enabled on /usr and /var but not /.
 
  Why does softupdates not get enabled on / , by default on the install?

 I disabled softupdates on / back when having it enabled caused disk
 full problems during 'make installworld,' and never re-enabled it.

 FYI at this point my 50MB / partition is woefully inadequate. I can't
 even 'make install kernel' without first removing all existing modules,
 and even so / ends up 106% full.

Not very surprising.  With just a couple of kernels around, my current
usage on / is way over 50 MB.  And I keep my /tmp files on an md(4) fs.

gothmog# du -kx / | grep -v '/.*/' | grep '[0-9][0-9]\+'
2700    /stand
1628    /etc
6814    /bin
28004   /boot
2946    /root
21118   /sbin
63244   /

The largest amount of space is under /boot where exactly 2 kernels are
kept now (kernel and kernel.old, just in case an installkernel goes
very wrong) but /sbin isn't very small either.

Giorgos




Re: backgroud fsck is still locking up system (fwd)

2002-12-06 Thread Kirk McKusick
I suggest that we drag Thomas-Henning von Kamptz into this
discussion as he was one of the main authors of growfs. He
is copied on my reply.

Kirk McKusick

=-=-=-=-=-=

From: Archie Cobbs [EMAIL PROTECTED]
Subject: Re: backgroud fsck is still locking up system (fwd)
In-Reply-To: [EMAIL PROTECTED]
To: Julian Elischer [EMAIL PROTECTED]
Date: Fri, 6 Dec 2002 14:52:24 -0800 (PST)
CC: [EMAIL PROTECTED], [EMAIL PROTECTED]
X-ASK-Info: Whitelist match

Julian Elischer wrote:
 most systems follow / with their swap region..
 
 you can boot from fixit, or picoBSD floppy 
 and use disklabel -e to extend the root partition
 then you can use growfs to add the new space to your root fs.

Hmm.. I tried that and it didn't seem to work.

The disklabel change was successful, but growfs didn't seem to
expand the root partition any.. df(1) still shows it as 50M.

I ran growfs after booting single user mode but before mounting
any disks.. perhaps that caused it to not work.

Since that didn't work, I booted a 4.7-REL fixit floppy and tried
to run growfs from there, but then that growfs core dumped:

Program terminated with signal 11, Segmentation fault.
#0  0x804c089 in updclst (block=-874) at growfs.c:2335
2335            setbit(cg_clustersfree(acg), block);
(gdb) list
2330                    return;
2331            }
2332            /*
2333             * update cluster allocation map
2334             */
2335            setbit(cg_clustersfree(acg), block);
2336
(gdb) where
#0  0x804c089 in updclst (block=-874) at growfs.c:2335
#1  0x8049584 in updjcg (cylno=2, utime=1039185218, fsi=4, fso=3, Nflag=0)
    at growfs.c:862
#2  0x8048280 in growfs (fsi=4, fso=3, Nflag=0) at growfs.c:219
#3  0x804beb2 in main (argc=2, argv=0xbfbff7a4) at growfs.c:2213
#4  0x8048135 in _start ()

Notice block=-874 which indicates something is weird or corrupted.

So now I've got extra space in the partition which (apparently) is
not being used and I can't seem to get at it (see below).

Plus I have a sneaking suspicion that I've screwed up something,
but there's nothing in the growfs man page that indicates what I
did was wrong.

FYI, this is a test machine so it's OK if it gets hosed.

-Archie

__
Archie Cobbs * Packet Design * http://www.packetdesign.com

$ disklabel ad0s1
# /dev/ad0s1c:
type: ESDI
disk: ad0s1
label: 
flags:
bytes/sector: 512
sectors/track: 63
tracks/cylinder: 255
sectors/cylinder: 16065
cylinders: 1860
sectors/unit: 29896902
rpm: 3600
interleave: 1
trackskew: 0
cylinderskew: 0
headswitch: 0   # milliseconds
track-to-track seek: 0  # milliseconds
drivedata: 0 

8 partitions:
#        size   offset    fstype   [fsize bsize bps/cpg]
  a:   204800        0    4.2BSD     1024  8192 32768   # (Cyl.    0 - 12*)
  b:   164608   204800      swap                        # (Cyl.   12*- 22*)
  c: 29896902        0    unused        0     0         # (Cyl.    0 - 1860*)
  e:    40960   369408    4.2BSD     1024  8192    16   # (Cyl.   22*- 25*)
  f: 29486534   410368    4.2BSD     1024  8192    16   # (Cyl.   25*- 1860*)
$ df
Filesystem  1K-blocks    Used    Avail Capacity  Mounted on
/dev/ad0s1a     49583   36751     8866    81%    /
devfs               1       1        0   100%    /dev
/dev/ad0s1f  14289643 2794938 10351534    21%    /usr
/dev/ad0s1e     19815    3555    14675    20%    /var
procfs              4       4        0   100%    /proc
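
A quick cross-check of the two listings, as a rough sketch (the function name is made up for illustration): the label gives partition a 204800 sectors of 512 bytes, which is 102400 KB, while df reports only 49583 1K-blocks for /. The file system itself has evidently not grown to fill the enlarged partition. The difference is only approximate, since df does not count file system metadata.

```sh
#!/bin/sh
# Convert a disklabel size in 512-byte sectors to 1K-blocks, the unit df uses.
sectors_to_kb() {
    echo $(( $1 * 512 / 1024 ))
}

label_kb=$(sectors_to_kb 204800)   # size of partition a in the label
df_kb=49583                        # 1K-blocks that df reports for /

echo "partition a: ${label_kb} KB, file system: ${df_kb} KB"
echo "roughly $(( label_kb - df_kb )) KB of the partition is not covered"
```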





Re: backgroud fsck is still locking up system (fwd)

2002-12-06 Thread Kirk McKusick
From: Archie Cobbs [EMAIL PROTECTED]
Subject: Re: backgroud fsck is still locking up system (fwd)
In-Reply-To: [EMAIL PROTECTED]
To: Kirk McKusick [EMAIL PROTECTED]
Date: Fri, 6 Dec 2002 15:23:36 -0800 (PST)
CC: Archie Cobbs [EMAIL PROTECTED], Nate Lawson [EMAIL PROTECTED],
   [EMAIL PROTECTED]
X-ASK-Info: Whitelist match

Kirk McKusick wrote:
 OK, it looks like my hypothesis on having a small number of buffers 
 and running out of them is the problem. I enclose below a patch which
 should check for the problem arising and help to mitigate it. I
 would appreciate you dropping it into your kernel and seeing if
 it solves your problem. The fix is not ideal, but merely to see
 if it solves this problem. If it does, I will figure out how to
 do it properly. Thanks for your help.

Yep, that fixes it. Now I just get the usual sluggishness while the
background fsck runs (which is not too bad), but it eventually
finishes and then all is well.

Thanks,
-Archie

__
Archie Cobbs * Packet Design * http://www.packetdesign.com

Thanks for verifying that the idea works. I will attempt to figure
out how to do it correctly and submit a proposed fix.

Kirk McKusick




Re: backgroud fsck is still locking up system (fwd)

2002-12-06 Thread Kirk McKusick
Adding a two minute delay before starting background fsck
sounds like a very good idea to me. Please send me your
suggested change.

Kirk McKusick

=-=-=-=-=

Date: Fri, 6 Dec 2002 10:44:45 -0800
From: Brooks Davis [EMAIL PROTECTED]
To: Nate Lawson [EMAIL PROTECTED]
Cc: Kirk McKusick [EMAIL PROTECTED],
   Archie Cobbs [EMAIL PROTECTED], [EMAIL PROTECTED]
Subject: Re: backgroud fsck is still locking up system (fwd)
X-ASK-Info: Confirmed by User

On Fri, Dec 06, 2002 at 10:27:10AM -0800, Nate Lawson wrote:
 On Thu, 5 Dec 2002, Kirk McKusick wrote:
  Does the background fsck process continue to run, or does the whole
  system come to a halt? If the fsck process continues to run, what
  happens when it eventually finishes? Is the system still dead, or
  does it come back to life? If the system does not come back to life
  can you get me the output of `ps axl'? If not, can you break into
  the debugger and get a ps output? (You will need to have the DDB
  option specified in your config file).

 Sorry for butting in.  I think Archie is referring to bg fsck gaining an
 unfair share of cpu due to it running due to IO completions.  Last I
 heard, we were waiting until after 5.0 to experiment with scheduler
 changes to make it more fair.  I have not seen any hard locks or other
 problems with bg fsck after your commit.

My experience is that, at least with my laptop (which has a very slow
disk), bg fsck works OK, but starting applications for the first time
while fsck is running is _very_ painful.  Even getty seems to have a
hard time.  I've found that adding a two minute delay before the fsck is
sufficient to allow the system to finish starting up and for me to load X
and my main applications, which lets me work while bg fsck is running.  I
posted a patch to add an optional delay in the rc scripts a while ago,
but Kirk was going to re-enable the priority stuff soon so I didn't
pursue it.  If there's interest, I'll regenerate it and repost it.
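
For readers who want the shape of such a delay, here is a minimal sketch. It is not the patch referred to above. The variable name `bgfsck_delay`, its default of 120 seconds, and the helper `delayed_start` are all assumptions made for illustration.

```sh
#!/bin/sh
# Hypothetical knob: seconds to wait before the background check starts.
# Name and default are assumptions, not taken from any real rc script.
: "${bgfsck_delay:=120}"

# Sleep for the requested delay, then run whatever command follows.
# A delay of 0 starts the command immediately.
delayed_start() {
    delay=$1
    shift
    if [ "$delay" -gt 0 ]; then
        echo "Delaying background fsck by ${delay} seconds"
        sleep "$delay"
    fi
    "$@"
}

# In an rc script the call might look like this (assumed invocation),
# run in the background so the rest of the boot is not held up:
#   delayed_start "$bgfsck_delay" fsck -B -p &
```

Keeping the delay in a single variable means it can be set to 0 to get the old behaviour back without touching the script.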

-- Brooks

Any statement of the form "X is the one, true Y" is FALSE.
PGP fingerprint 655D 519C 26A7 82E7 2529  9BF0 5D8E 8BE9 F238 1AD4




Re: backgroud fsck is still locking up system (fwd)

2002-12-06 Thread Archie Cobbs
Kirk McKusick wrote:
 I suggest that we drag Thomas-Henning von Kamptz into this
 discussion as he was one of the main authors of growfs. He
 is copied on my reply.

Thanks.

FYI, I finally fixed things by doing what Julian suggested, which
is to copy / to /usr, reboot with /usr mounted as /, newfs /, and
then copy everything back.

So in summary my recommendation is to add a big warning to the
growfs(1) man page that it should not be run on the root partition,
even if you have booted into single-user mode and haven't mounted / yet.
I.e., to grow a root partition, you must boot from a different partition.
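
As a rough illustration of that rule, a wrapper could refuse to go ahead when the target is the device the running system was booted from. This is only a sketch under that assumption; the function names and the default target `/dev/ad0s1a` are chosen for the example, and nothing here runs growfs itself.

```sh
#!/bin/sh
# Print the device that currently backs the root file system.
# df -P prints one line per file system, so the device is field 1 of line 2.
root_device() {
    df -P / | awk 'NR == 2 { print $1 }'
}

# Succeed only if the target is not the device backing the running root.
safe_to_grow() {
    [ "$1" != "$(root_device)" ]
}

target=${1:-/dev/ad0s1a}
if safe_to_grow "$target"; then
    echo "$target is not the running root file system"
else
    echo "$target backs the running root; boot from another partition first"
fi
```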

-Archie

__
Archie Cobbs * Packet Design * http://www.packetdesign.com




Re: backgroud fsck is still locking up system (fwd)

2002-12-05 Thread Kirk McKusick
Date: Thu, 5 Dec 2002 15:22:27 -0800 (PST)
From: Archie Cobbs [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Subject: backgroud fsck is still locking up system

Just rebuilt -current this morning. Background fsck is still
causing a soft lockup. I thought the conclusion was we were
going to disable it for 5.0.

Not trying to rush anyone, just pointing out that this
still needs to be done..

-Archie

__
Archie Cobbs * Packet Design * http://www.packetdesign.com

What do you mean by background fsck causing a soft lockup?
Is it failing? Is it deadlocking the system? Do you have a
specific test case that shows the problem? Needless to say
it is working fine on my system and on my regression tests.
The only problem that I am having with 5.0 as of last night
is getting login to work on my console.

Kirk McKusick




Re: backgroud fsck is still locking up system (fwd)

2002-12-05 Thread Archie Cobbs
Kirk McKusick wrote:
   Just rebuilt -current this morning. Background fsck is still
   causing a soft lockup. I thought the conclusion was we were
   going to disable it for 5.0.
 
 What do you mean by background fsck causing a soft lockup?
 Is it failing? Is it deadlocking the system? Do you have a
 specific test case that shows the problem? Needless to say
 it is working fine on my system and on my regression tests.
 The only problem that I am having with 5.0 as of last night
 is getting login to work on my console.

What happens is that at first I can login, but the system seems
slow. I then got as far as running 'top' but it never refreshed its
display and subsequently all keystrokes were ignored. Changing
virtual terminals still works OK, but they are effectively dead too.
I'm imagining processes getting stuck on some lock one by one.

Top did get as far as showing the background fsck process, which
had a priority of -6 or something.

The previous time it didn't even spit out a login prompt, but
that may just be due to experimental noise.

For me, it appears easy to reproduce...

1. Boot -current system
2. Pull the power cable out
3. Put the power cable back in
4. Let the box boot; it notes background fsck
5. Login and try to do something

I can give you more details about my system separately if you like.

Thanks,
-Archie

__
Archie Cobbs * Packet Design * http://www.packetdesign.com




Re: backgroud fsck is still locking up system (fwd)

2002-12-05 Thread Kirk McKusick
Does the background fsck process continue to run, or does the whole
system come to a halt? If the fsck process continues to run, what 
happens when it eventually finishes? Is the system still dead, or 
does it come back to life? If the system does not come back to life
can you get me the output of `ps axl'? If not, can you break into
the debugger and get a ps output? (You will need to have the DDB
option specified in your config file).

Kirk McKusick

=-=-=-=-=-=

From: Archie Cobbs [EMAIL PROTECTED]
Subject: Re: backgroud fsck is still locking up system (fwd)
In-Reply-To: [EMAIL PROTECTED]
To: Kirk McKusick [EMAIL PROTECTED]
Date: Thu, 5 Dec 2002 16:22:20 -0800 (PST)
CC: Archie Cobbs [EMAIL PROTECTED], Robert Watson [EMAIL PROTECTED],
   [EMAIL PROTECTED]
X-ASK-Info: Confirmed by User

Kirk McKusick wrote:
   Just rebuilt -current this morning. Background fsck is still
   causing a soft lockup. I thought the conclusion was we were
   going to disable it for 5.0.
 
 What do you mean by background fsck causing a soft lockup?
 Is it failing? Is it deadlocking the system? Do you have a
 specific test case that shows the problem? Needless to say
 it is working fine on my system and on my regression tests.
 The only problem that I am having with 5.0 as of last night
 is getting login to work on my console.

What happens is that at first I can login, but the system seems
slow. I then got as far as running 'top' but it never refreshed its
display and subsequently all keystrokes were ignored. Changing
virtual terminals still works OK, but they are effectively dead too.
I'm imagining processes getting stuck on some lock one by one.

Top did get as far as showing the background fsck process, which
had a priority of -6 or something.

The previous time it didn't even spit out a login prompt, but
that may just be due to experimental noise.

For me, it appears easy to reproduce...

1. Boot -current system
2. Pull the power cable out
3. Put the power cable back in
4. Let the box boot; it notes background fsck
5. Login and try to do something

I can give you more details about my system separately if you like.

Thanks,
-Archie

__
Archie Cobbs * Packet Design * http://www.packetdesign.com




Re: backgroud fsck is still locking up system (fwd)

2002-12-05 Thread Archie Cobbs
Kirk McKusick wrote:
 Does the background fsck process continue to run, or does the whole
 system come to a halt? If the fsck process continues to run, what 
 happens when it eventually finishes? Is the system still dead, or 
 does it come back to life? If the system does not come back to life
 can you get me the output of `ps axl'? If not, can you break into
 the debugger and get a ps output? (You will need to have the DDB
 option specified in your config file).

I didn't notice whether it was running or not... of course the only
way to tell would be to look at the HDD light. I didn't wait more
than several minutes so not sure if it would ever finish.

I'll try the other stuff tomorrow as I'm away from the office now.

-Archie

__
Archie Cobbs * Packet Design * http://www.packetdesign.com
