Re: backgroud fsck is still locking up system (fwd)
Date: Mon, 9 Dec 2002 11:19:13 -0800
From: Brooks Davis [EMAIL PROTECTED]
To: Kirk McKusick [EMAIL PROTECTED]
Cc: Brooks Davis [EMAIL PROTECTED], Nate Lawson [EMAIL PROTECTED], Archie Cobbs [EMAIL PROTECTED], [EMAIL PROTECTED]
Subject: Re: backgroud fsck is still locking up system (fwd)

On Fri, Dec 06, 2002 at 05:52:38PM -0800, Kirk McKusick wrote:
> Adding a two minute delay before starting background fsck
> sounds like a very good idea to me. Please send me your
> suggested change.

Here it is. As written it doesn't add the delay, but you can change
etc/defaults/rc.conf to do that if desired.

-- Brooks

I have added your suggested change to -current (6.0). I decided
to set the default startup delay to sixty seconds as that seems
to be enough time to let the initial system startup settle down.
If this change proves to be popular, it can be considered for
MFC'ing to 5.0.

	Kirk McKusick

=-=-=-=-=-=

From: Kirk McKusick [EMAIL PROTECTED]
Date: Tue, 17 Dec 2002 23:21:31 -0800 (PST)
To: [EMAIL PROTECTED], [EMAIL PROTECTED]
Subject: cvs commit: src/etc rc src/etc/defaults rc.conf src/etc/rc.d bgfsck src/share/man/man5 rc.conf.5
X-FreeBSD-CVS-Branch: HEAD

mckusick    2002/12/17 23:21:31 PST

  Modified files:
    etc                  rc
    etc/defaults         rc.conf
    etc/rc.d             bgfsck
    share/man/man5       rc.conf.5
  Log:
  Delay an optional amount of time after booting before starting
  a background fsck. The delay defaults to sixty seconds to allow
  large applications such as the X server to start before disk I/O
  bandwidth is monopolized by fsck.

  Submitted by:	Brooks Davis [EMAIL PROTECTED]
  Sponsored by:	DARPA & NAI Labs.

  Revision  Changes    Path
  1.165     +1 -0      src/etc/defaults/rc.conf
  1.324     +8 -2      src/etc/rc
  1.3       +13 -2     src/etc/rc.d/bgfsck
  1.168     +5 -0      src/share/man/man5/rc.conf.5

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message
Re: backgroud fsck is still locking up system (fwd)
On Fri, Dec 06, 2002 at 05:52:38PM -0800, Kirk McKusick wrote:
> Adding a two minute delay before starting background fsck
> sounds like a very good idea to me. Please send me your
> suggested change.

Here it is. As written it doesn't add the delay, but you can change
etc/defaults/rc.conf to do that if desired.

-- Brooks

--
Any statement of the form "X is the one, true Y" is FALSE.
PGP fingerprint 655D 519C 26A7 82E7 2529 9BF0 5D8E 8BE9 F238 1AD4

Index: etc/rc
===================================================================
RCS file: /usr/cvs/src/etc/rc,v
retrieving revision 1.323
diff -u -p -r1.323 rc
--- etc/rc	26 Nov 2002 17:51:03 -0000	1.323
+++ etc/rc	4 Dec 2002 23:08:41 -0000
@@ -982,8 +982,14 @@ esac
 # Start background fsck checks if necessary
 case ${background_fsck} in
 [Yy][Ee][Ss])
-	echo 'Starting background filesystem checks'
-	nice -4 fsck -B -p 2>&1 | logger -p daemon.notice &
+	bgfsck_msg='Starting background file system checks'
+	if [ ${background_fsck_delay:=0} -gt 0 ]; then
+		bgfsck_msg="${bgfsck_msg} in ${background_fsck_delay} seconds"
+	fi
+	echo "${bgfsck_msg}."
+
+	(sleep ${background_fsck_delay}; nice -4 fsck -B -p) 2>&1 | \
+	    logger -p daemon.notice &
 	;;
 esac
 
Index: etc/defaults/rc.conf
===================================================================
RCS file: /usr/cvs/src/etc/defaults/rc.conf,v
retrieving revision 1.164
diff -u -p -r1.164 rc.conf
--- etc/defaults/rc.conf	6 Dec 2002 05:23:37 -0000	1.164
+++ etc/defaults/rc.conf	6 Dec 2002 18:02:18 -0000
@@ -40,6 +40,7 @@ script_name_sep=" "	# Change if your sta
 rc_conf_files="/etc/rc.conf /etc/rc.conf.local"
 fsck_y_enable="NO"	# Set to YES to do fsck -y if the initial preen fails.
 background_fsck="YES"	# Attempt to run fsck in the background where possible.
+background_fsck_delay="0" # Time to wait (seconds) before starting the fsck.
 extra_netfs_types="NO"	# List of network extra filesystem types for delayed
 			# mount at startup (or NO).
Index: etc/rc.d/bgfsck
===================================================================
RCS file: /usr/cvs/src/etc/rc.d/bgfsck,v
retrieving revision 1.2
diff -u -p -r1.2 bgfsck
--- etc/rc.d/bgfsck	28 Jul 2002 03:38:10 -0000	1.2
+++ etc/rc.d/bgfsck	9 Oct 2002 23:31:45 -0000
@@ -11,9 +11,20 @@
 
 name="background-fsck"
 rcvar="background_fsck"
-start_precmd="echo 'Starting background file system checks.'"
-start_cmd="nice -4 fsck -B -p 2>&1 | logger -p daemon.notice &"
+start_cmd="bgfsck_start"
 stop_cmd=":"
+
+bgfsck_start ()
+{
+	bgfsck_msg='Starting background file system checks'
+	if [ ${background_fsck_delay:=0} -gt 0 ]; then
+		bgfsck_msg="${bgfsck_msg} in ${background_fsck_delay} seconds"
+	fi
+	echo "${bgfsck_msg}."
+
+	(sleep ${background_fsck_delay}; nice -4 fsck -B -p) 2>&1 | \
+	    logger -p daemon.notice &
+}
 
 load_rc_config $name
 run_rc_command "$1"
Index: share/man/man5/rc.conf.5
===================================================================
RCS file: /usr/cvs/src/share/man/man5/rc.conf.5,v
retrieving revision 1.166
diff -u -p -r1.166 rc.conf.5
--- share/man/man5/rc.conf.5	29 Nov 2002 11:39:19 -0000	1.166
+++ share/man/man5/rc.conf.5	4 Dec 2002 23:11:53 -0000
@@ -734,6 +734,11 @@ If set to
 the system will attempt to run
 .Xr fsck 8
 in the background where possible.
+.It Va background_fsck_delay
+.Pq Vt int
+The amount of time in seconds to sleep before starting a background fsck.
+Setting this to a non-zero number may allow large applications such as
+the X server to start before disk I/O bandwidth is monopolized by fsck.
 .It Va extra_netfs_types
 .Pq Vt str
 If set to something other than
Re: backgroud fsck is still locking up system (fwd)
Julian Elischer wrote:
> > Well, I suspected that it might not work... but I would disagree that
> > it was *obvious* that it would not work. This was before mount had
> > been run, so / was supposedly mounted (?) read-only.
>
> I've seen ufs write back the superblock on unmounting a read-only
> filesystem (!). It was a few years ago but I wouldn't be surprised if
> it was still true..
>
> After you did it on the filesystem (ran growfs), what did you do next?
> The safe answer would be to pull the plug.

reboot

It seems counter-intuitive that a filesystem mounted read only
would be modified by the kernel. I'm sure there's some subtlety
I'm not aware of though..

-Archie

__________________________________________________________________________
Archie Cobbs * Packet Design * http://www.packetdesign.com
Re: backgroud fsck is still locking up system (fwd)
Bruce Evans wrote:
> > > Er, it should be obvious that growfs can't reasonably work on the
> > > mounted partitions. growfs.1 doesn't exist, but growfs.8 already
> > > has the warning in a general form:
> > >
> > >      Currently growfs can only enlarge unmounted file systems. Do not
> > >      try enlarging a mounted file system, your system may panic and
> > >      you will not be able to use the file system any longer...
> >
> > Well, I suspected that it might not work... but I would disagree that
> > it was *obvious* that it would not work. This was before mount had
> > been run, so / was supposedly mounted (?) read-only.
>
> Perhaps the unobvious point is that fsck could work. If the mount is
> r/w, then neither growfs nor fsck can even open the partition r/w.
> fsck somehow works in the case of a r/o root, but growfs apparently
> doesn't. I think fsck depends on no other processes making
> (significant) vfs syscalls on the same partition while it is running
> (even r/o ones might be harmful if they caused reads of metadata which
> might be inconsistent). Then when fsck has finished it calls
> mount(... MNT_RELOAD...) to sync the metadata. growfs doesn't do this,
> and even if it did it is not clear that it does all the necessary
> syncing (growfs may change more or different metadata). However, I
> think it does most of the necessary things.

FYI, I submitted a bug/enhancement request to summarize this..

    http://www.freebsd.org/cgi/query-pr.cgi?pr=46110

-Archie

P.S. Why does submitting a bug now generate an email response from
(and who the heck is) "ThinkHost Support" ??

__________________________________________________________________________
Archie Cobbs * Packet Design * http://www.packetdesign.com
Re: backgroud fsck is still locking up system (fwd)
Date: Sat, 7 Dec 2002 11:07:23 -0800 (PST)
From: Nate Lawson [EMAIL PROTECTED]
To: Archie Cobbs [EMAIL PROTECTED]
cc: [EMAIL PROTECTED], [EMAIL PROTECTED]
Subject: Re: backgroud fsck is still locking up system (fwd)
X-ASK-Info: Whitelist match

On Fri, 6 Dec 2002, Archie Cobbs wrote:
> Julian Elischer wrote:
> > I put a copy of / in /usr then from the fixit, I mounted /usr as /
> > and ran growfs from there.. the trick is to not do it while / is
> > mounted.
>
> / wasn't mounted yet when I ran growfs: I ran growfs after booting
> single user mode but before mounting any disks.. perhaps that caused
> it to not work. But it was the root partition and I was running in
> single user mode. If that's a problem then the growfs man page should
> say so, or maybe it should be more clear about what is meant by
> "mounted".

growfs won't work with any mounted fs (even ro) because it needs to
quiesce kernel file ops and you can't do that from usermode (yet). I
wonder if there might be some clever way to abuse snapshots to have this
same effect (i.e. keep an open handle to the underlying fs cdev for growfs
to use and then mount a snapshot of the fs over its own mountpoint for
procs to use.)

> In any case, running it from the fixit floppy didn't work either (got
> a core dump), but that may be because it was already screwed up. So at
> minimum, there's a documentation bug (IMHO).

I assume the superblock changes between 4 and 5 changed the ability to use
4.x growfs on 5.x ufs partitions. Also, does growfs need to be updated
for ufs2?

-Nate

I have made the structural changes to growfs to make it work for
UFS2, however, I have not done more than cursory testing. I would
appreciate it if someone could try running it on various UFS2
filesystems to see if it works properly.

	Kirk McKusick
Re: backgroud fsck is still locking up system (fwd)
In theory the MNT_RELOAD command should reload all the filesystem
metadata properly though this feature has not been tested with
growfs. If anyone has the time to try it out and report back
any problems, that would be appreciated.

	Kirk McKusick

=-=-=-=-=

From: Archie Cobbs [EMAIL PROTECTED]
Subject: Re: backgroud fsck is still locking up system (fwd)
In-Reply-To: [EMAIL PROTECTED]
To: Bruce Evans [EMAIL PROTECTED]
Date: Sun, 8 Dec 2002 17:03:43 -0800 (PST)
CC: Archie Cobbs [EMAIL PROTECTED], Kirk McKusick [EMAIL PROTECTED], Julian Elischer [EMAIL PROTECTED], [EMAIL PROTECTED], Thomas-Henning von Kamptz [EMAIL PROTECTED]
X-ASK-Info: Whitelist match

Bruce Evans wrote:
> > > Er, it should be obvious that growfs can't reasonably work on the
> > > mounted partitions. growfs.1 doesn't exist, but growfs.8 already
> > > has the warning in a general form:
> > >
> > >      Currently growfs can only enlarge unmounted file systems. Do not
> > >      try enlarging a mounted file system, your system may panic and
> > >      you will not be able to use the file system any longer...
> >
> > Well, I suspected that it might not work... but I would disagree that
> > it was *obvious* that it would not work. This was before mount had
> > been run, so / was supposedly mounted (?) read-only.
>
> Perhaps the unobvious point is that fsck could work. If the mount is
> r/w, then neither growfs nor fsck can even open the partition r/w.
> fsck somehow works in the case of a r/o root, but growfs apparently
> doesn't. I think fsck depends on no other processes making
> (significant) vfs syscalls on the same partition while it is running
> (even r/o ones might be harmful if they caused reads of metadata which
> might be inconsistent). Then when fsck has finished it calls
> mount(... MNT_RELOAD...) to sync the metadata. growfs doesn't do this,
> and even if it did it is not clear that it does all the necessary
> syncing (growfs may change more or different metadata). However, I
> think it does most of the necessary things.

FYI, I submitted a bug/enhancement request to summarize this..

    http://www.freebsd.org/cgi/query-pr.cgi?pr=46110

-Archie

P.S. Why does submitting a bug now generate an email response from
(and who the heck is) "ThinkHost Support" ??

__________________________________________________________________________
Archie Cobbs * Packet Design * http://www.packetdesign.com
Re: backgroud fsck is still locking up system (fwd)
On Fri, 6 Dec 2002, Archie Cobbs wrote:
> So in summary my recommendation is to add a big warning to the
> growfs(1) man page that it should not be run on the root partition,
> even if you have booted single-user mode and haven't mounted / yet.
> I.e., to grow a root partition, you must boot from a different
> partition.

Er, it should be obvious that growfs can't reasonably work on the mounted
partitions. growfs.1 doesn't exist, but growfs.8 already has the warning
in a general form:

     Currently growfs can only enlarge unmounted file systems. Do not try
     enlarging a mounted file system, your system may panic and you will
     not be able to use the file system any longer...

Bruce
Re: backgroud fsck is still locking up system (fwd)
On Sun, 8 Dec 2002, Bruce Evans wrote:
> On Fri, 6 Dec 2002, Archie Cobbs wrote:
> > So in summary my recommendation is to add a big warning to the
> > growfs(1) man page that it should not be run on the root partition,
> > even if you have booted single-user mode and haven't mounted / yet.
> > I.e., to grow a root partition, you must boot from a different
> > partition.
>
> Er, it should be obvious that growfs can't reasonably work on the
> mounted partitions. growfs.1 doesn't exist, but growfs.8 already has
> the warning in a general form:
>
>      Currently growfs can only enlarge unmounted file systems. Do not
>      try enlarging a mounted file system, your system may panic and you
>      will not be able to use the file system any longer...

Hmm. I guess one of the interesting questions is: what happened to the
safety belts? I would have thought that GEOM would prevent opening the
partition writable while it was mounted...

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]             Network Associates Laboratories
Re: backgroud fsck is still locking up system (fwd)
In message [EMAIL PROTECTED], Kirk McKusick writes:
> Adding a two minute delay before starting background fsck
> sounds like a very good idea to me. Please send me your
> suggested change.

BTW, I've been using a fsck_ffs modification for a while now that
does something like the disabled kernel I/O slowdown, but from
userland. It seems to help quite a lot in leaving some disk bandwidth
for other processes. Waiting a while before starting the fsck seems
like a good idea anyway though.

Patch below (I think I posted an earlier version of this before).

Ian

Index: fsutil.c
===================================================================
RCS file: /dump/FreeBSD-CVS/src/sbin/fsck_ffs/fsutil.c,v
retrieving revision 1.19
diff -u -r1.19 fsutil.c
--- fsutil.c	27 Nov 2002 02:18:57 -0000	1.19
+++ fsutil.c	4 Dec 2002 02:16:28 -0000
@@ -40,6 +40,7 @@
 #endif /* not lint */
 
 #include <sys/param.h>
+#include <sys/time.h>
 #include <sys/types.h>
 #include <sys/sysctl.h>
 #include <sys/disklabel.h>
@@ -62,7 +63,13 @@
 
 #include "fsck.h"
 
+static void slowio_start(void);
+static void slowio_end(void);
+
 long	diskreads, totalreads;	/* Disk cache statistics */
+struct timeval slowio_starttime;
+int slowio_delay_usec = 10000;	/* Initial IO delay for background fsck */
+int slowio_pollcnt;
 
 int
 ftypeok(union dinode *dp)
@@ -350,10 +357,15 @@
 
 	offset = blk;
 	offset *= dev_bsize;
+	if (bkgrdflag)
+		slowio_start();
 	if (lseek(fd, offset, 0) < 0)
 		rwerror("SEEK BLK", blk);
-	else if (read(fd, buf, (int)size) == size)
+	else if (read(fd, buf, (int)size) == size) {
+		if (bkgrdflag)
+			slowio_end();
 		return (0);
+	}
 	rwerror("READ BLK", blk);
 	if (lseek(fd, offset, 0) < 0)
 		rwerror("SEEK BLK", blk);
@@ -463,6 +475,39 @@
 	idesc.id_blkno = blkno;
 	idesc.id_numfrags = frags;
 	(void)pass4check(&idesc);
+}
+
+/* Slow down IO so as to leave some disk bandwidth for other processes */
+void
+slowio_start()
+{
+
+	/* Delay one in every 8 operations by 16 times the average IO delay */
+	slowio_pollcnt = (slowio_pollcnt + 1) & 7;
+	if (slowio_pollcnt == 0) {
+		usleep(slowio_delay_usec * 16);
+		gettimeofday(&slowio_starttime, NULL);
+	}
+}
+
+void
+slowio_end()
+{
+	struct timeval tv;
+	int delay_usec;
+
+	if (slowio_pollcnt != 0)
+		return;
+
+	/* Update the slowdown interval. */
+	gettimeofday(&tv, NULL);
+	delay_usec = (tv.tv_sec - slowio_starttime.tv_sec) * 1000000 +
+	    (tv.tv_usec - slowio_starttime.tv_usec);
+	if (delay_usec < 64)
+		delay_usec = 64;
+	if (delay_usec > 1000000)
+		delay_usec = 1000000;
+	slowio_delay_usec = (slowio_delay_usec * 63 + delay_usec) >> 6;
 }
 
 /*
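The arithmetic behind the slowdown in the patch above can be tried in isolation. What follows is only a sketch, not the fsck_ffs code: the function names are made up for illustration, and the clamp bounds (64 microseconds and one second) and the shift-by-6 average are my reading of the patch rather than anything confirmed elsewhere. It does no disk I/O and never sleeps; it just computes which operations would be timed, how a latency sample is folded into the running average, and how long the pause would be.

```c
#include <assert.h>
#include <sys/time.h>

/* Hypothetical constants; chosen to mirror my reading of the patch. */
#define SAMPLE_MIN_USEC	64L
#define SAMPLE_MAX_USEC	1000000L	/* one second */

/* Elapsed microseconds between two gettimeofday() readings. */
static long
elapsed_usec(const struct timeval *start, const struct timeval *end)
{
	return ((long)(end->tv_sec - start->tv_sec) * 1000000L +
	    (long)(end->tv_usec - start->tv_usec));
}

/* Returns 1 on every eighth call, 0 otherwise; *count cycles through 0..7. */
static int
is_sampled_op(unsigned int *count)
{
	*count = (*count + 1) & 7;
	return (*count == 0);
}

/*
 * Fold one clamped latency sample into the running average with weight
 * 1/64: avg' = (63 * avg + sample) / 64, computed with a right shift.
 */
static long
update_avg_usec(long avg_usec, long sample_usec)
{
	assert(avg_usec >= 0);
	if (sample_usec < SAMPLE_MIN_USEC)
		sample_usec = SAMPLE_MIN_USEC;
	if (sample_usec > SAMPLE_MAX_USEC)
		sample_usec = SAMPLE_MAX_USEC;
	return ((avg_usec * 63 + sample_usec) >> 6);
}

/* Length of the pause taken before a sampled read: 16 times the average. */
static long
pause_usec(long avg_usec)
{
	return (avg_usec * 16);
}
```

With an initial estimate of 10000 microseconds, a single fast 64-microsecond read only lowers the estimate to 9844, so the pause adapts gradually instead of jumping with every sample. Because one operation in eight is delayed by sixteen times the average, the reader spends roughly twice as long paused as it does reading, which is what leaves bandwidth for other processes.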
Re: backgroud fsck is still locking up system (fwd)
Thanks for reminding me about your userland change to background
fsck. I have tried it out and concur that it is the right approach
until we manage to get the general solution in the kernel. I suggest
that you propose it to release engineering and if approved check
it in.

	Kirk McKusick

=-=-=-=-=-=

To: Kirk McKusick [EMAIL PROTECTED]
cc: Brooks Davis [EMAIL PROTECTED], Nate Lawson [EMAIL PROTECTED], Archie Cobbs [EMAIL PROTECTED], [EMAIL PROTECTED]
Subject: Re: backgroud fsck is still locking up system (fwd)
In-Reply-To: Your message of Fri, 06 Dec 2002 17:52:38 PST. [EMAIL PROTECTED]
Date: Sat, 07 Dec 2002 14:26:39 +0000
From: Ian Dowse [EMAIL PROTECTED]
X-ASK-Info: Whitelist match

In message [EMAIL PROTECTED], Kirk McKusick writes:
> Adding a two minute delay before starting background fsck
> sounds like a very good idea to me. Please send me your
> suggested change.

BTW, I've been using a fsck_ffs modification for a while now that
does something like the disabled kernel I/O slowdown, but from
userland. It seems to help quite a lot in leaving some disk bandwidth
for other processes. Waiting a while before starting the fsck seems
like a good idea anyway though.

Patch below (I think I posted an earlier version of this before).
Re: backgroud fsck is still locking up system (fwd)
On Fri, 6 Dec 2002, Archie Cobbs wrote:
> Julian Elischer wrote:
> > I put a copy of / in /usr then from the fixit, I mounted /usr as /
> > and ran growfs from there.. the trick is to not do it while / is
> > mounted.
>
> / wasn't mounted yet when I ran growfs: I ran growfs after booting
> single user mode but before mounting any disks.. perhaps that caused
> it to not work. But it was the root partition and I was running in
> single user mode. If that's a problem then the growfs man page should
> say so, or maybe it should be more clear about what is meant by
> "mounted".

growfs won't work with any mounted fs (even ro) because it needs to
quiesce kernel file ops and you can't do that from usermode (yet). I
wonder if there might be some clever way to abuse snapshots to have this
same effect (i.e. keep an open handle to the underlying fs cdev for growfs
to use and then mount a snapshot of the fs over its own mountpoint for
procs to use.)

> In any case, running it from the fixit floppy didn't work either (got
> a core dump), but that may be because it was already screwed up. So at
> minimum, there's a documentation bug (IMHO).

I assume the superblock changes between 4 and 5 changed the ability to use
4.x growfs on 5.x ufs partitions. Also, does growfs need to be updated
for ufs2?

-Nate
Re: backgroud fsck is still locking up system (fwd)
On Sat, 7 Dec 2002, Archie Cobbs wrote:
> Bruce Evans wrote:
> > > So in summary my recommendation is to add a big warning to the
> > > growfs(1) man page that it should not be run on the root partition,
> > > even if you have booted single-user mode and haven't mounted / yet.
> > > I.e., to grow a root partition, you must boot from a different
> > > partition.
> >
> > Er, it should be obvious that growfs can't reasonably work on the
> > mounted partitions. growfs.1 doesn't exist, but growfs.8 already has
> > the warning in a general form:
> >
> >      Currently growfs can only enlarge unmounted file systems. Do not
> >      try enlarging a mounted file system, your system may panic and
> >      you will not be able to use the file system any longer...
>
> Well, I suspected that it might not work... but I would disagree that
> it was *obvious* that it would not work. This was before mount had
> been run, so / was supposedly mounted (?) read-only.

I've seen ufs write back the superblock on unmounting a read-only
filesystem (!). It was a few years ago but I wouldn't be surprised if
it was still true..

After you did it on the filesystem (ran growfs), what did you do next?
The safe answer would be to pull the plug.

> In any case, when you're talking about the danger of destroying a
> filesystem it probably wouldn't hurt to have a little extra clarity
> in the documentation.
>
> Or better yet, should the kernel prevent raw writes to the / partition?
> Guess that would prevent fsck from working though.
>
> -Archie
>
> __________________________________________________________________________
> Archie Cobbs * Packet Design * http://www.packetdesign.com
Re: backgroud fsck is still locking up system (fwd)
Bruce Evans wrote:
> > So in summary my recommendation is to add a big warning to the
> > growfs(1) man page that it should not be run on the root partition,
> > even if you have booted single-user mode and haven't mounted / yet.
> > I.e., to grow a root partition, you must boot from a different
> > partition.
>
> Er, it should be obvious that growfs can't reasonably work on the
> mounted partitions. growfs.1 doesn't exist, but growfs.8 already has
> the warning in a general form:
>
>      Currently growfs can only enlarge unmounted file systems. Do not
>      try enlarging a mounted file system, your system may panic and you
>      will not be able to use the file system any longer...

Well, I suspected that it might not work... but I would disagree that
it was *obvious* that it would not work. This was before mount had been
run, so / was supposedly mounted (?) read-only.

In any case, when you're talking about the danger of destroying a
filesystem it probably wouldn't hurt to have a little extra clarity
in the documentation.

Or better yet, should the kernel prevent raw writes to the / partition?
Guess that would prevent fsck from working though.

-Archie

__________________________________________________________________________
Archie Cobbs * Packet Design * http://www.packetdesign.com
Re: backgroud fsck is still locking up system (fwd)
On Sat, 7 Dec 2002, Robert Watson wrote:
> On Sun, 8 Dec 2002, Bruce Evans wrote:
> > Er, it should be obvious that growfs can't reasonably work on the
> > mounted partitions. growfs.1 doesn't exist, but growfs.8 already has
> > the warning ...
>
> Hmm. I guess one of the interesting questions is: what happened to the
> safety belts? I would have thought that GEOM would prevent opening the
> partition writable while it was mounted...

The kernel doesn't and shouldn't prevent it for the r/o-mounted case
(since fsck needs to write to the partition of a mounted file system
for at least the case of the root file system mounted r/o), and apparently
growfs doesn't prevent it in this case either. There are lots of safety
belts in the kernel for the r/w-mounted case.

Bruce
Re: backgroud fsck is still locking up system (fwd)
On Sat, 7 Dec 2002, Archie Cobbs wrote:
> Bruce Evans wrote:
> > Er, it should be obvious that growfs can't reasonably work on the
> > mounted partitions. growfs.1 doesn't exist, but growfs.8 already has
> > the warning in a general form:
> >
> >      Currently growfs can only enlarge unmounted file systems. Do not
> >      try enlarging a mounted file system, your system may panic and
> >      you will not be able to use the file system any longer...
>
> Well, I suspected that it might not work... but I would disagree that
> it was *obvious* that it would not work. This was before mount had
> been run, so / was supposedly mounted (?) read-only.

Perhaps the unobvious point is that fsck could work. If the mount is r/w,
then neither growfs nor fsck can even open the partition r/w. fsck
somehow works in the case of a r/o root, but growfs apparently doesn't.
I think fsck depends on no other processes making (significant) vfs
syscalls on the same partition while it is running (even r/o ones
might be harmful if they caused reads of metadata which might be
inconsistent). Then when fsck has finished it calls mount(... MNT_RELOAD...)
to sync the metadata. growfs doesn't do this, and even if it did it is
not clear that it does all the necessary syncing (growfs may change more
or different metadata). However, I think it does most of the necessary
things.

Bruce
Re: backgroud fsck is still locking up system (fwd)
On Thu, 5 Dec 2002, Kirk McKusick wrote:
> Does the background fsck process continue to run, or does the whole
> system come to a halt? If the fsck process continues to run, what
> happens when it eventually finishes? Is the system still dead, or
> does it come back to life? If the system does not come back to life
> can you get me the output of `ps axl'? If not, can you break into
> the debugger and get a ps output? (You will need to have the DDB
> option specified in your config file).

Sorry for butting in. I think Archie is referring to bg fsck gaining an
unfair share of cpu due to it running due to IO completions. Last I
heard, we were waiting until after 5.0 to experiment with scheduler
changes to make it more fair. I have not seen any hard locks or other
problems with bg fsck after your commit.

-Nate
Re: backgroud fsck is still locking up system (fwd)
On Fri, Dec 06, 2002 at 10:27:10AM -0800, Nate Lawson wrote:
> On Thu, 5 Dec 2002, Kirk McKusick wrote:
> > Does the background fsck process continue to run, or does the whole
> > system come to a halt? If the fsck process continues to run, what
> > happens when it eventually finishes? Is the system still dead, or
> > does it come back to life? If the system does not come back to life
> > can you get me the output of `ps axl'? If not, can you break into
> > the debugger and get a ps output? (You will need to have the DDB
> > option specified in your config file).
>
> Sorry for butting in. I think Archie is referring to bg fsck gaining
> an unfair share of cpu due to it running due to IO completions. Last I
> heard, we were waiting until after 5.0 to experiment with scheduler
> changes to make it more fair. I have not seen any hard locks or other
> problems with bg fsck after your commit.

My experience is that, at least with my laptop (which has a very slow
disk), bg fsck works OK, but starting applications for the first time
while fsck is running is _very_ painful. Even getty seems to have a
hard time. I've found that adding a two minute delay before the fsck is
sufficient to allow the system to finish starting up and for me to load X
and my main applications, which lets me work while bg fsck is running.

I posted a patch to add an optional delay in the rc scripts a while ago,
but Kirk was going to re-enable the priority stuff soon so I didn't
pursue it. If there's interest, I'll regenerate it and repost it.

-- Brooks

--
Any statement of the form "X is the one, true Y" is FALSE.
PGP fingerprint 655D 519C 26A7 82E7 2529 9BF0 5D8E 8BE9 F238 1AD4
Re: backgroud fsck is still locking up system (fwd)
Kirk McKusick wrote:
> Does the background fsck process continue to run, or does the whole
> system come to a halt? If the fsck process continues to run, what
> happens when it eventually finishes? Is the system still dead, or
> does it come back to life? If the system does not come back to life
> can you get me the output of `ps axl'? If not, can you break into
> the debugger and get a ps output? (You will need to have the DDB
> option specified in your config file).

OK, here is some more info.. I easily reproduced the problem again.
So far it's 100% reproducible.

This time, to reproduce it I simply booted in single user mode, typed
"mount -a -t nonfs" and then pulled the plug.

After the reboot, the HDD light soon stops blinking altogether.
I waited for several minutes (which should have been long enough)
and it never came back to life, which is not surprising considering
there's no disk activity.

Breaking into the debugger still works. However, pressing the soft
power button no longer causes a graceful shutdown as it normally does.

To copy the 'ps' debugger output, I'd have to manually copy it all,
so here are just a few highlights:

  Proc        State
  ----        -----
  fsck_ufs    0004000 norm[SLPQ nbufbs c036e5b0][SLP]
  fsck        0004002 norm[SLPQ wait c124dce8][SLP]
  syncer      204 norm[SLPQ nbufbs c036e5b0][SLP]
  vnlru       204 norm[SLPQ vlruwt c12c0ce8][SLP]
  bufdaemon   204 norm[SLPQ qsleep c036e5a4][SLP]
  swapper     200 norm[SLPQ sched c0315a20][SLP]

Softupdates is enabled on /usr and /var but not /.
This machine also acts as an NFS client for /home/archie.

-Archie

__________________________________________________________________________
Archie Cobbs * Packet Design * http://www.packetdesign.com
Re: backgroud fsck is still locking up system (fwd)
On Friday, December 6, 2002, at 01:39 PM, Archie Cobbs wrote:
> Kirk McKusick wrote:
> > Does the background fsck process continue to run, or does the whole
> > system come to a halt? If the fsck process continues to run, what
> > happens when it eventually finishes? Is the system still dead, or
> > does it come back to life? If the system does not come back to life
> > can you get me the output of `ps axl'? If not, can you break into
> > the debugger and get a ps output? (You will need to have the DDB
> > option specified in your config file).
>
> OK, here is some more info.. I easily reproduced the problem again.
> So far it's 100% reproducible.
>
> This time, to reproduce it I simply booted in single user mode, typed
> "mount -a -t nonfs" and then pulled the plug.
>
> After the reboot, the HDD light soon stops blinking altogether.
> I waited for several minutes (which should have been long enough)
> and it never came back to life, which is not surprising considering
> there's no disk activity.
>
> Breaking into the debugger still works. However, pressing the soft
> power button no longer causes a graceful shutdown as it normally does.
>
> To copy the 'ps' debugger output, I'd have to manually copy it all,
> so here are just a few highlights:
>
>   Proc        State
>   ----        -----
>   fsck_ufs    0004000 norm[SLPQ nbufbs c036e5b0][SLP]
>   fsck        0004002 norm[SLPQ wait c124dce8][SLP]
>   syncer      204 norm[SLPQ nbufbs c036e5b0][SLP]
>   vnlru       204 norm[SLPQ vlruwt c12c0ce8][SLP]
>   bufdaemon   204 norm[SLPQ qsleep c036e5a4][SLP]
>   swapper     200 norm[SLPQ sched c0315a20][SLP]
>
> Softupdates is enabled on /usr and /var but not /.
> This machine also acts as an NFS client for /home/archie.

Why does softupdates not get enabled on / by default on the install?

-DR
Re: backgroud fsck is still locking up system (fwd)
On Fri, Dec 06, 2002 at 01:52:11PM -0500, David Rhodus wrote:
> On Friday, December 6, 2002, at 01:39 PM, Archie Cobbs wrote:
> > Kirk McKusick wrote:
> > > Does the background fsck process continue to run, or does the whole system come to a halt? If the fsck process continues to run, what happens when it eventually finishes? Is the system still dead, or does it come back to life? If the system does not come back to life can you get me the output of `ps axl'? If not, can you break into the debugger and get a ps output? (You will need to have the DDB option specified in your config file).
> >
> > OK, here is some more info.. I easily reproduced the problem again. So far it's 100% reproducible.
> >
> > This time, to reproduce it, I simply booted in single user mode, typed "mount -a -t nonfs" and then pulled the plug.
> >
> > After the reboot, the HDD light soon stops blinking altogether. I waited for several minutes (which should have been long enough) and it never came back to life, which is not surprising considering there's no disk activity.
> >
> > Breaking into the debugger still works. However, pressing the soft power button no longer causes a graceful shutdown as it normally does.
> >
> > To copy the 'ps' debugger output, I'd have to manually copy it all, so here are just a few highlights:
> >
> >   Proc       State
> >   fsck_ufs   0004000 norm[SLPQ nbufbs c036e5b0][SLP]
> >   fsck       0004002 norm[SLPQ wait   c124dce8][SLP]
> >   syncer     204     norm[SLPQ nbufbs c036e5b0][SLP]
> >   vnlru      204     norm[SLPQ vlruwt c12c0ce8][SLP]
> >   bufdaemon  204     norm[SLPQ qsleep c036e5a4][SLP]
> >   swapper    200     norm[SLPQ sched  c0315a20][SLP]
> >
> > Softupdates is enabled on /usr and /var but not /. This machine also acts as an NFS client for /home/archie.
>
> Why does softupdates not get enabled on / , by default on the install?

Read tuning(7).

Cheers,
--
Ruslan Ermilov      Sysadmin and DBA,
[EMAIL PROTECTED]   Sunbay Software AG,
[EMAIL PROTECTED]   FreeBSD committer,
+380.652.512.251    Simferopol, Ukraine

http://www.FreeBSD.org   The Power To Serve
http://www.oracle.com    Enabling The Information Age
Re: backgroud fsck is still locking up system (fwd)
David Rhodus wrote:
> > Softupdates is enabled on /usr and /var but not /.
>
> Why does softupdates not get enabled on / , by default on the install?

I disabled softupdates on / back when having it enabled caused disk full problems during 'make installworld,' and never re-enabled it.

FYI at this point my 50MB / partition is woefully inadequate. I can't even 'make install kernel' without first removing all existing modules, and even so / ends up 106% full.

Finally, one more bit of info: I have WITNESS enabled in this kernel and get this message during boot:

  /usr/src/sys/vm/uma_core.c:1330: could sleep with dc0 locked from /usr/src/sys/pci/if_dc.c:691

-Archie

Archie Cobbs * Packet Design * http://www.packetdesign.com
Re: backgroud fsck is still locking up system (fwd)
Nate Lawson wrote:
> > Does the background fsck process continue to run, or does the whole system come to a halt? If the fsck process continues to run, what happens when it eventually finishes? Is the system still dead, or does it come back to life? If the system does not come back to life can you get me the output of `ps axl'? If not, can you break into the debugger and get a ps output? (You will need to have the DDB option specified in your config file).
>
> Sorry for butting in. I think Archie is referring to bg fsck gaining an unfair share of cpu due to it running due to IO completions. Last I heard, we were waiting until after 5.0 to experiment with scheduler changes to make it more fair. I have not seen any hard locks or other problems with bg fsck after your commit.

I'm actually seeing something different. The box becomes unresponsive (except for virtual console changes and CTRL-ALT-ESC) but there's no disk activity. It never recovers.

Reproduced it again just now. After pulling the plug and rebooting I didn't touch the box. It booted normally, started background fsck, and the HDD light was blinking as expected. After about 10 seconds, rather suddenly the HDD light stopped blinking. At this point it was pretty dead. Broke into the debugger and it showed a similar 'ps' output to what I previously posted.

-Archie

Archie Cobbs * Packet Design * http://www.packetdesign.com
Re: backgroud fsck is still locking up system (fwd)
In the last episode (Dec 06), David Rhodus said:
> Why does softupdates not get enabled on / , by default on the install?

Softupdates updates on-disk structures in the background, and background fsck cannot relink unreferenced files into lost+found, so you run the risk of losing both the original and backup copies of important files in case of a sudden reboot.

Imagine you edited /etc/rc.conf, saved it, and 5 seconds later the system panic'ed. Because the default metadata flush time is 28 seconds, there's a pretty good chance that neither the new file nor the original is in /etc after a reboot. I got bit by this three times before I learned my lesson. I have disabled softupdates on /, and crank the softupdates delays down to 10/11/12 seconds to minimize the risk to my other filesystems.

At least there are /var/backups and /boot/kernel.old which let you recover the really important files :)

--
Dan Nelson
[EMAIL PROTECTED]
Re: backgroud fsck is still locking up system (fwd)
Dan Nelson wrote:
> > Why does softupdates not get enabled on / , by default on the install?
>
> Softupdates updates on-disk structures in the background, and background fsck cannot relink unreferenced files into lost+found, so you run the risk of losing both the original and backup copies of important files in case of a sudden reboot.
>
> Imagine you edited /etc/rc.conf, saved it, and 5 seconds later the system panic'ed. Because the default metadata flush time is 28 seconds, there's a pretty good chance that neither the new file nor the original is in /etc after a reboot. I got bit by this three times before I learned my lesson.

I don't understand this.. presumably vi updates the file contents by opening and writing into the file; why would this cause the file's directory entry to disappear?

On the other hand, if you do "mv rc.conf.new rc.conf" then you are supposedly guaranteed that the file exists in some form; see rename(2).

In any case, you seem to be implying that with respect to modifying files just before a system crash:

  (a) Softupdates is more 'dangerous' than non-softupdates
  (b) Background fsck is more 'dangerous' than normal fsck

Is this really true? I thought if anything the reverse of (a) would be true.

-Archie

Archie Cobbs * Packet Design * http://www.packetdesign.com
Re: backgroud fsck is still locking up system (fwd)
On Fri, 6 Dec 2002, Archie Cobbs wrote:
> Reproduced it again just now. After pulling the plug and rebooting I didn't touch the box. It booted normally, started background fsck, and the HDD light was blinking as expected. After about 10 seconds, rather suddenly the HDD light stopped blinking. At this point it was pretty dead. Broke into the debugger and it showed a similar 'ps' output to what I previously posted.

you need a serial console ... :-)

> -Archie
>
> Archie Cobbs * Packet Design * http://www.packetdesign.com
Re: backgroud fsck is still locking up system (fwd)
On Fri, 6 Dec 2002, Archie Cobbs wrote:
> David Rhodus wrote:
> > > Softupdates is enabled on /usr and /var but not /.
> >
> > Why does softupdates not get enabled on / , by default on the install?
>
> I disabled softupdates on / back when having it enabled caused disk full problems during 'make installworld,' and never re-enabled it.
>
> FYI at this point my 50MB / partition is woefully inadequate. I can't even 'make install kernel' without first removing all existing modules, and even so / ends up 106% full.

here's a hint.. most systems follow / with their swap region.. you can boot from fixit, or picoBSD floppy and use disklabel -e to extend the root partition, then you can use growfs to add the new space to your root fs. Usually the 50MB that would make a big difference to / won't really be missed from the swap, and you can always add more swap space using a swapfile etc. if it gets short.

> Finally, one more bit of info: I have WITNESS enabled in this kernel and get this message during boot:
>
>   /usr/src/sys/vm/uma_core.c:1330: could sleep with dc0 locked from /usr/src/sys/pci/if_dc.c:691
>
> -Archie
>
> Archie Cobbs * Packet Design * http://www.packetdesign.com
Re: backgroud fsck is still locking up system (fwd)
On Fri, 6 Dec 2002, Archie Cobbs wrote:
> To copy the 'ps' debugger output, I'd have to manually copy it all, so here are just a few highlights:
>
>   Proc       State
>   fsck_ufs   0004000 norm[SLPQ nbufbs c036e5b0][SLP]
>   fsck       0004002 norm[SLPQ wait   c124dce8][SLP]
>   syncer     204     norm[SLPQ nbufbs c036e5b0][SLP]
>   vnlru      204     norm[SLPQ vlruwt c12c0ce8][SLP]
>   bufdaemon  204     norm[SLPQ qsleep c036e5a4][SLP]
>   swapper    200     norm[SLPQ sched  c0315a20][SLP]

Output from tr for the pids for syncer and fsck_ufs? show locks, show lockedvnods?

-Nate
Re: backgroud fsck is still locking up system (fwd)
From: Archie Cobbs [EMAIL PROTECTED]
Subject: Re: backgroud fsck is still locking up system (fwd)
In-Reply-To: [EMAIL PROTECTED]
To: Nate Lawson [EMAIL PROTECTED]
Date: Fri, 6 Dec 2002 10:57:13 -0800 (PST)
CC: Kirk McKusick [EMAIL PROTECTED], Archie Cobbs [EMAIL PROTECTED], [EMAIL PROTECTED]
X-ASK-Info: Whitelist match

Nate Lawson wrote:
> > Does the background fsck process continue to run, or does the whole system come to a halt? If the fsck process continues to run, what happens when it eventually finishes? Is the system still dead, or does it come back to life? If the system does not come back to life can you get me the output of `ps axl'? If not, can you break into the debugger and get a ps output? (You will need to have the DDB option specified in your config file).
>
> Sorry for butting in. I think Archie is referring to bg fsck gaining an unfair share of cpu due to it running due to IO completions. Last I heard, we were waiting until after 5.0 to experiment with scheduler changes to make it more fair. I have not seen any hard locks or other problems with bg fsck after your commit.

I'm actually seeing something different. The box becomes unresponsive (except for virtual console changes and CTRL-ALT-ESC) but there's no disk activity. It never recovers.

Reproduced it again just now. After pulling the plug and rebooting I didn't touch the box. It booted normally, started background fsck, and the HDD light was blinking as expected. After about 10 seconds, rather suddenly the HDD light stopped blinking. At this point it was pretty dead. Broke into the debugger and it showed a similar 'ps' output to what I previously posted.

-Archie

Your ps shows fsck_ufs and the syncer process both blocked on nbufbs. That means the system has blocked them from running because it feels that there are too many dirty buffers. What you are probably experiencing is that you have a relatively small memory machine which has a rather low threshold for blocking on dirty buffers. All the dirty buffers in your system are held by the indirect blocks of the snapshot and thus the bufdaemon cannot push them out. That task can only be done by the syncer who is also blocked. Could you please run the following commands on your system and send me the results:

  sysctl vfs.lodirtybuffers
  sysctl vfs.hidirtybuffers
  sysctl vfs.numdirtybuffers

both before and after the lockup. If you cannot run these commands after the lockup, the global variable names are:

  lodirtybuffers
  hidirtybuffers
  numdirtybuffers

If my hypothesis is correct, that will let me tweak the thresholds on dirty buffers to get a solution.

Kirk McKusick
Re: backgroud fsck is still locking up system (fwd)
> Finally, one more bit of info: I have WITNESS enabled in this kernel and get this message during boot:
>
>   /usr/src/sys/vm/uma_core.c:1330: could sleep with dc0 locked from /usr/src/sys/pci/if_dc.c:691

if_attach does a malloc with M_WAITOK. If the attach happens inside a lock in the driver's attach method (typical) then you'll get this complaint. Fixing it, and some other similar stuff, requires some care since the code assumes malloc will not fail. I decided to leave it until after 5.0.

Sam
Re: backgroud fsck is still locking up system (fwd)
The loss of files under soft updates is possible if your editor fails to fsync the new file before unlinking the old file. The `vi' editor always does an `fsync' after writing the new copy and before removing the old copy. I have not checked with other editors such as emacs to see if they properly use fsync. Note that there is also a vulnerability without soft updates; it is just that the window of vulnerability is shorter. So, editors should always do fsync's; it is just more critical if you are using soft updates (or journalling for that matter).

The main reason for not using soft updates on the root filesystem was because of the delay between removing files and having the space show up. The result was that world installs on the root filesystem often failed if the root was nearly full (as is so often the case). That problem has now been fixed in 5.0 with a callback to soft updates if a filesystem full error is about to be generated. When called back, soft updates expedites the freeing of space so that the new allocation can succeed. So, the primary reason for not using soft updates on the root is now fixed. If however, mainline editors are not doing fsync's, then there is still a good reason not to use soft updates on the root filesystem.

Kirk McKusick

=-=-=-=-=

From: Archie Cobbs [EMAIL PROTECTED]
Subject: Re: backgroud fsck is still locking up system (fwd)
In-Reply-To: [EMAIL PROTECTED]
To: Dan Nelson [EMAIL PROTECTED]
Date: Fri, 6 Dec 2002 11:28:52 -0800 (PST)
CC: [EMAIL PROTECTED], [EMAIL PROTECTED]
X-ASK-Info: Whitelist match

Dan Nelson wrote:
> > Why does softupdates not get enabled on / , by default on the install?
>
> Softupdates updates on-disk structures in the background, and background fsck cannot relink unreferenced files into lost+found, so you run the risk of losing both the original and backup copies of important files in case of a sudden reboot.
>
> Imagine you edited /etc/rc.conf, saved it, and 5 seconds later the system panic'ed. Because the default metadata flush time is 28 seconds, there's a pretty good chance that neither the new file nor the original is in /etc after a reboot. I got bit by this three times before I learned my lesson.

I don't understand this.. presumably vi updates the file contents by opening and writing into the file; why would this cause the file's directory entry to disappear?

On the other hand, if you do "mv rc.conf.new rc.conf" then you are supposedly guaranteed that the file exists in some form; see rename(2).

In any case, you seem to be implying that with respect to modifying files just before a system crash:

  (a) Softupdates is more 'dangerous' than non-softupdates
  (b) Background fsck is more 'dangerous' than normal fsck

Is this really true? I thought if anything the reverse of (a) would be true.

-Archie

Archie Cobbs * Packet Design * http://www.packetdesign.com
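For readers following the fsync discussion, here is a rough sketch of the ordering Kirk describes: write the new copy, fsync it, and only then let it take over the old name. This is not code from vi or from anyone in the thread; the function name replace_file() and the ".new" suffix are made up for illustration, and it leans on rename(2) for the final switch as Archie suggests.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Replace `path' with new contents, syncing the new copy to disk
 * before it takes over the name.  Returns 0 on success, -1 on error.
 * Hypothetical helper; the ".new" suffix is an arbitrary choice.
 */
int
replace_file(const char *path, const void *data, size_t len)
{
	char tmp[1024];
	const char *p = data;
	ssize_t n;
	int fd, r;

	assert(path != NULL && data != NULL);
	r = snprintf(tmp, sizeof(tmp), "%s.new", path);
	if (r < 0 || (size_t)r >= sizeof(tmp))
		return (-1);
	fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd == -1)
		return (-1);
	while (len > 0) {
		n = write(fd, p, len);
		if (n <= 0)
			goto fail;
		p += n;
		len -= (size_t)n;
	}
	/* Push the new copy to disk before the old one can go away. */
	if (fsync(fd) == -1)
		goto fail;
	if (close(fd) == -1) {
		unlink(tmp);
		return (-1);
	}
	/* rename(2) switches the name to the new file in one step. */
	if (rename(tmp, path) == -1) {
		unlink(tmp);
		return (-1);
	}
	return (0);
fail:
	close(fd);
	unlink(tmp);
	return (-1);
}
```

A tool that skips the fsync step (as /usr/bin/install is reported to do elsewhere in this thread) leaves a window in which a crash can lose the new contents.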
Re: backgroud fsck is still locking up system (fwd)
Kirk McKusick wrote:
> by the syncer who is also blocked. Could you please run the following command on your system and send me the results:
>
>   sysctl vfs.lodirtybuffers
>   sysctl vfs.hidirtybuffers
>   sysctl vfs.numdirtybuffers
>
> both before and after the lockup. If you cannot run this command after the lockup, the global variable names are:
>
>   lodirtybuffers
>   hidirtybuffers
>   numdirtybuffers

Before (system running normally):

  vfs.lodirtybuffers: 126
  vfs.hidirtybuffers: 252
  vfs.numdirtybuffers: 0

After:

  vfs.lodirtybuffers: 126
  vfs.hidirtybuffers: 252
  vfs.numdirtybuffers: 445

-Archie

Archie Cobbs * Packet Design * http://www.packetdesign.com
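To make the numbers above concrete, a small sketch that classifies a dirty-buffer count against the two thresholds. It is only a model for reading the sysctl output, not the kernel's actual blocking logic; the enum and the function name classify_dirty() are invented here, and the strict "greater than" comparison is an assumption.

```c
#include <assert.h>

enum dirty_level { DIRTY_OK, DIRTY_ABOVE_LO, DIRTY_ABOVE_HI };

/* Classify a dirty-buffer count against the low and high thresholds. */
enum dirty_level
classify_dirty(int numdirty, int lodirty, int hidirty)
{
	assert(lodirty <= hidirty);
	if (numdirty > hidirty)
		return (DIRTY_ABOVE_HI);
	if (numdirty > lodirty)
		return (DIRTY_ABOVE_LO);
	return (DIRTY_OK);
}
```

With the reported values, classify_dirty(0, 126, 252) gives DIRTY_OK for the healthy system and classify_dirty(445, 126, 252) gives DIRTY_ABOVE_HI for the locked-up one, which matches Kirk's hypothesis that the machine ran past its dirty-buffer limit.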
Re: backgroud fsck is still locking up system (fwd)
In the last episode (Dec 06), Kirk McKusick said:
> The main reason for not using soft updates on the root filesystem was because of the delay between removing files and having the space show up. The result was that world installs on the root filesystem often failed if the root was nearly full (as is so often the case). That problem has now been fixed in 5.0 with a callback to soft updates if a filesystem full error is about to be generated. When called back, soft updates expedites the freeing of space so that the new allocation can succeed. So, the primary reason for not using soft updates on the root is now fixed. If however, mainline editors are not doing fsync's, then there is still a good reason not to use soft updates on the root filesystem.

/usr/bin/install does not fsync. One of my three foot-shootings involved installing a new /sbin/init and hitting the power switch.

--
Dan Nelson
[EMAIL PROTECTED]
Re: backgroud fsck is still locking up system (fwd)
From: Archie Cobbs [EMAIL PROTECTED]
Subject: Re: backgroud fsck is still locking up system (fwd)
In-Reply-To: [EMAIL PROTECTED]
To: Kirk McKusick [EMAIL PROTECTED]
Date: Fri, 6 Dec 2002 13:01:20 -0800 (PST)
CC: Archie Cobbs [EMAIL PROTECTED], Nate Lawson [EMAIL PROTECTED], [EMAIL PROTECTED]
X-ASK-Info: Whitelist match

Kirk McKusick wrote:
> by the syncer who is also blocked. Could you please run the following command on your system and send me the results:
>
>   sysctl vfs.lodirtybuffers
>   sysctl vfs.hidirtybuffers
>   sysctl vfs.numdirtybuffers
>
> both before and after the lockup. If you cannot run this command after the lockup, the global variable names are:
>
>   lodirtybuffers
>   hidirtybuffers
>   numdirtybuffers

Before (system running normally):

  vfs.lodirtybuffers: 126
  vfs.hidirtybuffers: 252
  vfs.numdirtybuffers: 0

After:

  vfs.lodirtybuffers: 126
  vfs.hidirtybuffers: 252
  vfs.numdirtybuffers: 445

-Archie

Archie Cobbs * Packet Design * http://www.packetdesign.com

OK, it looks like my hypothesis on having a small number of buffers and running out of them is the problem. I enclose below a patch which should check for the problem arising and help to mitigate it. I would appreciate you dropping it into your kernel and seeing if it solves your problem. The fix is not ideal, but merely to see if it solves this problem. If it does, I will figure out how to do it properly. Thanks for your help.

Kirk McKusick

Index: sys/buf.h
===================================================================
RCS file: /usr/ncvs/src/sys/sys/buf.h,v
retrieving revision 1.138
diff -c -r1.138 buf.h
*** sys/buf.h	2002/08/30 04:04:37	1.138
--- sys/buf.h	2002/12/06 21:44:25
***************
*** 468,473 ****
--- 468,474 ----
  caddr_t	kern_vfs_bio_buffer_alloc(caddr_t v, long physmem_est);
  void	bufinit(void);
  void	bwillwrite(void);
+ int	checkdirtybufs(struct vnode *);
  int	buf_dirty_count_severe(void);
  void	bremfree(struct buf *);
  int	bread(struct vnode *, daddr_t, int, struct ucred *, struct buf **);
Index: kern/vfs_bio.c
===================================================================
RCS file: /usr/ncvs/src/sys/kern/vfs_bio.c,v
retrieving revision 1.342
diff -c -r1.342 vfs_bio.c
*** kern/vfs_bio.c	2002/11/23 19:10:30	1.342
--- kern/vfs_bio.c	2002/12/06 21:44:35
***************
*** 1114,1119 ****
--- 1114,1137 ----
  }
  
  /*
+  * Check to see if a vnode holds too many dirty buffers. If it does,
+  * flush it.
+  */
+ int
+ checkdirtybufs(struct vnode *vp)
+ {
+ 	struct buf *bp;
+ 	int dirtycnt = 0, error = 0;
+ 	struct thread *td = curthread;
+ 
+ 	TAILQ_FOREACH(bp, &vp->v_dirtyblkhd, b_vnbufs)
+ 		dirtycnt++;
+ 	if (dirtycnt > lodirtybuffers)
+ 		error = VOP_FSYNC(vp, td->td_ucred, MNT_NOWAIT, td);
+ 	return (error);
+ }
+ 
+ /*
   * Return true if we have too many dirty buffers.
   */
  int
Index: ufs/ffs/ffs_balloc.c
===================================================================
RCS file: /usr/ncvs/src/sys/ufs/ffs/ffs_balloc.c,v
retrieving revision 1.39
diff -c -r1.39 ffs_balloc.c
*** ufs/ffs/ffs_balloc.c	2002/10/22 01:14:25	1.39
--- ufs/ffs/ffs_balloc.c	2002/12/06 21:49:56
***************
*** 295,300 ****
--- 295,301 ----
  				if (bp->b_bufsize == fs->fs_bsize)
  					bp->b_flags |= B_CLUSTEROK;
  				bdwrite(bp);
+ 				checkdirtybufs(vp);
  			}
  		}
  		/*
***************
*** 335,340 ****
--- 336,342 ----
  			if (bp->b_bufsize == fs->fs_bsize)
  				bp->b_flags |= B_CLUSTEROK;
  			bdwrite(bp);
+ 			checkdirtybufs(vp);
  		}
  		*bpp = nbp;
  		return (0);
***************
*** 756,761 ****
--- 758,764 ----
  				if (bp->b_bufsize == fs->fs_bsize)
  					bp->b_flags |= B_CLUSTEROK;
  				bdwrite(bp);
+ 				checkdirtybufs(vp);
  			}
  		}
  		/*
***************
*** 796,801 ****
--- 799,805 ----
  			if (bp->b_bufsize == fs->fs_bsize)
  				bp->b_flags |= B_CLUSTEROK;
  			bdwrite(bp);
+ 			checkdirtybufs(vp);
  		}
  		*bpp = nbp;
  		return (0);
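The idea behind checkdirtybufs() in the patch above (count one vnode's dirty buffers after each delayed write, and start a flush once the count passes the low threshold) can be tried outside the kernel with a toy model. Everything below is a stand-in: struct dbuf, struct vnode_model, check_dirty() and flush_all() are hypothetical names, a plain linked list replaces the kernel's TAILQ, and a function pointer replaces VOP_FSYNC.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for struct buf: just a link in a dirty list. */
struct dbuf {
	struct dbuf *next;
};

/* Stand-in for struct vnode: a dirty list plus a flush hook. */
struct vnode_model {
	struct dbuf *dirty;
	int (*fsync)(struct vnode_model *);
};

/*
 * Count the dirty buffers held by vp and flush them if there are
 * more than lodirty.  Returns the flush hook's result, or 0 if no
 * flush was needed.
 */
int
check_dirty(struct vnode_model *vp, int lodirty)
{
	struct dbuf *bp;
	int dirtycnt = 0;

	assert(vp != NULL && vp->fsync != NULL);
	for (bp = vp->dirty; bp != NULL; bp = bp->next)
		dirtycnt++;
	if (dirtycnt > lodirty)
		return (vp->fsync(vp));
	return (0);
}

/* Example flush hook: pretend every buffer was written out. */
int
flush_all(struct vnode_model *vp)
{
	vp->dirty = NULL;
	return (0);
}
```

The model walks the whole list on every call, as the patch does; Kirk notes in his message that the fix is not ideal and is only meant to confirm the diagnosis.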
Re: backgroud fsck is still locking up system (fwd)
Julian Elischer wrote:
> most systems follow / with their swap region.. you can boot from fixit, or picoBSD floppy and use disklabel -e to extend the root partition, then you can use growfs to add the new space to your root fs.

Hmm.. I tried that and it didn't seem to work. The disklabel change was successful, but growfs didn't seem to expand the root partition any.. df(1) still shows it as 50M.

I ran growfs after booting single user mode but before mounting any disks.. perhaps that caused it to not work.

Since that didn't work, I booted a 4.7-REL fixit floppy and tried to run growfs from there, but then that growfs core dumped:

  Program terminated with signal 11, Segmentation fault.
  #0  0x804c089 in updclst (block=-874) at growfs.c:2335
  2335            setbit(cg_clustersfree(acg), block);
  (gdb) list
  2330                    return;
  2331            }
  2332            /*
  2333             * update cluster allocation map
  2334             */
  2335            setbit(cg_clustersfree(acg), block);
  2336
  (gdb) where
  #0  0x804c089 in updclst (block=-874) at growfs.c:2335
  #1  0x8049584 in updjcg (cylno=2, utime=1039185218, fsi=4, fso=3, Nflag=0) at growfs.c:862
  #2  0x8048280 in growfs (fsi=4, fso=3, Nflag=0) at growfs.c:219
  #3  0x804beb2 in main (argc=2, argv=0xbfbff7a4) at growfs.c:2213
  #4  0x8048135 in _start ()

Notice block=-874 which indicates something is weird or corrupted.

So now I've got extra space in the partition which (apparently) is not being used and I can't seem to get at it (see below). Plus I have a sneaking suspicion that I've screwed up something, but there's nothing in the growfs man page that indicates what I did was wrong.

FYI, this is a test machine so it's OK if it gets hosed.

-Archie

Archie Cobbs * Packet Design * http://www.packetdesign.com

$ disklabel ad0s1
# /dev/ad0s1c:
type: ESDI
disk: ad0s1
label:
flags:
bytes/sector: 512
sectors/track: 63
tracks/cylinder: 255
sectors/cylinder: 16065
cylinders: 1860
sectors/unit: 29896902
rpm: 3600
interleave: 1
trackskew: 0
cylinderskew: 0
headswitch: 0           # milliseconds
track-to-track seek: 0  # milliseconds
drivedata: 0

8 partitions:
#        size   offset    fstype   [fsize bsize bps/cpg]
  a:   204800        0    4.2BSD     1024  8192 32768   # (Cyl.    0 - 12*)
  b:   164608   204800      swap                        # (Cyl.   12*- 22*)
  c: 29896902        0    unused        0     0         # (Cyl.    0 - 1860*)
  e:    40960   369408    4.2BSD     1024  8192    16   # (Cyl.   22*- 25*)
  f: 29486534   410368    4.2BSD     1024  8192    16   # (Cyl.   25*- 1860*)

$ df
Filesystem   1K-blocks     Used    Avail Capacity  Mounted on
/dev/ad0s1a      49583    36751     8866    81%    /
devfs                1        1        0   100%    /dev
/dev/ad0s1f   14289643  2794938 10351534    21%    /usr
/dev/ad0s1e      19815     3555    14675    20%    /var
procfs               4        4        0   100%    /proc
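The mismatch Archie describes can be read straight off the two listings: the disklabel gives partition a 204800 sectors of 512 bytes, while df still reports about 50 MB for /. A short sketch of that arithmetic follows; the helper names are made up, and df's 1K-blocks figure excludes file system overhead, so the result is only a rough estimate of the space growfs failed to claim.

```c
#include <assert.h>

/* Convert a partition size in sectors to kilobytes. */
long long
sectors_to_kb(long long sectors, long long bytes_per_sector)
{
	assert(sectors >= 0 && bytes_per_sector > 0);
	return (sectors * bytes_per_sector / 1024);
}

/* Kilobytes of the partition not covered by the file system. */
long long
unused_kb(long long part_sectors, long long bytes_per_sector,
    long long fs_kb)
{
	return (sectors_to_kb(part_sectors, bytes_per_sector) - fs_kb);
}
```

For the values above, sectors_to_kb(204800, 512) is 102400 KB (100 MB), and unused_kb(204800, 512, 49583) is 52817 KB, so roughly half of the enlarged partition is still outside the file system.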
Re: backgroud fsck is still locking up system (fwd)
I put a copy of / in /usr, then from the fixit I mounted /usr as / and ran growfs from there.. the trick is to not do it while / is mounted.

On Fri, 6 Dec 2002, Archie Cobbs wrote:
> Julian Elischer wrote:
> > most systems follow / with their swap region.. you can boot from fixit, or picoBSD floppy and use disklabel -e to extend the root partition, then you can use growfs to add the new space to your root fs.
>
> Hmm.. I tried that and it didn't seem to work. The disklabel change was successful, but growfs didn't seem to expand the root partition any.. df(1) still shows it as 50M.
>
> I ran growfs after booting single user mode but before mounting any disks.. perhaps that caused it to not work.
>
> Since that didn't work, I booted a 4.7-REL fixit floppy and tried to run growfs from there, but then that growfs core dumped:
Re: backgroud fsck is still locking up system (fwd)
Julian Elischer wrote:
> I put a copy of / in /usr then from the fixit, I mounted /usr as / and ran growfs from there.. the trick is to not do it while / is mounted.

/ wasn't mounted yet when I ran growfs:

> > I ran growfs after booting single user mode but before mounting any disks.. perhaps that caused it to not work.

But it was the root partition and I was running in single user mode. If that's a problem then the growfs man page should say so, or maybe it should be more clear about what is meant by "mounted".

In any case, running it from the fixit floppy didn't work either (got a core dump), but that may be because it was already screwed up.

So at minimum, there's a documentation bug (IMHO).

-Archie

Archie Cobbs * Packet Design * http://www.packetdesign.com
Re: backgroud fsck is still locking up system (fwd)
Kirk McKusick wrote:
> OK, it looks like my hypothesis on having a small number of buffers and running out of them is the problem. I enclose below a patch which should check for the problem arising and help to mitigate it. I would appreciate you dropping it into your kernel and seeing if it solves your problem. The fix is not ideal, but merely to see if it solves this problem. If it does, I will figure out how to do it properly. Thanks for your help.

Yep, that fixes it. Now I just get the usual sluggishness while the background fsck runs (which is not too bad), but it eventually finishes and then all is well.

Thanks,
-Archie

Archie Cobbs * Packet Design * http://www.packetdesign.com
Re: backgroud fsck is still locking up system (fwd)
On Fri, 6 Dec 2002, Archie Cobbs wrote:
> David Rhodus wrote:
> > > Softupdates is enabled on /usr and /var but not /.
> >
> > Why does softupdates not get enabled on / , by default on the install?
>
> I disabled softupdates on / back when having it enabled caused disk full problems during 'make installworld,' and never re-enabled it.
>
> FYI at this point my 50MB / partition is woefully inadequate. I can't even 'make install kernel' without first removing all existing modules, and even so / ends up 106% full.

Not very surprising. With just a couple of kernels around, my current usage on / is way over 50 MB. And I keep my /tmp files on an md(4) fs.

  gothmog# du -kx / | grep -v '/.*/' | grep '[0-9][0-9]\+'
  2700    /stand
  1628    /etc
  6814    /bin
  28004   /boot
  2946    /root
  21118   /sbin
  63244   /

The largest amount of space is under /boot where exactly 2 kernels are kept now (kernel and kernel.old, just in case an installkernel goes very wrong) but /sbin isn't very small either.

Giorgos
Re: backgroud fsck is still locking up system (fwd)
I suggest that we drag Thomas-Henning von Kamptz into this discussion as he was one of the main authors of growfs. He is copied on my reply.

Kirk McKusick

=-=-=-=-=-=

From: Archie Cobbs [EMAIL PROTECTED]
Subject: Re: backgroud fsck is still locking up system (fwd)
In-Reply-To: [EMAIL PROTECTED]
To: Julian Elischer [EMAIL PROTECTED]
Date: Fri, 6 Dec 2002 14:52:24 -0800 (PST)
CC: [EMAIL PROTECTED], [EMAIL PROTECTED]
X-ASK-Info: Whitelist match

Julian Elischer wrote:
> most systems follow / with their swap region.. you can boot from fixit, or picoBSD floppy and use disklabel -e to extend the root partition, then you can use growfs to add the new space to your root fs.

Hmm.. I tried that and it didn't seem to work. The disklabel change was successful, but growfs didn't seem to expand the root partition any.. df(1) still shows it as 50M.

I ran growfs after booting single user mode but before mounting any disks.. perhaps that caused it to not work.

Since that didn't work, I booted a 4.7-REL fixit floppy and tried to run growfs from there, but then that growfs core dumped:

  Program terminated with signal 11, Segmentation fault.
  #0  0x804c089 in updclst (block=-874) at growfs.c:2335
  2335            setbit(cg_clustersfree(acg), block);
  (gdb) list
  2330                    return;
  2331            }
  2332            /*
  2333             * update cluster allocation map
  2334             */
  2335            setbit(cg_clustersfree(acg), block);
  2336
  (gdb) where
  #0  0x804c089 in updclst (block=-874) at growfs.c:2335
  #1  0x8049584 in updjcg (cylno=2, utime=1039185218, fsi=4, fso=3, Nflag=0) at growfs.c:862
  #2  0x8048280 in growfs (fsi=4, fso=3, Nflag=0) at growfs.c:219
  #3  0x804beb2 in main (argc=2, argv=0xbfbff7a4) at growfs.c:2213
  #4  0x8048135 in _start ()

Notice block=-874 which indicates something is weird or corrupted.

So now I've got extra space in the partition which (apparently) is not being used and I can't seem to get at it (see below). Plus I have a sneaking suspicion that I've screwed up something, but there's nothing in the growfs man page that indicates what I did was wrong.

FYI, this is a test machine so it's OK if it gets hosed.

-Archie

Archie Cobbs * Packet Design * http://www.packetdesign.com

$ disklabel ad0s1
# /dev/ad0s1c:
type: ESDI
disk: ad0s1
label:
flags:
bytes/sector: 512
sectors/track: 63
tracks/cylinder: 255
sectors/cylinder: 16065
cylinders: 1860
sectors/unit: 29896902
rpm: 3600
interleave: 1
trackskew: 0
cylinderskew: 0
headswitch: 0           # milliseconds
track-to-track seek: 0  # milliseconds
drivedata: 0

8 partitions:
#        size   offset    fstype   [fsize bsize bps/cpg]
  a:   204800        0    4.2BSD     1024  8192 32768   # (Cyl.    0 - 12*)
  b:   164608   204800      swap                        # (Cyl.   12*- 22*)
  c: 29896902        0    unused        0     0         # (Cyl.    0 - 1860*)
  e:    40960   369408    4.2BSD     1024  8192    16   # (Cyl.   22*- 25*)
  f: 29486534   410368    4.2BSD     1024  8192    16   # (Cyl.   25*- 1860*)

$ df
Filesystem   1K-blocks     Used    Avail Capacity  Mounted on
/dev/ad0s1a      49583    36751     8866    81%    /
devfs                1        1        0   100%    /dev
/dev/ad0s1f   14289643  2794938 10351534    21%    /usr
/dev/ad0s1e      19815     3555    14675    20%    /var
procfs               4        4        0   100%    /proc
Re: backgroud fsck is still locking up system (fwd)
From: Archie Cobbs [EMAIL PROTECTED]
Subject: Re: backgroud fsck is still locking up system (fwd)
In-Reply-To: [EMAIL PROTECTED]
To: Kirk McKusick [EMAIL PROTECTED]
Date: Fri, 6 Dec 2002 15:23:36 -0800 (PST)
CC: Archie Cobbs [EMAIL PROTECTED], Nate Lawson [EMAIL PROTECTED], [EMAIL PROTECTED]
X-ASK-Info: Whitelist match

Kirk McKusick wrote:
> OK, it looks like my hypothesis on having a small number of buffers and running out of them is the problem. I enclose below a patch which should check for the problem arising and help to mitigate it. I would appreciate you dropping it into your kernel and seeing if it solves your problem. The fix is not ideal, but merely to see if it solves this problem. If it does, I will figure out how to do it properly. Thanks for your help.

Yep, that fixes it. Now I just get the usual sluggishness while the background fsck runs (which is not too bad), but it eventually finishes and then all is well.

Thanks,
-Archie

Archie Cobbs * Packet Design * http://www.packetdesign.com

Thanks for verifying that the idea works. I will attempt to figure out how to do it correctly and submit a proposed fix.

Kirk McKusick
Re: backgroud fsck is still locking up system (fwd)
Adding a two minute delay before starting background fsck sounds like a very good idea to me. Please send me your suggested change.

Kirk McKusick

=-=-=-=-=

Date: Fri, 6 Dec 2002 10:44:45 -0800
From: Brooks Davis [EMAIL PROTECTED]
To: Nate Lawson [EMAIL PROTECTED]
Cc: Kirk McKusick [EMAIL PROTECTED], Archie Cobbs [EMAIL PROTECTED], [EMAIL PROTECTED]
Subject: Re: backgroud fsck is still locking up system (fwd)
X-ASK-Info: Confirmed by User

On Fri, Dec 06, 2002 at 10:27:10AM -0800, Nate Lawson wrote:
> On Thu, 5 Dec 2002, Kirk McKusick wrote:
> > Does the background fsck process continue to run, or does the whole system come to a halt? If the fsck process continues to run, what happens when it eventually finishes? Is the system still dead, or does it come back to life? If the system does not come back to life can you get me the output of `ps axl'? If not, can you break into the debugger and get a ps output? (You will need to have the DDB option specified in your config file).
>
> Sorry for butting in. I think Archie is referring to bg fsck gaining an unfair share of cpu due to it running due to IO completions. Last I heard, we were waiting until after 5.0 to experiment with scheduler changes to make it more fair. I have not seen any hard locks or other problems with bg fsck after your commit.

My experience is that, at least with my laptop (which has a very slow disk), bg fsck works OK, but starting applications for the first time while fsck is running is _very_ painful. Even getty seems to have a hard time. I've found that adding a two minute delay before the fsck is sufficient to allow the system to finish starting up and for me to load X and my main applications, which lets me work while bg fsck is running.

I posted a patch to add an optional delay in the rc scripts a while ago, but Kirk was going to re-enable the priority stuff soon so I didn't pursue it. If there's interest, I'll regenerate it and repost it.

-- Brooks

Any statement of the form X is the one, true Y is FALSE.
PGP fingerprint 655D 519C 26A7 82E7 2529 9BF0 5D8E 8BE9 F238 1AD4
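The delay Brooks describes can be sketched in a few lines of sh. This is a minimal sketch, not the committed rc code: the `background_fsck_delay` knob and the `fsck -B -p` invocation follow the patch posted in this thread, while the function names `bgfsck_message` and `start_bgfsck` and the trailing `&` are assumptions made here for illustration.

```sh
#!/bin/sh
# Build the console message, mentioning the delay only when it is non-zero.
# $1 plays the role of background_fsck_delay (seconds).
bgfsck_message() {
    delay=${1:-0}
    msg='Starting background file system checks'
    if [ "$delay" -gt 0 ]; then
        msg="$msg in $delay seconds"
    fi
    echo "$msg."
}

# Sleep first, then run the niced background check and log its output.
# Defined but not called here: it needs root and a file system that wants
# checking. Running the pipeline in the background (&) is an assumption,
# made so that the boot sequence is not held up by the sleep.
start_bgfsck() {
    delay=${1:-0}
    (sleep "$delay"; nice -4 fsck -B -p) 2>&1 | logger -p daemon.notice &
}

# Two minutes, the value Brooks found sufficient on his laptop.
bgfsck_message 120
```

With a delay of 0 the message and behaviour fall back to an immediate start, so the knob can stay off by default.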
Re: backgroud fsck is still locking up system (fwd)
Kirk McKusick wrote:
> I suggest that we drag Thomas-Henning von Kamptz into this discussion as he was one of the main authors of growfs. He is copied on my reply.

Thanks. FYI, I finally fixed things by doing what Julian suggested, which is to copy / to /usr, reboot with /usr mounted as /, newfs /, and then copy everything back.

So in summary my recommendation is to add a big warning to the growfs(8) man page that it should not be run on the root partition, even if you have booted into single-user mode and haven't mounted / yet. I.e., to grow a root partition, you must boot from a different partition.

-Archie

__
Archie Cobbs * Packet Design * http://www.packetdesign.com
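Until such a warning exists, a wrapper script could refuse to proceed when the target device is the one mounted on /. The sketch below is only an illustration under assumptions: `is_root_device` is a hypothetical helper, the sample text imitates the "device on mountpoint (options)" format printed by mount(8), and the check only catches the case where the device is reported as mounted on /.

```sh
#!/bin/sh
# Hypothetical helper: succeed if device $1 appears as mounted on / in the
# mount(8)-style text passed as $2 ("<device> on <mountpoint> (<options>)").
is_root_device() {
    dev=$1
    mount_output=$2
    printf '%s\n' "$mount_output" |
        awk -v d="$dev" '$1 == d && $2 == "on" && $3 == "/" { found = 1 }
                         END { exit !found }'
}

# Sample text in the mount(8) format, using the device names from this
# thread; on a live system this would come from running mount itself.
sample='/dev/ad0s1a on / (ufs, local)
devfs on /dev (devfs, local)
/dev/ad0s1f on /usr (ufs, local, soft-updates)'

if is_root_device /dev/ad0s1a "$sample"; then
    echo "refusing to grow /dev/ad0s1a: it is the root file system"
fi
```

A check like this does not make growing the root partition safe; it only stops the obvious case before any damage is done.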
Re: backgroud fsck is still locking up system (fwd)
Date: Thu, 5 Dec 2002 15:22:27 -0800 (PST)
From: Archie Cobbs [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Subject: backgroud fsck is still locking up system

Just rebuilt -current this morning. Background fsck is still causing a soft lockup. I thought the conclusion was we were going to disable it for 5.0.

Not trying to rush anyone, just pointing out that this still needs to be done.

-Archie

__
Archie Cobbs * Packet Design * http://www.packetdesign.com

What do you mean by background fsck causing a soft lockup? Is it failing? Is it deadlocking the system? Do you have a specific test case that shows the problem? Needless to say it is working fine on my system and on my regression tests. The only problem that I am having with 5.0 as of last night is getting login to work on my console.

Kirk McKusick
Re: backgroud fsck is still locking up system (fwd)
Kirk McKusick wrote:
> > Just rebuilt -current this morning. Background fsck is still causing a soft lockup. I thought the conclusion was we were going to disable it for 5.0.
>
> What do you mean by background fsck causing a soft lockup? Is it failing? Is it deadlocking the system? Do you have a specific test case that shows the problem? Needless to say it is working fine on my system and on my regression tests. The only problem that I am having with 5.0 as of last night is getting login to work on my console.

What happens is that at first I can log in, but the system seems slow. I then got as far as running 'top' but it never refreshed its display and subsequently all keystrokes were ignored. Changing virtual terminals still works OK, but they are effectively dead too. I'm imagining processes getting stuck on some lock one by one.

Top did get as far as showing the background fsck process, which had a priority of -6 or something.

The previous time it didn't even spit out a login prompt, but that may just be due to experimental noise.

For me, it appears easy to reproduce...

1. Boot -current system
2. Pull the power cable out
3. Put the power cable back in
4. Let the box boot; it notes background fsck
5. Log in and try to do something

I can give you more details about my system separately if you like.

Thanks,
-Archie

__
Archie Cobbs * Packet Design * http://www.packetdesign.com
Re: backgroud fsck is still locking up system (fwd)
Does the background fsck process continue to run, or does the whole system come to a halt? If the fsck process continues to run, what happens when it eventually finishes? Is the system still dead, or does it come back to life? If the system does not come back to life can you get me the output of `ps axl'? If not, can you break into the debugger and get a ps output? (You will need to have the DDB option specified in your config file).

Kirk McKusick

=-=-=-=-=-=

From: Archie Cobbs [EMAIL PROTECTED]
Subject: Re: backgroud fsck is still locking up system (fwd)
In-Reply-To: [EMAIL PROTECTED]
To: Kirk McKusick [EMAIL PROTECTED]
Date: Thu, 5 Dec 2002 16:22:20 -0800 (PST)
CC: Archie Cobbs [EMAIL PROTECTED], Robert Watson [EMAIL PROTECTED], [EMAIL PROTECTED]
X-ASK-Info: Confirmed by User

Kirk McKusick wrote:
> > Just rebuilt -current this morning. Background fsck is still causing a soft lockup. I thought the conclusion was we were going to disable it for 5.0.
>
> What do you mean by background fsck causing a soft lockup? Is it failing? Is it deadlocking the system? Do you have a specific test case that shows the problem? Needless to say it is working fine on my system and on my regression tests. The only problem that I am having with 5.0 as of last night is getting login to work on my console.

What happens is that at first I can log in, but the system seems slow. I then got as far as running 'top' but it never refreshed its display and subsequently all keystrokes were ignored. Changing virtual terminals still works OK, but they are effectively dead too. I'm imagining processes getting stuck on some lock one by one.

Top did get as far as showing the background fsck process, which had a priority of -6 or something.

The previous time it didn't even spit out a login prompt, but that may just be due to experimental noise.

For me, it appears easy to reproduce...

1. Boot -current system
2. Pull the power cable out
3. Put the power cable back in
4. Let the box boot; it notes background fsck
5. Log in and try to do something

I can give you more details about my system separately if you like.

Thanks,
-Archie

__
Archie Cobbs * Packet Design * http://www.packetdesign.com
Re: backgroud fsck is still locking up system (fwd)
Kirk McKusick wrote:
> Does the background fsck process continue to run, or does the whole system come to a halt? If the fsck process continues to run, what happens when it eventually finishes? Is the system still dead, or does it come back to life? If the system does not come back to life can you get me the output of `ps axl'? If not, can you break into the debugger and get a ps output? (You will need to have the DDB option specified in your config file).

I didn't notice whether it was running or not... of course the only way to tell would be to look at the HDD light. I didn't wait more than several minutes so not sure if it would ever finish.

I'll try the other stuff tomorrow as I'm away from the office now.

-Archie

__
Archie Cobbs * Packet Design * http://www.packetdesign.com