Re: svn commit: r328013 - head/sbin/fsck_ffs

2018-03-10 Thread Mark Johnston
On Sat, Mar 10, 2018 at 05:01:40PM -0500, David Bright wrote:
> With regard to the fsck_ffs behavior being a regression because formerly the 
> FS would be mounted successfully:
> 
> That was not my experience. What I observed was that the “fsck -y” would give 
> the “please re-run” message, exit with 0 status so the boot would continue, 
> the subsequent mount would fail because the filesystem was not clean, and 
> *then* the boot would stop and drop to single user.

I think my problem is specific to SU without journaling. The UFS code
allows one to mount an unclean filesystem in that configuration since SU
guarantees that on-disk metadata is consistent. A background fsck takes
care of leaked inodes and data blocks.

(FWIW, I'm not using journaling only because makefs(8) doesn't support
the creation of SU+J filesystems.)

/dev/gpt/rootfs: FREE BLK COUNT(S) WRONG IN SUPERBLK (SALVAGED)
/dev/gpt/rootfs: SUMMARY INFORMATION BAD (SALVAGED)
/dev/gpt/rootfs: BLK(S) MISSING IN BIT MAPS (SALVAGED)
/dev/gpt/rootfs: 32664 files, 495447 used, 813272 free (176 frags, 203274 blocks, 0.0% fragmentation)

* PLEASE RERUN FSCK *
WARNING: /: reload pending error: blocks 192 files 3
Unknown error 16; help!
ERROR: ABORTING BOOT (sending SIGTERM to parent)!
Mar 10 12:47:50 init: /bin/sh on /etc/rc terminated abnormally, going to single user mode
Enter full pathname of shell or RETURN for /bin/sh: 
# mount
/dev/gpt/rootfs on / (ufs, local, read-only)
devfs on /dev (devfs, local, multilabel)
# mount -u -o rw /
WARNING: / was not properly dismounted
# echo $?
0
# mount
/dev/gpt/rootfs on / (ufs, local, soft-updates)
devfs on /dev (devfs, local, multilabel)
___
svn-src-head@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"


Re: svn commit: r328013 - head/sbin/fsck_ffs

2018-03-10 Thread David Bright
With regard to:

fsck_y_flags="-T ffs:-R -T ufs:-R"  # Additional flags for fsck -y

I don’t know how, but I completely missed the -T option for fsck when I was 
investigating this issue. That would be very useful, although I wanted my 
solution to be applicable to file systems other than ffs/ufs.
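
For reference, the two rc.conf(5) knobs discussed in this thread could be set together as in the fragment below. This is only a sketch: the flag string is the one quoted above, and whether enabling the "fsck -y" fallback suits a given machine is a policy decision, not a recommendation.

```shell
# /etc/rc.conf fragment (illustrative, not a recommendation)
fsck_y_enable="YES"                  # run "fsck -y" if the initial preen fails
fsck_y_flags="-T ffs:-R -T ufs:-R"   # extra flags for that run, as quoted above
```

The -T option hands the -R flag only to the ffs and ufs checkers, so other file-system types are not given a flag they may not understand.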


With regard to the fsck_ffs behavior being a regression because formerly the FS 
would be mounted successfully:

That was not my experience. What I observed was that the “fsck -y” would give 
the “please re-run” message, exit with 0 status so the boot would continue, the 
subsequent mount would fail because the filesystem was not clean, and *then* 
the boot would stop and drop to single user.


-- 
David Bright
d...@freebsd.org





Re: svn commit: r328013 - head/sbin/fsck_ffs

2018-03-10 Thread Cy Schubert
In message <201803101751.w2ahphph070...@pdx.rh.cn85.dnsmgr.net>, 
"Rodney W. Grimes" writes:
> > On Sat, Mar 10, 2018 at 10:26 AM, Ian Lepore  wrote:
> > 
> > > On Sat, 2018-03-10 at 09:02 -0800, Rodney W. Grimes wrote:
> > > > >
> > > > > On Sat, 2018-03-10 at 08:44 -0800, Rodney W. Grimes wrote:
> > > > > >
> > > [...]
> > > > > > > add "-T ffs:-R" to the initial fsck invocation in rc.d/fsck.
> > > > > > Please do not do that, if fsck -p fails YOU may optionally
> > > > > > wish to continue, or do retries, but please do not make this
> > > > > > > a hardcoded situation.  At most make it a controllable knob
> > > > > > > that defaults to the old behavior please.
> > > > > > >
> > > > > > > Thank you,
> > > > > > This whole situation with fsck retries is just very strange.  How
> > > > > > many other tools in the base system exhibit this behavior:
> > > > > >
> > > > > > I didn't do everything you asked, even though I am completely
> > > > > > capable of doing so.  If you'd like to actually do the thing
> > > > > > you asked for, please run this program again.
> > > > >
> > > > > If there is some reason why fsck should do less than a complete job
> > > > > under some circumstances, isn't THAT the exceptional situation that
> > > > > should need a special flag to make it happen?
> > > > The job is "make sure my data is ok, keep my data at all costs, do
> > > > not however do something that may damage my data".
> > > >
> > > > The job is NOT "do everything you can to bring the file system to
> > > > a consistent state, even if you have to screw my data all up".
> > > >
> > >
> > > I'm not sure why you think the -R flag is some sort of "ruin my data"
> > > request.  Maybe because all of this stuff is so scantily documented in
> > > the manpage?
> > >
> > > -R Instruct fsck_ffs to restart itself if it encounters certain
> > >  errors that warrant another run.
> > >
> > > Who knows what "certain errors" means?
> > >
> > 
> > After correcting some classes of errors, fsck must recompute a large
> > amount of state to make sure it is consistent. Rather than doing
> > that, it exits with a message saying to re-run fsck to make sure that there
> > aren't more errors that were hidden by the now-corrected errors from the
> > previous pass.
> > 
> > 
> > > Looking at the code, it appears -R has no effect if you're in preen
> > > mode.  Hmmm, what's "preen mode" again?  Don't bother looking in the
> > > man page, you'll just find a bunch of mentions of the word preen that
> > > say "see the -p flag" and then, surrealistically, when you look at the
> > > -p flag it says "Preen file systems (see above)".  Of course, what was
> > > above was all the places that told you to see -p.
> > >
> > 
> > The man page could use some improvement. Preen mode means 'fix all the
> > stupid inconsistencies that crop up that never result in data loss'.
> > Non-preen mode means to do that, and ask if you want to correct other
> > errors that usually don't cause data loss, but might, and some modicum of
> > human intelligence is required to tell the two apart. E.g., I usually give up
> > hitting 'y' after a dozen or so times in fsck unless I have a specific
> > reason to keep going. fsck -y has no such nuance.
>
> I do not believe that normal mode has any intelligence as to whether data
> loss will or will not occur.  It will gladly ask you if you want to
> clear an inode that is the root of a rather large tree, and you end
> up with either data loss, or a huge lost+found, sometimes even
> overflowing the size of lost+found (though that may have been fixed in ufs2).
>
> It simply runs along and if it finds an error it asks if you want
> to correct it or not.  Y is not always the correct answer, but
> most people are oblivious to what the questions imply with respect
> to the file system, and hence answer Y.  fsck does do things in
> a sequence that tries to make Y the correct answer, but as you
> say human intelligence may do better.
>
> Sometimes if you had answered N at the right question you would not
> have gotten all of the other 11 questions that led you to giving up;
> sometimes the N answer may be hundreds of Y's in, often to a clear
> inode question.
>
> When I get a preen failure my usual next step is to run a logged
> fsck -n to see what that says so I can evaluate the extent of fs
> damage, especially if this is a critical file system containing
> very valuable data.  
>
> > Warner
> > 
> > 
> > > So, I guess I'll just keep using fsck_y_enable=YES and relying on the
> > > fact that by default that now includes the -R option.
>
> And if you're running ufs2 with soft updates you're in a
> pretty safe place.  I would not recommend doing this on ufs1
> or without soft updates enabled.
>
> One must try to remember that fsck -p during /etc/rc processing can
> run into many different file systems, some more resilient to running
> things like fsck -R -y, some not.

Having been in this situation with FreeBSD, Solaris, Linux, 

Re: svn commit: r328013 - head/sbin/fsck_ffs

2018-03-10 Thread Cy Schubert
In message <1520702802.84937.126.ca...@freebsd.org>, Ian Lepore writes:
> On Sat, 2018-03-10 at 09:02 -0800, Rodney W. Grimes wrote:
> > > 
> > > On Sat, 2018-03-10 at 08:44 -0800, Rodney W. Grimes wrote:
> > > > 
> [...]
> > > > > add "-T ffs:-R" to the initial fsck invocation in rc.d/fsck.
> > > > Please do not do that, if fsck -p fails YOU may optionally
> > > > wish to continue, or do retries, but please do not make this
> > > > > a hardcoded situation.  At most make it a controllable knob
> > > > > that defaults to the old behavior please.
> > > > > 
> > > > > Thank you,
> > > > This whole situation with fsck retries is just very strange.  How
> > > > many other tools in the base system exhibit this behavior:
> > > > 
> > > >     I didn't do everything you asked, even though I am completely
> > > >     capable of doing so.  If you'd like to actually do the thing
> > > >     you asked for, please run this program again.
> > > 
> > > If there is some reason why fsck should do less than a complete job
> > > under some circumstances, isn't THAT the exceptional situation that
> > > should need a special flag to make it happen?
> > The job is "make sure my data is ok, keep my data at all costs, do
> > not however do something that may damage my data".
> > 
> > The job is NOT "do everything you can to bring the file system to
> > a consistent state, even if you have to screw my data all up".
> > 
>
> I'm not sure why you think the -R flag is some sort of "ruin my data"
> request.  Maybe because all of this stuff is so scantily documented in
> the manpage?
>
> -R Instruct fsck_ffs to restart itself if it encounters certain  
>  errors that warrant another run.
>
> Who knows what "certain errors" means?  
>
> Looking at the code, it appears -R has no effect if you're in preen
> mode.  Hmmm, what's "preen mode" again?  Don't bother looking in the
> man page, you'll just find a bunch of mentions of the word preen that
> say "see the -p flag" and then, surrealistically, when you look at the
> -p flag it says "Preen file systems (see above)".  Of course, what was
> above was all the places that told you to see -p.
>
> So, I guess I'll just keep using fsck_y_enable=YES and relying on the
> fact that by default that now includes the -R option.

That's how I've set up my firewall/gateway. For it, I'm much more
concerned with having it boot successfully than with data loss. The
reason is that if I'm remote I want to be able to ssh back in, so I'm
willing to take the risk.

Having said that, I maintain backup slices on an alternate disk in case 
of loss should the primary slice fail to boot. In that case data loss 
is tolerable to allow a better chance I can remotely ssh in. (Of course 
there's no 100% guarantee if there's data loss but it's better than 0% 
if the gateway dropped into single user state from the get-go.)

With my other gear using UFS, I want a failing fsck to fall to single
user, as I can get in using a console server to examine the damage and
decide for myself.

Long story short, it depends.


-- 
Cheers,
Cy Schubert 
FreeBSD UNIX:     Web:  http://www.FreeBSD.org

The need of the many outweighs the greed of the few.




Re: svn commit: r328013 - head/sbin/fsck_ffs

2018-03-10 Thread Rodney W. Grimes
> On Sat, Mar 10, 2018 at 10:26 AM, Ian Lepore  wrote:
> 
> > On Sat, 2018-03-10 at 09:02 -0800, Rodney W. Grimes wrote:
> > > >
> > > > On Sat, 2018-03-10 at 08:44 -0800, Rodney W. Grimes wrote:
> > > > >
> > [...]
> > > > > > add "-T ffs:-R" to the initial fsck invocation in rc.d/fsck.
> > > > > Please do not do that, if fsck -p fails YOU may optionally
> > > > > wish to continue, or do retries, but please do not make this
> > > > > > a hardcoded situation.  At most make it a controllable knob
> > > > > > that defaults to the old behavior please.
> > > > > >
> > > > > > Thank you,
> > > > > This whole situation with fsck retries is just very strange.  How
> > > > > many other tools in the base system exhibit this behavior:
> > > > >
> > > > > I didn't do everything you asked, even though I am completely
> > > > > capable of doing so.  If you'd like to actually do the thing
> > > > > you asked for, please run this program again.
> > > >
> > > > If there is some reason why fsck should do less than a complete job
> > > > under some circumstances, isn't THAT the exceptional situation that
> > > > should need a special flag to make it happen?
> > > The job is "make sure my data is ok, keep my data at all costs, do
> > > not however do something that may damage my data".
> > >
> > > The job is NOT "do everything you can to bring the file system to
> > > a consistent state, even if you have to screw my data all up".
> > >
> >
> > I'm not sure why you think the -R flag is some sort of "ruin my data"
> > request.  Maybe because all of this stuff is so scantily documented in
> > the manpage?
> >
> > -R Instruct fsck_ffs to restart itself if it encounters certain
> >  errors that warrant another run.
> >
> > Who knows what "certain errors" means?
> >
> 
> After correcting some classes of errors, fsck must recompute a large
> amount of state to make sure it is consistent. Rather than doing
> that, it exits with a message saying to re-run fsck to make sure that there
> aren't more errors that were hidden by the now-corrected errors from the
> previous pass.
> 
> 
> > Looking at the code, it appears -R has no effect if you're in preen
> > mode.  Hmmm, what's "preen mode" again?  Don't bother looking in the
> > man page, you'll just find a bunch of mentions of the word preen that
> > say "see the -p flag" and then, surrealistically, when you look at the
> > -p flag it says "Preen file systems (see above)".  Of course, what was
> > above was all the places that told you to see -p.
> >
> 
> The man page could use some improvement. Preen mode means 'fix all the
> stupid inconsistencies that crop up that never result in data loss'.
> Non-preen mode means to do that, and ask if you want to correct other
> errors that usually don't cause data loss, but might, and some modicum of
> human intelligence is required to tell the two apart. E.g., I usually give up
> hitting 'y' after a dozen or so times in fsck unless I have a specific
> reason to keep going. fsck -y has no such nuance.

I do not believe that normal mode has any intelligence as to whether data
loss will or will not occur.  It will gladly ask you if you want to
clear an inode that is the root of a rather large tree, and you end
up with either data loss, or a huge lost+found, sometimes even
overflowing the size of lost+found (though that may have been fixed in ufs2).

It simply runs along and if it finds an error it asks if you want
to correct it or not.  Y is not always the correct answer, but
most people are oblivious to what the questions imply with respect
to the file system, and hence answer Y.  fsck does do things in
a sequence that tries to make Y the correct answer, but as you
say human intelligence may do better.

Sometimes if you had answered N at the right question you would not
have gotten all of the other 11 questions that led you to giving up;
sometimes the N answer may be hundreds of Y's in, often to a clear
inode question.

When I get a preen failure my usual next step is to run a logged
fsck -n to see what that says so I can evaluate the extent of fs
damage, especially if this is a critical file system containing
very valuable data.  
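
A "logged fsck -n" run of the kind described above might be wrapped as in the sketch below. The helper name run_logged and the log path in the usage note are hypothetical; the helper only captures whatever the given command prints and returns that command's own exit status.

```shell
#!/bin/sh
# Hypothetical helper (not part of the base system): run a command,
# save everything it prints to a log file, show it, and return the
# command's own exit status.
run_logged() {
    log=$1
    shift
    # Capture stdout and stderr together so the log preserves the
    # order in which the checker reported things.
    if "$@" >"$log" 2>&1; then
        status=0
    else
        status=$?
    fi
    cat "$log"
    return "$status"
}
```

Assuming a writable location for the log (the path here is an assumption, and it should not be on the file system being checked), an invocation could look like `run_logged /var/tmp/fsck-n.log fsck -n /dev/gpt/rootfs`. Since -n answers "no" to every question, the run only reports damage and leaves the decision about what to repair to the operator.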

> Warner
> 
> 
> > So, I guess I'll just keep using fsck_y_enable=YES and relying on the
> > fact that by default that now includes the -R option.

And if you're running ufs2 with soft updates you're in a
pretty safe place.  I would not recommend doing this on ufs1
or without soft updates enabled.

One must try to remember that fsck -p during /etc/rc processing can
run into many different file systems, some more resilient to running
things like fsck -R -y, some not.

-- 
Rod Grimes rgri...@freebsd.org


Re: svn commit: r328013 - head/sbin/fsck_ffs

2018-03-10 Thread Warner Losh
On Sat, Mar 10, 2018 at 10:26 AM, Ian Lepore  wrote:

> On Sat, 2018-03-10 at 09:02 -0800, Rodney W. Grimes wrote:
> > >
> > > On Sat, 2018-03-10 at 08:44 -0800, Rodney W. Grimes wrote:
> > > >
> [...]
> > > > > add "-T ffs:-R" to the initial fsck invocation in rc.d/fsck.
> > > > Please do not do that, if fsck -p fails YOU may optionally
> > > > wish to continue, or do retries, but please do not make this
> > > > a hardcoded situation.  At most make it a controllable knob
> > > > that defaults to the old behavior please.
> > > >
> > > > Thank you,
> > > This whole situation with fsck retries is just very strange.  How
> > > many other tools in the base system exhibit this behavior:
> > >
> > > I didn't do everything you asked, even though I am completely
> > > capable of doing so.  If you'd like to actually do the thing
> > > you asked for, please run this program again.
> > >
> > > If there is some reason why fsck should do less than a complete job
> > > under some circumstances, isn't THAT the exceptional situation that
> > > should need a special flag to make it happen?
> > The job is "make sure my data is ok, keep my data at all costs, do
> > not however do something that may damage my data".
> >
> > The job is NOT "do everything you can to bring the file system to
> > a consistent state, even if you have to screw my data all up".
> >
>
> I'm not sure why you think the -R flag is some sort of "ruin my data"
> request.  Maybe because all of this stuff is so scantily documented in
> the manpage?
>
> -R Instruct fsck_ffs to restart itself if it encounters certain
>  errors that warrant another run.
>
> Who knows what "certain errors" means?
>

After correcting some classes of errors, fsck must recompute a large
amount of state to make sure it is consistent. Rather than doing
that, it exits with a message saying to re-run fsck to make sure that there
aren't more errors that were hidden by the now-corrected errors from the
previous pass.


> Looking at the code, it appears -R has no effect if you're in preen
> mode.  Hmmm, what's "preen mode" again?  Don't bother looking in the
> man page, you'll just find a bunch of mentions of the word preen that
> say "see the -p flag" and then, surrealistically, when you look at the
> -p flag it says "Preen file systems (see above)".  Of course, what was
> above was all the places that told you to see -p.
>

The man page could use some improvement. Preen mode means 'fix all the
stupid inconsistencies that crop up that never result in data loss'.
Non-preen mode means to do that, and ask if you want to correct other
errors that usually don't cause data loss, but might, and some modicum of
human intelligence is required to tell the two apart. E.g., I usually give up
hitting 'y' after a dozen or so times in fsck unless I have a specific
reason to keep going. fsck -y has no such nuance.

Warner


> So, I guess I'll just keep using fsck_y_enable=YES and relying on the
> fact that by default that now includes the -R option.
>
> -- Ian
>
>
>


Re: svn commit: r328013 - head/sbin/fsck_ffs

2018-03-10 Thread Ian Lepore
On Sat, 2018-03-10 at 09:02 -0800, Rodney W. Grimes wrote:
> > 
> > On Sat, 2018-03-10 at 08:44 -0800, Rodney W. Grimes wrote:
> > > 
[...]
> > > > add "-T ffs:-R" to the initial fsck invocation in rc.d/fsck.
> > > Please do not do that, if fsck -p fails YOU may optionally
> > > wish to continue, or do retries, but please do not make this
> > > > a hardcoded situation.  At most make it a controllable knob
> > > > that defaults to the old behavior please.
> > > > 
> > > > Thank you,
> > > This whole situation with fsck retries is just very strange.  How
> > > many other tools in the base system exhibit this behavior:
> > > 
> > >     I didn't do everything you asked, even though I am completely
> > >     capable of doing so.  If you'd like to actually do the thing
> > >     you asked for, please run this program again.
> > 
> > If there is some reason why fsck should do less than a complete job
> > under some circumstances, isn't THAT the exceptional situation that
> > should need a special flag to make it happen?
> The job is "make sure my data is ok, keep my data at all costs, do
> not however do something that may damage my data".
> 
> The job is NOT "do everything you can to bring the file system to
> a consistent state, even if you have to screw my data all up".
> 

I'm not sure why you think the -R flag is some sort of "ruin my data"
request.  Maybe because all of this stuff is so scantily documented in
the manpage?

-R Instruct fsck_ffs to restart itself if it encounters certain  
 errors that warrant another run.

Who knows what "certain errors" means?  

Looking at the code, it appears -R has no effect if you're in preen
mode.  Hmmm, what's "preen mode" again?  Don't bother looking in the
man page, you'll just find a bunch of mentions of the word preen that
say "see the -p flag" and then, surrealistically, when you look at the
-p flag it says "Preen file systems (see above)".  Of course, what was
above was all the places that told you to see -p.

So, I guess I'll just keep using fsck_y_enable=YES and relying on the
fact that by default that now includes the -R option.

-- Ian



Re: svn commit: r328013 - head/sbin/fsck_ffs

2018-03-10 Thread Rodney W. Grimes
> On Sat, 2018-03-10 at 08:44 -0800, Rodney W. Grimes wrote:
> > [ Charset UTF-8 unsupported, converting... ]
> > > 
> > > On Fri, Mar 09, 2018 at 09:36:25PM -0500, David Bright wrote:
> > > > 
> > > > On Mar 9, 2018, at 17:31, Ian Lepore  wrote:
> > > > > 
> > > > > 
> > > > > On Fri, 2018-03-09 at 17:09 -0500, Mark Johnston wrote:
> > > > > > 
> > > > > > 
> > > > > > > etc/rc.d/fsck doesn't know how to interpret the new exit code and now
> > > > > > > just drops to a single-user shell when it is encountered. […]
> > > > > > > 
> > > > > > > Is there any reason etc/rc.d/fsck shouldn't automatically retry (up to
> > > > > This is, in fact, the reason that I made the change I did. I was trying 
> > > > > to put in a retry loop to rc.d/fsck, but found that I couldn’t get it 
> > > > > to work because fsck and fsck_ffs were not exiting with non-zero 
> > > > > status. The drop to single user is not really due to the specific (new) 
> > > > > error code of 16, it is due to the fact that fsck_ffs is now exiting 
> > > > > with a non-zero status when it hasn’t completely cleaned the file 
> > > > > system;
> > > Sure, but that's a regression IMO: before, I believe we'd successfully
> > > mount the FS even without retrying fsck, and continue booting.
> > > 
> > > > 
> > > > /any/ non-zero status would cause the current rc.d/fsck script to go to 
> > > > single user. Prior to my change, fsck_ffs was exiting with a zero 
> > > > status even though it had not completely cleaned the filesystem and 
> > > > told the user to run it again.
> > > > 
> > > > > 
> > > > > 
> > > > > > fsck_ffs already has a -R flag to automatically retry, wouldn't that be
> > > > > > a better mechanism for handling this new type of retry?
> > > > > That’s true; however, there is currently no way to pass that flag 
> > > > > through the filesystem-agnostic fsck wrapper called from rc.d/fsck to 
> > > > > the filesystem-specific fsck_ffs program that it calls. One could 
> > > > > implement a similar flag on the fsck wrapper to be passed along to the 
> > > > > filesystem-specific checker, but I think fsck_ffs is the only one that 
> > > > > currently implements such a flag.
> > > As was pointed out by others, this isn't true. In my experience it's
> > > fsck -p that is exiting with status 16. It thus seems like it would be
> > > desirable to add "-T ffs:-R" to the initial fsck invocation in
> > > rc.d/fsck.
> > Please do not do that, if fsck -p fails YOU may optionally
> > wish to continue, or do retries, but please do not make this
> > > a hardcoded situation.  At most make it a controllable knob
> > > that defaults to the old behavior please.
> > > 
> > > Thank you,
> > 
> > This whole situation with fsck retries is just very strange.  How many
> > other tools in the base system exhibit this behavior:
> > 
> > I didn't do everything you asked, even though I am completely
> > capable of doing so.  If you'd like to actually do the thing you
> > asked for, please run this program again.
> 
> If there is some reason why fsck should do less than a complete job
> under some circumstances, isn't THAT the exceptional situation that
> should need a special flag to make it happen?

The job is "make sure my data is ok, keep my data at all costs, do
not however do something that may damage my data".

The job is NOT "do everything you can to bring the file system to
a consistent state, even if you have to screw my data all up".

-- 
Rod Grimes rgri...@freebsd.org


Re: svn commit: r328013 - head/sbin/fsck_ffs

2018-03-10 Thread Ian Lepore
On Sat, 2018-03-10 at 08:44 -0800, Rodney W. Grimes wrote:
> [ Charset UTF-8 unsupported, converting... ]
> > 
> > On Fri, Mar 09, 2018 at 09:36:25PM -0500, David Bright wrote:
> > > 
> > > On Mar 9, 2018, at 17:31, Ian Lepore  wrote:
> > > > 
> > > > 
> > > > On Fri, 2018-03-09 at 17:09 -0500, Mark Johnston wrote:
> > > > > 
> > > > > 
> > > > > etc/rc.d/fsck doesn't know how to interpret the new exit code and now
> > > > > > just drops to a single-user shell when it is encountered. […]
> > > > > > 
> > > > > > Is there any reason etc/rc.d/fsck shouldn't automatically retry (up to
> > > > This is, in fact, the reason that I made the change I did. I was trying 
> > > > to put in a retry loop to rc.d/fsck, but found that I couldn’t get it to 
> > > > work because fsck and fsck_ffs were not exiting with non-zero status. The 
> > > > drop to single user is not really due to the specific (new) error code of 
> > > > 16, it is due to the fact that fsck_ffs is now exiting with a non-zero 
> > > > status when it hasn’t completely cleaned the file system;
> > Sure, but that's a regression IMO: before, I believe we'd successfully
> > mount the FS even without retrying fsck, and continue booting.
> > 
> > > 
> > > /any/ non-zero status would cause the current rc.d/fsck script to go to 
> > > single user. Prior to my change, fsck_ffs was exiting with a zero status 
> > > even though it had not completely cleaned the filesystem and told the 
> > > user to run it again.
> > > 
> > > > 
> > > > 
> > > > fsck_ffs already has a -R flag to automatically retry, wouldn't that be
> > > > a better mechanism for handling this new type of retry?
> > > > That’s true; however, there is currently no way to pass that flag through 
> > > the filesystem-agnostic fsck wrapper called from rc.d/fsck to the 
> > > filesystem-specific fsck_ffs program that it calls. One could implement a 
> > > similar flag on the fsck wrapper to be passed along to the 
> > > filesystem-specific checker, but I think fsck_ffs is the only one that 
> > > currently implements such a flag. 
> > As was pointed out by others, this isn't true. In my experience it's
> > fsck -p that is exiting with status 16. It thus seems like it would be
> > desirable to add "-T ffs:-R" to the initial fsck invocation in
> > rc.d/fsck.
> Please do not do that, if fsck -p fails YOU may optionally
> wish to continue, or do retries, but please do not make this
> a hardcoded situation.  At most make it a controllable knob
> that defaults to the old behavior please.
> 
> Thank you,

This whole situation with fsck retries is just very strange.  How many
other tools in the base system exhibit this behavior: 

I didn't do everything you asked, even though I am completely
capable of doing so.  If you'd like to actually do the thing you
asked for, please run this program again.

If there is some reason why fsck should do less than a complete job
under some circumstances, isn't THAT the exceptional situation that
should need a special flag to make it happen?

-- Ian


Re: svn commit: r328013 - head/sbin/fsck_ffs

2018-03-10 Thread Rodney W. Grimes
[ Charset UTF-8 unsupported, converting... ]
> On Fri, Mar 09, 2018 at 09:36:25PM -0500, David Bright wrote:
> > On Mar 9, 2018, at 17:31, Ian Lepore  wrote:
> > > 
> > > On Fri, 2018-03-09 at 17:09 -0500, Mark Johnston wrote:
> > >> 
> > >> etc/rc.d/fsck doesn't know how to interpret the new exit code and now
> > >> just drops to a single-user shell when it is encountered. […]
> > >> 
> > >> Is there any reason etc/rc.d/fsck shouldn't automatically retry (up to
> > 
> > This is, in fact, the reason that I made the change I did. I was trying to 
> > put in a retry loop to rc.d/fsck, but found that I couldn’t get it to work 
> > because fsck and fsck_ffs were not exiting with non-zero status. The drop 
> > to single user is not really due to the specific (new) error code of 16, it 
> > is due to the fact that fsck_ffs is now exiting with a non-zero status when 
> > it hasn’t completely cleaned the file system;
> 
> Sure, but that's a regression IMO: before, I believe we'd successfully
> mount the FS even without retrying fsck, and continue booting.
> 
> > /any/ non-zero status would cause the current rc.d/fsck script to go to 
> > single user. Prior to my change, fsck_ffs was exiting with a zero status 
> > even though it had not completely cleaned the filesystem and told the user 
> > to run it again.
> > 
> > > 
> > > fsck_ffs already has a -R flag to automatically retry, wouldn't that be
> > > a better mechanism for handling this new type of retry?
> > 
> > That’s true; however, there is currently no way to pass that flag through 
> > the filesystem-agnostic fsck wrapper called from rc.d/fsck to the 
> > filesystem-specific fsck_ffs program that it calls. One could implement a 
> > similar flag on the fsck wrapper to be passed along to the 
> > filesystem-specific checker, but I think fsck_ffs is the only one that 
> > currently implements such a flag. 
> 
> As was pointed out by others, this isn't true. In my experience it's
> fsck -p that is exiting with status 16. It thus seems like it would be
> desirable to add "-T ffs:-R" to the initial fsck invocation in
> rc.d/fsck.

Please do not do that. If fsck -p fails, YOU may optionally
wish to continue, or do retries, but please do not make this
a hardcoded situation.  At most, make it a controllable knob
that defaults to the old behavior, please.

Thank you,
-- 
Rod Grimes rgri...@freebsd.org


Re: svn commit: r328013 - head/sbin/fsck_ffs

2018-03-10 Thread Mark Johnston
On Fri, Mar 09, 2018 at 09:36:25PM -0500, David Bright wrote:
> On Mar 9, 2018, at 17:31, Ian Lepore  wrote:
> > 
> > On Fri, 2018-03-09 at 17:09 -0500, Mark Johnston wrote:
> >> 
> >> etc/rc.d/fsck doesn't know how to interpret the new exit code and now
> >> just drops to a single-user shell when it is encountered. […]
> >> 
> >> Is there any reason etc/rc.d/fsck shouldn't automatically retry (up to
> 
> This is, in fact, the reason that I made the change I did. I was trying to 
> put in a retry loop to rc.d/fsck, but found that I couldn’t get it to work 
> because fsck and fsck_ffs were not exiting with non-zero status. The drop to 
> single user is not really due to the specific (new) error code of 16, it is 
> due to the fact that fsck_ffs is now exiting with a non-zero status when it 
> hasn’t completely cleaned the file system;

Sure, but that's a regression IMO: before, I believe we'd successfully
mount the FS even without retrying fsck, and continue booting.

> /any/ non-zero status would cause the current rc.d/fsck script to go to 
> single user. Prior to my change, fsck_ffs was exiting with a zero status even 
> though it had not completely cleaned the filesystem and told the user to run 
> it again.
> 
> > 
> > fsck_ffs already has a -R flag to automatically retry, wouldn't that be
> > a better mechanism for handling this new type of retry?
> 
> That’s true; however, there is currently no way to pass that flag through the 
> filesystem-agnostic fsck wrapper called from rc.d/fsck to the 
> filesystem-specific fsck_ffs program that it calls. One could implement a 
> similar flag on the fsck wrapper to be passed along to the 
> filesystem-specific checker, but I think fsck_ffs is the only one that 
> currently implements such a flag. 

As was pointed out by others, this isn't true. In my experience it's
fsck -p that is exiting with status 16. It thus seems like it would be
desirable to add "-T ffs:-R" to the initial fsck invocation in
rc.d/fsck.


Re: svn commit: r328013 - head/sbin/fsck_ffs

2018-03-10 Thread Ian Lepore
On Fri, 2018-03-09 at 21:36 -0500, David Bright wrote:
> On Mar 9, 2018, at 17:31, Ian Lepore  wrote:
> > 
> > 
> > On Fri, 2018-03-09 at 17:09 -0500, Mark Johnston wrote:
> > > 
> > > 
> > > etc/rc.d/fsck doesn't know how to interpret the new exit code and
> > > now
> > > just drops to a single-user shell when it is encountered. […]
> > > 
> > > Is there any reason etc/rc.d/fsck shouldn't automatically retry
> > > (up to
> This is, in fact, the reason that I made the change I did. I was
> trying to put in a retry loop to rc.d/fsck, but found that I couldn’t
> get it to work because fsck and fsck_ffs were not exiting with non-
> zero status. The drop to single user is not really due to the
> specific (new) error code of 16, it is due to the fact that fsck_ffs
> is now exiting with a non-zero status when it hasn’t completely
> cleaned the file system; /any/ non-zero status would cause the
> current rc.d/fsck script to go to single user. Prior to my change,
> fsck_ffs was exiting with a zero status even though it had not
> completely cleaned the filesystem and told the user to run it again.
> 
> > 
> > 
> > fsck_ffs already has a -R flag to automatically retry, wouldn't
> > that be
> > a better mechanism for handling this new type of retry?
> That’s true; however, there is currently no way to pass that flag
> through the filesystem-agnostic fsck wrapper called from rc.d/fsck to
> the filesystem-specific fsck_ffs program that it calls. One could
> implement a similar flag on the fsck wrapper to be passed along to
> the filesystem-specific checker, but I think fsck_ffs is the only one
> that currently implements such a flag. 
> 
> 

fsck -T ffs:-R

-- Ian



Re: svn commit: r328013 - head/sbin/fsck_ffs

2018-03-10 Thread Edward Tomasz Napierała
On 0309T2136, David Bright wrote:
> On Mar 9, 2018, at 17:31, Ian Lepore  wrote:
> > 
> > On Fri, 2018-03-09 at 17:09 -0500, Mark Johnston wrote:
> >> 
> >> etc/rc.d/fsck doesn't know how to interpret the new exit code and now
> >> just drops to a single-user shell when it is encountered. […]
> >> 
> >> Is there any reason etc/rc.d/fsck shouldn't automatically retry (up to
> 
> This is, in fact, the reason that I made the change I did. I was trying to 
> put in a retry loop to rc.d/fsck, but found that I couldn’t get it to work 
> because fsck and fsck_ffs were not exiting with non-zero status. The drop to 
> single user is not really due to the specific (new) error code of 16, it is 
> due to the fact that fsck_ffs is now exiting with a non-zero status when it 
> hasn’t completely cleaned the file system; /any/ non-zero status would cause 
> the current rc.d/fsck script to go to single user. Prior to my change, 
> fsck_ffs was exiting with a zero status even though it had not completely 
> cleaned the filesystem and told the user to run it again.
> 
> > 
> > fsck_ffs already has a -R flag to automatically retry, wouldn't that be
> > a better mechanism for handling this new type of retry?
> 
> That’s true; however, there is currently no way to pass that flag through the 
> filesystem-agnostic fsck wrapper called from rc.d/fsck to the 
> filesystem-specific fsck_ffs program that it calls. One could implement a 
> similar flag on the fsck wrapper to be passed along to the 
> filesystem-specific checker, but I think fsck_ffs is the only one that 
> currently implements such a flag. 

Sure there is.  See /etc/defaults/rc.conf:

fsck_y_flags="-T ffs:-R -T ufs:-R"  # Additional flags for fsck -y
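
For a machine that should use this path, a local override in /etc/rc.conf
might look like the fragment below. Both variables appear in
/etc/defaults/rc.conf; the assumption here is that the initial "fsck -p"
fails in a way that makes rc.d/fsck fall through to "fsck -y", which is the
only case where fsck_y_flags is used.

```sh
# /etc/rc.conf (sketch): opt in to "fsck -y" after a failed preen and pass
# -R through the wrapper to fsck_ffs via -T.
fsck_y_enable="YES"
fsck_y_flags="-T ffs:-R -T ufs:-R"
```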



Re: svn commit: r328013 - head/sbin/fsck_ffs

2018-03-09 Thread Eitan Adler
On 9 March 2018 at 18:36, David Bright  wrote:
> On Mar 9, 2018, at 17:31, Ian Lepore  wrote:
>>
>> On Fri, 2018-03-09 at 17:09 -0500, Mark Johnston wrote:
>>>
>>> etc/rc.d/fsck doesn't know how to interpret the new exit code and now
>>> just drops to a single-user shell when it is encountered. […]
>>>
>>> Is there any reason etc/rc.d/fsck shouldn't automatically retry (up to
>
> This is, in fact, the reason that I made the change I did. I was trying to 
> put in a retry loop to rc.d/fsck, but found that I couldn’t get it to work 
> because fsck and fsck_ffs were not exiting with non-zero status. The drop to 
> single user is not really due to the specific (new) error code of 16, it is 
> due to the fact that fsck_ffs is now exiting with a non-zero status when it 
> hasn’t completely cleaned the file system; /any/ non-zero status would cause 
> the current rc.d/fsck script to go to single user. Prior to my change, 
> fsck_ffs was exiting with a zero status even though it had not completely 
> cleaned the filesystem and told the user to run it again.
>
>>
>> fsck_ffs already has a -R flag to automatically retry, wouldn't that be
>> a better mechanism for handling this new type of retry?
>
> That’s true; however, there is currently no way to pass that flag through the 
> filesystem-agnostic fsck wrapper called from rc.d/fsck to the 
> filesystem-specific fsck_ffs program that it calls. One could implement a 
> similar flag on the fsck wrapper to be passed along to the 
> filesystem-specific checker, but I think fsck_ffs is the only one that 
> currently implements such a flag.

Why does it need to be filesystem specific? Can't the retry happen in
the wrapper itself?
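
A filesystem-agnostic retry in the caller could look roughly like the
sketch below. The helper name retry_on_rerun is hypothetical and is not part
of fsck or rc.subr; it assumes, as in this thread, that exit status 16 is
the "please rerun" status and that any other status should be handed back
to the caller unchanged.

```sh
#!/bin/sh
# Hypothetical helper: run a command, rerunning it while it exits with the
# "please rerun" status (16 in this thread), for at most max_tries attempts.
# Returns the exit status of the last attempt.
retry_on_rerun()
{
	_max_tries=$1
	shift
	_tries=0
	while :; do
		_err=0
		"$@" || _err=$?
		_tries=$((_tries + 1))
		# Stop on success, on any other failure, or when out of attempts.
		if [ "$_err" -ne 16 ] || [ "$_tries" -ge "$_max_tries" ]; then
			return "$_err"
		fi
		echo "File system check retry requested (attempt ${_tries} of ${_max_tries})."
	done
}

# Example call (not executed here): retry_on_rerun 3 fsck -p
```

Because the helper only looks at the exit status, it would work for any
checker that adopts the same status, at the cost of rerunning the whole
wrapper rather than a single file system.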


-- 
Eitan Adler


Re: svn commit: r328013 - head/sbin/fsck_ffs

2018-03-09 Thread David Bright
On Mar 9, 2018, at 17:31, Ian Lepore  wrote:
> 
> On Fri, 2018-03-09 at 17:09 -0500, Mark Johnston wrote:
>> 
>> etc/rc.d/fsck doesn't know how to interpret the new exit code and now
>> just drops to a single-user shell when it is encountered. […]
>> 
>> Is there any reason etc/rc.d/fsck shouldn't automatically retry (up to

This is, in fact, the reason that I made the change I did. I was trying to put 
in a retry loop to rc.d/fsck, but found that I couldn’t get it to work because 
fsck and fsck_ffs were not exiting with non-zero status. The drop to single 
user is not really due to the specific (new) error code of 16, it is due to the 
fact that fsck_ffs is now exiting with a non-zero status when it hasn’t 
completely cleaned the file system; /any/ non-zero status would cause the 
current rc.d/fsck script to go to single user. Prior to my change, fsck_ffs was 
exiting with a zero status even though it had not completely cleaned the 
filesystem and told the user to run it again.

> 
> fsck_ffs already has a -R flag to automatically retry, wouldn't that be
> a better mechanism for handling this new type of retry?

That’s true; however, there is currently no way to pass that flag through the 
filesystem-agnostic fsck wrapper called from rc.d/fsck to the 
filesystem-specific fsck_ffs program that it calls. One could implement a 
similar flag on the fsck wrapper to be passed along to the filesystem-specific 
checker, but I think fsck_ffs is the only one that currently implements such a 
flag. 


-- 
David Bright
d...@freebsd.org





Re: svn commit: r328013 - head/sbin/fsck_ffs

2018-03-09 Thread Ian Lepore
On Fri, 2018-03-09 at 17:09 -0500, Mark Johnston wrote:
> On Mon, Jan 15, 2018 at 07:25:11PM +, David Bright wrote:
> > 
> > Author: dab
> > Date: Mon Jan 15 19:25:11 2018
> > New Revision: 328013
> > URL: https://svnweb.freebsd.org/changeset/base/328013
> > 
> > Log:
> >   Exit fsck_ffs with non-zero status when file system is not repaired.
> >   
> > [...]
> etc/rc.d/fsck doesn't know how to interpret the new exit code and now
> just drops to a single-user shell when it is encountered. This is
> happening to me semi-regularly when my test systems crash, especially
> when I test kernel panic handling. :)
> 
> Is there any reason etc/rc.d/fsck shouldn't automatically retry (up to
> some configurable number of retries) when the new error code is seen?
> The patch below seems to do the trick for me:
> 

fsck_ffs already has a -R flag to automatically retry, wouldn't that be
a better mechanism for handling this new type of retry?

-- Ian



Re: svn commit: r328013 - head/sbin/fsck_ffs

2018-03-09 Thread Mark Johnston
On Mon, Jan 15, 2018 at 07:25:11PM +, David Bright wrote:
> Author: dab
> Date: Mon Jan 15 19:25:11 2018
> New Revision: 328013
> URL: https://svnweb.freebsd.org/changeset/base/328013
> 
> Log:
>   Exit fsck_ffs with non-zero status when file system is not repaired.
>   
>   When the fsck_ffs program cannot fully repair a file system, it will
>   output the message PLEASE RERUN FSCK. However, it does not exit with a
>   non-zero status in this case (contradicting the man page claim that it
>   "exits with 0 on success, and >0 if an error occurs.")  The fsck
>   rc-script (when running "fsck -y") tests the status from fsck (which
>   passes along the exit status from fsck_ffs) and issues a "stop_boot"
>   if the status fails. However, this is not effective since fsck_ffs can
>   return zero even on (some) errors. Effectively, it is left to a later
>   step in the boot process when the file systems are mounted to detect
>   the still-unclean file system and stop the boot.
>   
>   This change modifies fsck_ffs so that when it cannot fully repair the
>   file system and issues the PLEASE RERUN FSCK message it also exits
>   with a non-zero status.
>   
>   While here, the fsck_ffs man page has also been updated to document
>   the failing exit status codes used by fsck_ffs. Previously, only exit
>   status 7 was documented. Some of these exit statuses are tested for in
>   the fsck rc-script, so they are clearly depended upon and deserve
>   documentation.

etc/rc.d/fsck doesn't know how to interpret the new exit code and now
just drops to a single-user shell when it is encountered. This is
happening to me semi-regularly when my test systems crash, especially
when I test kernel panic handling. :)

Is there any reason etc/rc.d/fsck shouldn't automatically retry (up to
some configurable number of retries) when the new error code is seen?
The patch below seems to do the trick for me:

diff --git a/etc/defaults/rc.conf b/etc/defaults/rc.conf
index 584e842bba2c..63d2fcc0be8d 100644
--- a/etc/defaults/rc.conf
+++ b/etc/defaults/rc.conf
@@ -95,6 +95,7 @@ root_rw_mount="YES"   # Set to NO to inhibit remounting root read-write.
 root_hold_delay="30"   # Time to wait for root mount hold release.
 fsck_y_enable="NO" # Set to YES to do fsck -y if the initial preen fails.
 fsck_y_flags="-T ffs:-R -T ufs:-R" # Additional flags for fsck -y
+fsck_retries="3"   # Number of times to retry fsck before giving up.
 background_fsck="YES"  # Attempt to run fsck in the background where possible.
 background_fsck_delay="60" # Time to wait (seconds) before starting the fsck.
 growfs_enable="NO" # Set to YES to attempt to grow the root filesystem on boot
diff --git a/etc/rc.d/fsck b/etc/rc.d/fsck
index bd3122a20110..708d92228e3d 100755
--- a/etc/rc.d/fsck
+++ b/etc/rc.d/fsck
@@ -14,8 +14,82 @@ desc="Run file system checks"
 start_cmd="fsck_start"
 stop_cmd=":"
 
+_fsck_run()
+{
+   local err
+
+   if checkyesno background_fsck; then
+   fsck -F -p
+   else
+   fsck -p
+   fi
+
+   err=$?
+   if [ ${err} -eq 3 ]; then
+   echo "Warning! Some of the devices might not be" \
+   "available; retrying"
+   root_hold_wait
+   check_startmsgs && echo "Restarting file system checks:"
+   if checkyesno background_fsck; then
+   fsck -F -p
+   else
+   fsck -p
+   fi
+   err=$?
+   fi
+
+   case ${err} in
+   0)
+   ;;
+   2)
+   stop_boot
+   ;;
+   4)
+   echo "Rebooting..."
+   reboot
+   echo "Reboot failed; help!"
+   stop_boot
+   ;;
+   8)
+   if checkyesno fsck_y_enable; then
+   echo "File system preen failed, trying fsck -y ${fsck_y_flags}"
+   fsck -y ${fsck_y_flags}
+   case $? in
+   0)
+   ;;
+   *)
+   echo "Automatic file system check failed; help!"
+   stop_boot
+   ;;
+   esac
+   else
+   echo "Automatic file system check failed; help!"
+   stop_boot
+   fi
+   ;;
+   12)
+   echo "Boot interrupted."
+   stop_boot
+   ;;
+   16)
+   echo "File system check retry requested."
+   ;;
+   130)
+   stop_boot
+   ;;
+   *)
+   echo "Unknown error ${err}; help!"
+   stop_boot
+   ;;
+   esac
+
+   return $err
+}
+
 fsck_start()
 {
+   local err tries
+
if [ "$autoboot" = no ]; then
echo "Fast boot: skipping disk checks."
elif