Re: CURRENT: superblock hash failure - CURRENT wrecking disks

2019-10-13 Thread Kirk McKusick
> Date: Wed, 7 Aug 2019 10:37:29 +0200
> From: "O. Hartmann" 
> To: freebsd-current 
> Subject: CURRENT: superblock hash failure - CURRENT wrecking disks
> 
> Hello,
> 
> Today I ran into a catastrophe with r350671. After installing a freshly
> compiled system and rebooting the box, the UEFI loader printed a bunch
> of errors, including some hex numbers, stating that a superblock hash
> was wrong, and then booting stopped at the OK loader prompt.
> 
> Rebooting the machine with the FreeBSD-13-CURRENT image from 1st
> August 2019, I tried to fsck the filesystem(s) on the boot SSD
> (UFS2, journaling and trim on); lots of unresolved block errors
> occurred, and fsck didn't help much. Further, after several checks,
> I saw some recent commits to the ffs code and tried to restore a
> copy of the superblock of each filesystem (contrary to the man
> page for fsck_ufs, the first backup superblock resides at 192, not
> 160!). But things then got even worse: it seems the whole /boot
> structure is corrupted, the loader cannot find the recent kernel,
> and kernel.old crashes.
> 
> What's wrong here :-(
> 
> The box in question was set up six weeks ago with FreeBSD 13-CURRENT
> natively. It is now a wreck. Other systems running CURRENT (at the
> most recent revision as of today), some of which were originally
> installed as 12-STABLE/12-CURRENT and "moved on" to what is now
> 13-CURRENT, do not(!) show any such problems.
> 
> Either I installed the new system right while a bug was in the tree,
> or something really strange happened.
> 
> The bad thing is that kernel.old exits/dies with an exception and
> /boot/kernel/ seems to be completely corrupted. Tomorrow I will try to
> install a prepared FreeBSD-kernel pkg tar archive from a CURRENT pkg
> base and hope this will fix the issue.
> 
> Regards,
> 
> oh

The boot code checks the superblock hash and reports if it is wrong,
but ignores the error and continues trying to boot. It continues so
that the system can come up and the superblock check hash can then be
fixed by running fsck. So if your filesystem could not be booted, it
had something more seriously wrong than just a bad superblock hash.
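A minimal Python sketch of the "report but keep booting" policy described above, under loud assumptions: the superblock is modeled as a plain byte buffer with a 4-byte hash field at a made-up offset (CKHASH_OFFSET), the hash is computed with that field zeroed, and zlib.crc32 stands in for the CRC32C that UFS uses. The names compute_sb_hash, stamp_sb_hash and loader_check are hypothetical and are not the loader's actual code.

```python
import struct
import zlib

# ASSUMPTION: a toy superblock is just a byte buffer whose 4-byte
# check-hash field sits at a made-up offset. The real struct fs
# layout is different.
CKHASH_OFFSET = 16


def compute_sb_hash(sb: bytes) -> int:
    """Hash the superblock with its own check-hash field zeroed.

    zlib.crc32 stands in for CRC32C, which the Python standard
    library does not provide.
    """
    zeroed = sb[:CKHASH_OFFSET] + bytes(4) + sb[CKHASH_OFFSET + 4:]
    return zlib.crc32(zeroed)


def stamp_sb_hash(sb: bytes) -> bytes:
    """Return a copy of sb with a freshly computed check hash stored in it."""
    return (sb[:CKHASH_OFFSET]
            + struct.pack("<I", compute_sb_hash(sb))
            + sb[CKHASH_OFFSET + 4:])


def loader_check(sb: bytes, hashes_enabled: bool, log: list) -> bool:
    """Report a superblock hash mismatch, but always keep booting.

    Returns True ("continue booting") whether or not the hash matches;
    repairing a bad hash is left to fsck once the system is up.
    """
    if hashes_enabled:
        stored = struct.unpack_from("<I", sb, CKHASH_OFFSET)[0]
        computed = compute_sb_hash(sb)
        if stored != computed:
            log.append(f"superblock check hash mismatch: "
                       f"stored {stored:#010x}, computed {computed:#010x}")
    return True


if __name__ == "__main__":
    messages = []
    good = stamp_sb_hash(bytes(64))
    bad = good[:40] + b"\x01" + good[41:]  # corrupt one byte outside the hash field
    print(loader_check(good, True, messages), len(messages))  # True 0
    print(loader_check(bad, True, messages), len(messages))   # True 1
```

The point of returning True unconditionally is that a hash mismatch alone never stops the boot; a machine stuck at the loader prompt therefore points to damage beyond the hash.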

The fix in r350671 was to recompute the superblock check hash in a
place that I had missed earlier. I discovered the error when someone
reported getting superblock check hash errors when booting. But that
error did not cause their system to be unbootable for the reasons
that I explained in the previous paragraph.

If the filesystem started on 12-stable, then moving to 13 would not
have enabled superblock check hashes. They are enabled only when you
run fsck manually and explicitly answer yes to the request to add
superblock check hashes. Running fsck -y will not add them.
Filesystems created on 13 get superblock check hashes. But if you
boot a 13 filesystem using a 12-stable kernel, the hashes will be
disabled and remain disabled even after you boot the filesystem on
13 again.
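The enable/disable rules above can be modeled as a tiny state machine. This is a hypothetical Python sketch: the Superblock class and the newfs, mount_rw and fsck functions are toy stand-ins that track only one flag, and the assumption that older kernels clear it on a read/write mount follows the discussion in this thread, not the actual kernel code.

```python
from dataclasses import dataclass


@dataclass
class Superblock:
    """Toy model holding only whether superblock check hashes are on."""
    sb_hash_enabled: bool


def newfs(release: int) -> Superblock:
    # Filesystems created on 13 get superblock check hashes;
    # filesystems created on older releases do not.
    return Superblock(sb_hash_enabled=release >= 13)


def mount_rw(sb: Superblock, kernel_release: int) -> None:
    # An older (11 or 12) kernel clears the flag on a read/write mount.
    # A 13 kernel never turns it back on by itself.
    if kernel_release < 13:
        sb.sb_hash_enabled = False


def fsck(sb: Superblock, yflag: bool = False, answer_yes: bool = False) -> None:
    # fsck -y never adds superblock check hashes; only an interactive
    # run where the operator explicitly answers yes does.
    if sb.sb_hash_enabled or yflag:
        return
    if answer_yes:
        sb.sb_hash_enabled = True


if __name__ == "__main__":
    sb = newfs(13)
    mount_rw(sb, 12)           # booted once with a 12-stable kernel
    mount_rw(sb, 13)           # back on 13: still disabled
    fsck(sb, yflag=True)       # fsck -y: still disabled
    print(sb.sb_hash_enabled)  # False
    fsck(sb, answer_yes=True)  # interactive fsck, operator says yes
    print(sb.sb_hash_enabled)  # True
```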

Thanks for pointing out the error on the fsck_ufs manual page. The first
backup superblock moved from 160 to 192 when the default block size was
raised from 16K to 32K. I have corrected the page in r350682.
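The move from 160 to 192 can be reproduced arithmetically. This Python sketch assumes the fs_sblkno formula used by newfs (the backup superblock in cylinder group 0 starts at the first block boundary at or after the end of the primary superblock area), and it also assumes that the fsck_ufs -b argument is counted in 512-byte DEV_BSIZE sectors. With the old 16K/2K and new 32K/4K block/fragment defaults, it gives the two numbers discussed in this thread.

```python
SBLOCK_UFS1 = 8192   # byte offset of the UFS1 primary superblock
SBLOCK_UFS2 = 65536  # byte offset of the UFS2 primary superblock
SBLOCKSIZE = 8192    # bytes reserved for the superblock
DEV_BSIZE = 512      # sector size; assumed unit of the fsck_ufs -b argument


def howmany(x: int, y: int) -> int:
    """Ceiling division, as the BSD howmany() macro."""
    return (x + y - 1) // y


def roundup(x: int, y: int) -> int:
    """Round x up to a multiple of y, as the BSD roundup() macro."""
    return howmany(x, y) * y


def first_backup_sb(bsize: int, fsize: int, sblockloc: int = SBLOCK_UFS2) -> int:
    """Sector (DEV_BSIZE) address of the backup superblock in cylinder group 0."""
    frag = bsize // fsize  # fragments per block
    # First block boundary at or after the end of the primary superblock
    # area, expressed in fragments (like fs_sblkno).
    sblkno = roundup(howmany(sblockloc + SBLOCKSIZE, fsize), frag)
    return sblkno * fsize // DEV_BSIZE


if __name__ == "__main__":
    print(first_backup_sb(16384, 2048))               # 160: old UFS2 default
    print(first_backup_sb(32768, 4096))               # 192: new UFS2 default
    print(first_backup_sb(16384, 2048, SBLOCK_UFS1))  # 32: UFS1 with 16K/2K
```

Because the offset is rounded up to a whole block, doubling the fragment size moves the backup from 40 fragments of 2K to 24 fragments of 4K, which is the 160 to 192 shift in sectors.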

Kirk McKusick
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: CURRENT: superblock hash failure - CURRENT wrecking disks

2019-10-13 Thread Kirk McKusick
> To: Enji Cooper 
> cc: "O. Hartmann" ,
> freebsd-current , mckus...@mckusick.com
> Subject: Re: CURRENT: superblock hash failure - CURRENT wrecking disks
> From: "Poul-Henning Kamp" 
> 
> In message <39fb31e6-a8ec-484c-b297-39c19a787...@gmail.com>, Enji Cooper writes:
> 
> There is an "interesting" failure-mechanism when you move a disk
> between 13/current and older systems which do not support ufs-hashes.
> 
> It will be prudent to make 11 and 12 clear the "use hashes" flags
> in the superblocks of all filesystems they mount R/W, to limit
> the amount of havoc this will cause when people start playing with 13.
> 
> -- 
> Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
> p...@freebsd.org | TCP/IP since RFC 956
> FreeBSD committer   | BSD since 4.3-tahoe
> Never attribute to malice what can adequately be explained by incompetence.

Both stable-11 and stable-12 clear the "use hashes" flags. If the disk
is moved back to a 13-head system, they remain disabled until re-enabled
by running fsck in interactive mode and requesting that they be enabled.

Kirk McKusick


Re: CURRENT: superblock hash failure - CURRENT wrecking disks

2019-08-07 Thread O. Hartmann

On Wed, 07 Aug 2019 13:02:23 +
"Poul-Henning Kamp" wrote:

> 
> In message <39fb31e6-a8ec-484c-b297-39c19a787...@gmail.com>, Enji Cooper writes:
> 
> There is an "interesting" failure-mechanism when you move a disk
> between 13/current and older systems which do not support ufs-hashes.
> 
> It will be prudent to make 11 and 12 clear the "use hashes" flags
> in the superblocks of all filesystems they mount R/W, to limit
> the amount of havoc this will cause when people start playing with 13.
> 

The box in question was set up six weeks ago with FreeBSD 13-CURRENT
natively. It is now a wreck. Other systems running CURRENT (at the most
recent revision as of today), some of which were originally installed as
12-STABLE/12-CURRENT and "moved on" to what is now 13-CURRENT, do
not(!) show any such problems.

Either I installed the new system right while a bug was in the tree, or
something really strange happened.

The bad thing is that kernel.old exits/dies with an exception and /boot/kernel/
seems to be completely corrupted. Tomorrow I will try to install a prepared
FreeBSD-kernel pkg tar archive from a CURRENT pkg base and hope this will fix
the issue.

Regards,

oh

-- 
O. Hartmann

I object to the use or transmission of my data for advertising purposes
or for market or opinion research (§ 28 Abs. 4 BDSG).


Re: CURRENT: superblock hash failure - CURRENT wrecking disks

2019-08-07 Thread Poul-Henning Kamp

In message <39fb31e6-a8ec-484c-b297-39c19a787...@gmail.com>, Enji Cooper writes:

There is an "interesting" failure-mechanism when you move a disk
between 13/current and older systems which do not support ufs-hashes.

It will be prudent to make 11 and 12 clear the "use hashes" flags
in the superblocks of all filesystems they mount R/W, to limit
the amount of havoc this will cause when people start playing with 13.

-- 
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
p...@freebsd.org | TCP/IP since RFC 956
FreeBSD committer   | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.


Re: CURRENT: superblock hash failure - CURRENT wrecking disks

2019-08-07 Thread Enji Cooper
(CCing Kirk)

> On Aug 7, 2019, at 01:37, O. Hartmann  wrote:
> 
> Hello,
> 
> Today I ran into a catastrophe with r350671. After installing a freshly compiled
> system and rebooting the box, the UEFI loader printed a bunch of errors,
> including some hex numbers, stating that a superblock hash was wrong, and then
> booting stopped at the OK loader prompt.
> 
> Rebooting the machine with the FreeBSD-13-CURRENT image from 1st August 2019,
> I tried to fsck the filesystem(s) on the boot SSD (UFS2, journaling and trim
> on); lots of unresolved block errors occurred, and fsck didn't help much.
> Further, after several checks, I saw some recent commits to the ffs code and
> tried to restore a copy of the superblock of each filesystem (contrary to
> the man page for fsck_ufs, the first backup superblock resides at 192, not
> 160!). But things then got even worse: it seems the whole /boot structure is
> corrupted, the loader cannot find the recent kernel, and kernel.old crashes.
> 
> What's wrong here :-(
> 
> regards,
> oh


CURRENT: superblock hash failure - CURRENT wrecking disks

2019-08-07 Thread O. Hartmann
Hello,

Today I ran into a catastrophe with r350671. After installing a freshly compiled
system and rebooting the box, the UEFI loader printed a bunch of errors,
including some hex numbers, stating that a superblock hash was wrong, and then
booting stopped at the OK loader prompt.

Rebooting the machine with the FreeBSD-13-CURRENT image from 1st August 2019,
I tried to fsck the filesystem(s) on the boot SSD (UFS2, journaling and trim
on); lots of unresolved block errors occurred, and fsck didn't help much.
Further, after several checks, I saw some recent commits to the ffs code and
tried to restore a copy of the superblock of each filesystem (contrary to
the man page for fsck_ufs, the first backup superblock resides at 192, not
160!). But things then got even worse: it seems the whole /boot structure is
corrupted, the loader cannot find the recent kernel, and kernel.old crashes.

What's wrong here :-(

regards,
oh