I like the idea of just removing the journal tag, but I don't think that I can modify the /boot filesystem.  Doing a umount works, but tune2fs claims that an e2fsck is required.  Running e2fsck says that the filesystem is still mounted, even though it is not.  Doing a fuser /dev/sda1 shows a large number of /proc/fd entries using it, even though it successfully umounted.

So, something is referencing the filesystem in a very bad way, and not as a mounted fs.

So, I'm still stuck.  How do you successfully modify /boot to not be journalled?


On 2020-05-08 3:25 p.m., Roger Heflin wrote:
You have to hit the timing right, i.e. install the kernel package and
reboot as quickly as possible (automated, or very efficient).

And if the update is more than just the kernel, that may slow down the
process enough that the immediate reboot won't be quick enough.

I have seen it 3-5 times, and that is over a huge number of machines.
In those cases the machines were booted and failed multiple times before
someone livecd booted them, ran fsck and/or mounted /boot, and grub
found the files after that.

On Fri, May 8, 2020 at 2:00 PM Mauricio Tavares <raubvo...@gmail.com> wrote:
On Fri, May 8, 2020 at 12:12 PM Roger Heflin <rogerhef...@gmail.com> wrote:
A sync will flush the writes to the journal where the data is safe.  It
will not force a replay of the journal.

Nothing except removing the journal from the ext4 filesystem will fix it.

This is not a fedora bug, this is a long standing
kernel/grub/filesystem interaction bug (all who use a journaled
filesystem have this bug).

See tune2fs; something like -O ^has_journal will turn off the
journal.  It has to be done unmounted, and verify that your fstab entry
will remount it.

Check /proc/mounts: having data=XXX (probably ordered) says you have a
journal; after the umount+above tune2fs+remount the data=ordered will
be gone.
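
A minimal sketch of the sequence described above, assuming /boot is a
separate ext4 partition on /dev/sda1 (the device named elsewhere in this
thread) with an /etc/fstab entry.  The run helper, the RUN variable and
the function name remove_boot_journal are hypothetical, made up for this
example; by default the commands are only printed.  The e2fsck step is
included because tune2fs has been reported in this thread to ask for one
first.

```shell
#!/bin/sh
# Sketch only: remove the ext4 journal from a separate /boot partition.
# run, RUN and remove_boot_journal are hypothetical names for this example.

# Print the command by default; execute it only when RUN=1 is set (as root).
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        printf '+ %s\n' "$*"
    fi
}

remove_boot_journal() {
    dev="$1"                                   # e.g. /dev/sda1, adjust for your machine
    run umount /boot &&
    run e2fsck -f "$dev" &&                    # tune2fs may insist on a check first
    run tune2fs -O '^has_journal' "$dev" &&    # drop the journal feature
    run mount /boot &&                         # relies on the /boot entry in /etc/fstab
    run grep /boot /proc/mounts                # per the note above, no data=ordered expected
}
```

Calling "remove_boot_journal /dev/sda1" prints the five commands for
review; "RUN=1 remove_boot_journal /dev/sda1" as root would execute
them, stopping at the first failure.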

       Interesting. I have a box which has been running for years with

/dev/sdb1 /boot ext4
rw,seclabel,noatime,barrier=1,stripe=32,data=ordered,discard 0 0

and so far never borked on me.

On Fri, May 8, 2020 at 10:53 AM John Mellor <john.mel...@gmail.com> wrote:
Interesting!  This machine does reboot in about 5secs and the other
machines take longer, so it makes sense.  My /boot is mounted just like
/home and / as follows:

     /dev/sda1 on /boot type ext4 (rw,relatime,seclabel)

I assume that a simple sync would flush the journal.  It's pretty easy to
do a sync;sync if updating using the CLI, but not possible when using
the GUI.  Is this a Fedora bug where the journal is not correctly
flushed on the reboot?  Should I modify that mount entry or do a chattr
change to work around the bug?


On 2020-05-08 11:11 a.m., Roger Heflin wrote:
What you are saying does not exactly match what I have previously
seen, but there is a known feature with using a journaling filesystem
(ext4-journal, or xfs) for /boot, if only the journal is updated and
if it is not yet replayed  into the non-journal then grub will not be
able to find the new files/updated files (grub filesystem code is
simple and does not process the journal so if critical updates are
still in the journal then those updates(changed file, new files)
cannot be seen).  To get this one generally has to do the update and
almost immediately reboot (within a few minutes though in some cases,
note that syncing does not replay the journal).   The fix is to boot up
with a kernel that it can still find and/or livecd and mount /boot so
that the journal gets replayed, or fsck boot so that the journal gets
replayed.
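
A rough sketch of that recovery from a live environment, assuming /boot
lives on /dev/sda1 and /mnt is free to use.  The run helper, the RUN
variable and the function name replay_from_live are hypothetical; by
default the commands are only printed.

```shell
#!/bin/sh
# Sketch only: replay the /boot journal from a livecd.
# run, RUN and replay_from_live are hypothetical names for this example.

# Print the command by default; execute it only when RUN=1 is set (as root).
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        printf '+ %s\n' "$*"
    fi
}

replay_from_live() {
    dev="$1"                      # partition holding /boot, e.g. /dev/sda1
    mnt="${2:-/mnt}"              # temporary mount point
    run mount "$dev" "$mnt" &&    # mounting read-write replays the journal
    run umount "$mnt" &&
    run e2fsck -f "$dev"          # the "and/or fsck" route, on the unmounted device
}
```

Either the mount or the fsck alone should be enough according to the
description above; doing both just double-checks the result.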

Long term the solution is to move boot to a non-journaled fs (ext
without a journal) or after each update umount/mount /boot (before
reboot).  If /boot is not separated then you cannot umount/mount it
to get the journal to replay.  There is a second method to force a
journal replay, but reports say that one often "hangs" when /boot is
not separate so is not a reliable solution.    There were some
detailed posts on this several years ago with reliable commenters
confirming the behavior.  I have also personally seen the issue a
number of times and mounting /boot and/or fscking corrects it (replays
journal).
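
The umount/mount workaround could be sketched as below, to be run after
the kernel update has finished and before rebooting.  It assumes /boot
is a separate filesystem listed in /etc/fstab.  The run helper, the RUN
variable and the function name replay_boot_journal are hypothetical; by
default the commands are only printed.

```shell
#!/bin/sh
# Sketch only: cycle the /boot mount so pending journal entries are
# written to their final place before the reboot.
# run, RUN and replay_boot_journal are hypothetical names for this example.

# Print the command by default; execute it only when RUN=1 is set (as root).
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        printf '+ %s\n' "$*"
    fi
}

replay_boot_journal() {
    run mountpoint -q /boot &&    # fails if /boot is not a separate filesystem
    run umount /boot &&
    run mount /boot               # uses the /boot entry in /etc/fstab
}
```

With RUN=1 the chain stops at the first failing step, so a /boot that is
not separate (or is busy) is left untouched.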

On Fri, May 8, 2020 at 8:52 AM John Mellor <john.mel...@gmail.com> wrote:
I have one completely stock workstation F32 machine where kernel updates
almost always cause a multiple-reboot panic problem.  This problem also
occurred on F31, but not on releases before that. I'm stumped and need
some help in figuring it out.

The symptoms vary in the number of reboots and the type of tertiary
error, but are otherwise pretty similar.  It does not matter whether I
use the Gnome update app or the CLI dnf method. After a number of
reboots, the upgrade succeeds and Fedora behaves normally again.  I
think that this only happens whenever the kernel is upgraded.

What I observe is that the machine is rebooted and on reboot, grub (I
think) gets a halt for a 32-bit relocation error.  This sequence may
happen twice.  It's an i7 with plenty of memory and an SSD boot disk, so
the 32-bit thing is confusing.  To get around this error, I powercycle
the box and get into the next stage of the problem.  On the 2nd or 3rd
reboot, I usually see a halt with an access outside of the kernel space,
although with the update this morning, I had a kernel panic instead.
Cold-booting again, the update is installed, and after the last reboot
I'm up on the new updates.

After that, the machine behaves normally until the next kernel updates.
I assume that there is some incorrectly-asynchronous operation in grub
related to the update entry, but I can find no grub logs to dig into
this problem.  I have several other machines that do not see this
problem.  I dug around in the fedora bugs, but not knowing what to look
for, I'm basically blind.  It's a pretty serious bug, especially if the
machine is remote.  Does anyone have a way out of this?

--

John Mellor
_______________________________________________
users mailing list -- users@lists.fedoraproject.org
To unsubscribe send an email to users-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/users@lists.fedoraproject.org
