Re: [systemd-devel] Antw: [EXT] Infinite loop at startup on var fsck failure

2020-03-31 Thread Lennart Poettering
On Wed, 26.02.20 10:39, Michael Biebl (mbi...@gmail.com) wrote:

> On Wed, 26 Feb 2020 at 10:13, Ulrich Windl wrote:
> >
> > >>> Vito Caputo wrote on 25.02.2020 at 01:01 in message
> > <7343_1582589314_5e546582_7343_4690_1_20200225000143.nowls5peec5sx...@shells.gnueneration.com>:
> > > Hello list,
> > >
> > > Today I experienced an unclean shutdown due to battery dying unexpectedly,
> > > and it left my /var in a state requiring a manual fsck to repair errors.
> >
> > I wonder: shouldn't an fsck be just a journal replay these days? For
> > ext >= 3 this should be quite fast. ReiserFS was rather slow several
> > years ago (it replayed too much, IMHO), but I haven't used it in the
> > last five years.
> >
> > >
> > > The normal startup process failed and dropped me to a rescue shell after
> > > asking for my root password.  But I was unable to immediately run fsck
> > > manually, because systemd was endlessly trying to fsck /var.
> >
> > That's not a problem of fsck.
>
>
> I suspect the real problem is that fsck failed to fix the file
> system, so systemd repeatedly tried to start the fsck job for /var,
> because var.mount was pulled in as a dependency (e.g. for
> journald).

The question is: why *repeatedly* though? i.e. why does it keep doing
that if nothing else happens? journald should not trigger that all the
time...

Also, there's actually a safety condition in place, the start limit
logic: once a service has been started too often within a time window,
we refuse to start it again...
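
For illustration, the start limit logic described above can be thought of
as a fixed-window counter. What follows is only a minimal Python sketch
under stated assumptions, not systemd's actual implementation: the class
name StartLimit and its allow() method are hypothetical, and the values of
10 seconds and 5 starts are assumed to mirror the documented defaults of
DefaultStartLimitIntervalSec= and DefaultStartLimitBurst=.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StartLimit:
    """Fixed-window start limiter (illustrative only, not systemd code)."""

    interval: float = 10.0  # window length in seconds (assumed default)
    burst: int = 5          # starts permitted per window (assumed default)
    window_begin: Optional[float] = None
    count: int = 0

    def allow(self, now: float) -> bool:
        """Return True if a start attempt at time `now` is permitted."""
        # Open a new window on the first attempt, or once the old one expired.
        if self.window_begin is None or now - self.window_begin > self.interval:
            self.window_begin = now
            self.count = 1
            return True
        # Still inside the window: permit starts until the burst is used up.
        if self.count < self.burst:
            self.count += 1
            return True
        return False


limiter = StartLimit()
# Seven start attempts, one per second, as a tight retry loop might produce.
results = [limiter.allow(float(t)) for t in range(7)]
print(results)
```

Under these assumed values, the first five attempts are permitted and the
following ones are refused until the window expires, so a unit that hits
the limit should not keep being started indefinitely.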

So I am a bit puzzled about this. Some logs would be great to have
about this...

Lennart

--
Lennart Poettering, Berlin
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Antw: [EXT] Infinite loop at startup on var fsck failure

2020-02-26 Thread Vito Caputo
On Wed, Feb 26, 2020 at 10:39:50AM +0100, Michael Biebl wrote:
> On Wed, 26 Feb 2020 at 10:13, Ulrich Windl wrote:
> >
> > >>> Vito Caputo wrote on 25.02.2020 at 01:01 in message
> > <7343_1582589314_5e546582_7343_4690_1_20200225000143.nowls5peec5sx...@shells.gnueneration.com>:
> > > Hello list,
> > >
> > > Today I experienced an unclean shutdown due to battery dying unexpectedly,
> > > and it left my /var in a state requiring a manual fsck to repair errors.
> >
> > I wonder: shouldn't an fsck be just a journal replay these days? For
> > ext >= 3 this should be quite fast. ReiserFS was rather slow several
> > years ago (it replayed too much, IMHO), but I haven't used it in the
> > last five years.
> >
> > >
> > > The normal startup process failed and dropped me to a rescue shell after
> > > asking for my root password.  But I was unable to immediately run fsck
> > > manually, because systemd was endlessly trying to fsck /var.
> >
> > That's not a problem of fsck.
> 
> 
> I suspect the real problem is that fsck failed to fix the file
> system, so systemd repeatedly tried to start the fsck job for /var,
> because var.mount was pulled in as a dependency (e.g. for
> journald).

That's what seemed to be occurring, ad infinitum.

In this particular instance, at least it wasn't due to hardware
errors and the constant barrage of disk accesses did little more
than flash the disk status light on my thinkpad and prevent
manual fscking, while I tried to figure out how to correctly calm
things down for a manual fsck.

But it doesn't seem particularly helpful for the failed fsck to
keep getting restarted.  If there were actual hardware errors,
this behavior could be exacerbating them during the
initial investigation stage.  If it were triggering bus resets
and timeouts, as I've experienced in the past with spinning rust
on the SATA bus, the system could have been very difficult and
time-consuming to interact with.

IMHO the failed fsck should not be retried automatically at all.
Fail the fsck more permanently, log something in the journal
about it with some hints as to what might be the appropriate next
step, and leave the system quiescent while it waits for the root
password for recovery...

Regards,
Vito Caputo


Re: [systemd-devel] Antw: [EXT] Infinite loop at startup on var fsck failure

2020-02-26 Thread Michael Biebl
On Wed, 26 Feb 2020 at 10:13, Ulrich Windl wrote:
>
> >>> Vito Caputo wrote on 25.02.2020 at 01:01 in message
> <7343_1582589314_5e546582_7343_4690_1_20200225000143.nowls5peec5sx...@shells.gnueneration.com>:
> > Hello list,
> >
> > Today I experienced an unclean shutdown due to battery dying unexpectedly,
> > and it left my /var in a state requiring a manual fsck to repair errors.
>
> I wonder: shouldn't an fsck be just a journal replay these days? For
> ext >= 3 this should be quite fast. ReiserFS was rather slow several
> years ago (it replayed too much, IMHO), but I haven't used it in the
> last five years.
>
> >
> > The normal startup process failed and dropped me to a rescue shell after
> > asking for my root password.  But I was unable to immediately run fsck
> > manually, because systemd was endlessly trying to fsck /var.
>
> That's not a problem of fsck.


I suspect the real problem is that fsck failed to fix the file
system, so systemd repeatedly tried to start the fsck job for /var,
because var.mount was pulled in as a dependency (e.g. for
journald).


[systemd-devel] Antw: [EXT] Infinite loop at startup on var fsck failure

2020-02-26 Thread Ulrich Windl
>>> Vito Caputo wrote on 25.02.2020 at 01:01 in message
<7343_1582589314_5e546582_7343_4690_1_20200225000143.nowls5peec5sx...@shells.gnueneration.com>:
> Hello list,
> 
> Today I experienced an unclean shutdown due to battery dying unexpectedly,
> and it left my /var in a state requiring a manual fsck to repair errors.

I wonder: shouldn't an fsck be just a journal replay these days? For
ext >= 3 this should be quite fast. ReiserFS was rather slow several
years ago (it replayed too much, IMHO), but I haven't used it in the
last five years.

> 
> The normal startup process failed and dropped me to a rescue shell after
> asking for my root password.  But I was unable to immediately run fsck
> manually, because systemd was endlessly trying to fsck /var.

That's not a problem of fsck.

> 
> Stopping, disabling, masking, none of those obvious options to prevent
> 'systemd-fsck@dev-mapper-ssd\x2var.service' from starting again in
> this loop worked, and I don't recall seeing any guidance in the journal on
> what was the appropriate course of action.
> 
> Eventually I resorted to `systemctl emergency` which seemed to get things
> quieted down enough for me to run the fsck manually.
> 
> All's well that ends well, but what an *awful* user experience.  Is this
> really how things are supposed to play out when a fsck on something like
> /var fails?  I was very much left in the dark at a root shell with systemd
> pointlessly spinning its wheels hopelessly running the same fsck
> repeatedly.
> 
> It's possible this is already better in a newer systemd release, but I just
> wanted to document this experience in case it's an area that still needs
> improvement.
> 
> This is on an old release (v232) in Debian 9.11 amd64.
> 
> Regards,
> Vito Caputo