Re: [systemd-devel] Antw: [EXT] Infinite loop at startup on var fsck failure
On Wed, 26.02.20 10:39, Michael Biebl (mbi...@gmail.com) wrote:

> On Wed, 26 Feb 2020 at 10:13, Ulrich Windl wrote:
> >
> > >>> Vito Caputo wrote on 25.02.2020 at 01:01 in message
> > <7343_1582589314_5e546582_7343_4690_1_20200225000143.nowls5peec5sx...@shells.gnu
> > eneration.com>:
> > > Hello list,
> > >
> > > Today I experienced an unclean shutdown due to battery dying unexpectedly,
> > > and it left my /var in a state requiring a manual fsck to repair errors.
> >
> > I wonder: Shouldn't an fsck just be a journal replay these days? For ext >=3
> > this should be quite fast. ReiserFS was rather slow several years ago (it did
> > replay too much IMHO), but I haven't used it in the last five years.
> >
> > > The normal startup process failed and dropped me to a rescue shell after
> > > asking for my root password. But I was unable to immediately run fsck
> > > manually, because systemd was endlessly trying to fsck /var.
> >
> > That's not a problem of fsck.
>
> I suspect that the real problem is that fsck failed to fix the file
> system, so as a result, systemd tried repeatedly to start the fsck job
> for /var as var.mount was pulled in as a dependency (e.g. for
> journald).

The question is: why *repeatedly* though? I.e., why does it keep doing
that if nothing else happens? journald should not trigger that all the
time...

Also, there's actually a safety condition in place, the start limit
logic: after a service has been attempted to be started too often
within a time window, we refuse starting it again... So I am a bit
puzzled about this.

Some logs would be great to have about this...

Lennart

--
Lennart Poettering, Berlin
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel
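For readers unfamiliar with the start limit logic mentioned above, here is a minimal sketch of the general idea: refuse further starts once too many have been attempted within a time window. This is an illustration only, not systemd's actual code. The class name `StartLimiter`, its parameters, and the values 10 seconds and 5 attempts are assumptions chosen for the example.

```python
from collections import deque


class StartLimiter:
    """Hypothetical helper: allow at most `burst` starts per `interval` seconds."""

    def __init__(self, interval, burst):
        self.interval = interval  # length of the time window, in seconds
        self.burst = burst        # starts permitted inside one window
        self._starts = deque()    # timestamps of the starts that were allowed

    def allow_start(self, now):
        # Forget start attempts that have already left the time window.
        while self._starts and now - self._starts[0] >= self.interval:
            self._starts.popleft()
        if len(self._starts) >= self.burst:
            return False  # limit hit: refuse to start again
        self._starts.append(now)
        return True


# Illustrative values only: one start attempt per second for seven seconds.
limiter = StartLimiter(interval=10.0, burst=5)
results = [limiter.allow_start(now=float(t)) for t in range(7)]
print(results)  # [True, True, True, True, True, False, False]
```

With such a limit in place, a job that fails and is re-triggered every second would be refused from the sixth attempt on, which is why an endless fsck loop is surprising.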
Re: [systemd-devel] Antw: [EXT] Infinite loop at startup on var fsck failure
On Wed, Feb 26, 2020 at 10:39:50AM +0100, Michael Biebl wrote:
> On Wed, 26 Feb 2020 at 10:13, Ulrich Windl wrote:
> >
> > >>> Vito Caputo wrote on 25.02.2020 at 01:01 in message
> > <7343_1582589314_5e546582_7343_4690_1_20200225000143.nowls5peec5sx...@shells.gnu
> > eneration.com>:
> > > Hello list,
> > >
> > > Today I experienced an unclean shutdown due to battery dying unexpectedly,
> > > and it left my /var in a state requiring a manual fsck to repair errors.
> >
> > I wonder: Shouldn't an fsck just be a journal replay these days? For ext >=3
> > this should be quite fast. ReiserFS was rather slow several years ago (it did
> > replay too much IMHO), but I haven't used it in the last five years.
> >
> > > The normal startup process failed and dropped me to a rescue shell after
> > > asking for my root password. But I was unable to immediately run fsck
> > > manually, because systemd was endlessly trying to fsck /var.
> >
> > That's not a problem of fsck.
>
> I suspect that the real problem is that fsck failed to fix the file
> system, so as a result, systemd tried repeatedly to start the fsck job
> for /var as var.mount was pulled in as a dependency (e.g. for
> journald).

That's what seemed to be occurring, ad infinitum.

In this particular instance, at least it wasn't due to hardware errors,
and the constant barrage of disk accesses did little more than flash
the disk status light on my thinkpad and prevent manual fscking, while
I tried to figure out how to correctly calm things down for a manual
fsck.

But it doesn't seem particularly helpful for the failed fsck to keep
getting restarted. If there were actual hardware errors, this behavior
could be exacerbating them during the initial investigation stage. If
it were triggering bus resets and timeouts, as I've experienced in the
past with spinning rust on the SATA bus, the system could have been
very difficult and time consuming to interact with.

IMHO the failed fsck should not be retried automatically at all. Fail
the fsck more permanently, log something in the journal about it with
some hints as to what might be the appropriate next step, and leave
the system quiescent while it waits for the root password for
recovery...

Regards,
Vito Caputo
Re: [systemd-devel] Antw: [EXT] Infinite loop at startup on var fsck failure
On Wed, 26 Feb 2020 at 10:13, Ulrich Windl wrote:
>
> >>> Vito Caputo wrote on 25.02.2020 at 01:01 in message
> <7343_1582589314_5e546582_7343_4690_1_20200225000143.nowls5peec5sx...@shells.gnu
> eneration.com>:
> > Hello list,
> >
> > Today I experienced an unclean shutdown due to battery dying unexpectedly,
> > and it left my /var in a state requiring a manual fsck to repair errors.
>
> I wonder: Shouldn't an fsck just be a journal replay these days? For ext >=3
> this should be quite fast. ReiserFS was rather slow several years ago (it did
> replay too much IMHO), but I haven't used it in the last five years.
>
> > The normal startup process failed and dropped me to a rescue shell after
> > asking for my root password. But I was unable to immediately run fsck
> > manually, because systemd was endlessly trying to fsck /var.
>
> That's not a problem of fsck.

I suspect that the real problem is that fsck failed to fix the file
system, so as a result, systemd tried repeatedly to start the fsck job
for /var as var.mount was pulled in as a dependency (e.g. for
journald).
[systemd-devel] Antw: [EXT] Infinite loop at startup on var fsck failure
>>> Vito Caputo wrote on 25.02.2020 at 01:01 in message
<7343_1582589314_5e546582_7343_4690_1_20200225000143.nowls5peec5sx...@shells.gnu
eneration.com>:
> Hello list,
>
> Today I experienced an unclean shutdown due to battery dying unexpectedly,
> and it left my /var in a state requiring a manual fsck to repair errors.

I wonder: Shouldn't an fsck just be a journal replay these days? For ext >=3
this should be quite fast. ReiserFS was rather slow several years ago (it did
replay too much IMHO), but I haven't used it in the last five years.

>
> The normal startup process failed and dropped me to a rescue shell after
> asking for my root password. But I was unable to immediately run fsck
> manually, because systemd was endlessly trying to fsck /var.

That's not a problem of fsck.

>
> Stopping, disabling, masking, none of those obvious options to prevent
> 'systemd-fsck@dev-mapper-ssd\x2dvar.service' from starting again in
> this loop worked, and I don't recall seeing any guidance in the journal on
> what was the appropriate course of action.
>
> Eventually I resorted to `systemctl emergency` which seemed to get things
> quieted down enough for me to run the fsck manually.
>
> All's well that ends well, but what an *awful* user experience. Is this
> really how things are supposed to play out when a fsck on something like
> /var fails? I was very much left in the dark at a root shell with systemd
> pointlessly spinning its wheels hopelessly running the same fsck
> repeatedly.
>
> It's possible this is already better in a newer systemd release, but I just
> wanted to document this experience in case it's an area that still needs
> improvement.
>
> This is on an old release (v232) in Debian 9.11 amd64.
>
> Regards,
> Vito Caputo