I have yet to investigate intrigeri's suggestions from 2017. However, I would
suggest that in 2022 this needs to be upgraded from wishlist, and the reason is
simple enough:

root@aki:~# nvme smart-log /dev/nvme0
Smart Log for NVME device:nvme0 namespace-id:ffffffff
[..]
unsafe_shutdowns                        : 106
[..]
num_err_log_entries                     : 284
[..]
root@aki:~# nvme smart-log /dev/nvme1
Smart Log for NVME device:nvme1 namespace-id:ffffffff
[..]
unsafe_shutdowns                        : 121
[..]
num_err_log_entries                     : 291
[..]

Given that the frequency and number of SMART errors are deemed an indicator of 
drive health, that's bad. Also, improper shutdown on NVMe devices could be 
particularly problematic because they have caches and wear leveling and cleanup 
cycles that could happen any time the drive is "running" until a shutdown 
command is issued and responded to. There might actually be some risk of data 
corruption/loss. (I doubt it with commodity consumer SSDs, but Debian isn't 
just run on those.)

For a few weeks, we tried on #debian to sort out the cause of the above errors.
Was it an NVMe drive quirk that Linux doesn't support? Maybe Linux was issuing
the shutdown command and not waiting long enough? There's Google bait
suggesting that's a problem, and there are some BS factoids in dpkg (which I
should remove the next time I connect to OFTC) describing a "solution" which
I've since discovered doesn't work. This was hard to test because obviously no
logger is running at this point of the shutdown process.

The root cause of the problem isn't an unknown quirk; it's that I have LVM on
LUKS. (See what I did there?) I connected a drive with an unencrypted Debian
system on it, which mounted my main installation's /boot and even the LUKS/LVM
root somewhere, and never got a single unsafe shutdown despite multiple
reboots/shutdowns, because that temporary install's root was not backed by LVM
on LUKS.
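
For anyone who wants to repeat that comparison, here is a minimal sketch of how
the counter could be read before and after a reboot. It assumes the smart-log
field layout shown in the output quoted above (other nvme-cli versions may
format it differently); the function name unsafe_count and the /var/tmp path
are just made up for illustration.

```shell
#!/bin/sh
# Sketch only: pull the unsafe_shutdowns counter out of `nvme smart-log`
# text read from stdin. Assumes the "name : value" layout quoted above.
unsafe_count() {
    awk -F: '/^unsafe_shutdowns/ { gsub(/[ \t]/, "", $2); print $2 }'
}

# Hypothetical usage (needs root and nvme-cli; the path is an example):
#   nvme smart-log /dev/nvme0 | unsafe_count > /var/tmp/nvme0.unsafe
#   ... reboot ...
#   nvme smart-log /dev/nvme0 | unsafe_count
# If the second number is higher than the saved one, that shutdown was
# counted as unsafe by the drive.
```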

Dracut is a suboptimal solution, in part because after three days of trying to
get it to boot my system I've yet to see it do so, and in part because while
there's lots of documentation for it, that documentation is for other
distributions, wrong, obsolete, or misleading. That includes one rantthrough
from 2017 that offers a profanity-laden survey of most of the others and why
they don't work for Debian systems, or at all.

As far as I can tell, I either need to significantly modify grub, or switch to
systemd-boot, or set up Dracut to generate an EFI executable blob using files
that aren't available on a Debian system, or throw up my hands and go use
Fedora until I understand Dracut well enough to try to use it on Debian. Or
something. Again: what sparse documentation exists is spotty, inconsistent, and
at least five years out of date. Dracut is not how Debian does things, just
like OpenRC and rEFInd are not how Debian does things. They're all there if you
want to set them up, but you're not going to find many Debian resources on
using them.

I think unsafe shutdowns of NVMe devices are actually a bug, and I think they
could cause data loss or corruption on more advanced hardware than I'm using.
There are a few options for addressing this, and most of them become problems
beyond initramfs-tools' scope. But this seven-year-old bug might be the path of
least resistance.

Joseph
