Re: Filesystem corruption on OpenBSD routers after power outage?
There is a discussion about softdep (soft updates) here: http://openbsd-archive.7691.n7.nabble.com/What-are-the-disadvantages-of-soft-updates-td264283.html
Re: Filesystem corruption on OpenBSD routers after power outage?
Good day, I have a few questions. Why are soft updates not on by default? Do they help consistency in case of failure, or are they recommended only in specific cases? Unless I am mistaken, they help keep the drive consistent, avoid the need for fsck in most hard-failure cases, and the only risk is unused spare sectors, which can be recovered later. Would they not help the file system in general, or is there some drawback to their use? Regards, J.F.
Sidenote: Filesystem corruption on OpenBSD routers after power outage?
> Even after many tries, I have not yet been able to corrupt the
> filesystem so fsck cannot repair it without manual intervention.

Another, less severe corner case: I have a couple of buggy 386 laptops (that will be replaced soon anyway) with temperamental over-temperature shutdowns on some bootups (and now a failing host controller, I'm guessing due to age/damage). On these I have found that LOST+FOUND can fill up the filesystem, preventing library and kernel randomisation from taking effect. I have a script to check for and remove older LOST+FOUND files. It's unlikely that anything important would be lost on these browsing machines anyway. I guess a proper solution would be thorny?
Re: Filesystem corruption on OpenBSD routers after power outage?
Ted Unangst wrote:
> Theo de Raadt wrote:
> > How does sync() fix this? Please explain this. Look at the source
> > code.
> >
> > sync() is an asynchronous call requesting synchronization, and once
> > it has marked the blocks that should be pushed, it returns before
> > the work has been done.
>
> Ah, indeed.
>
> > > 2. cp could do an fsync call. There was a thread about this a while ago?
> >
> > How does fsync fix this? What if it returns an error. What do you do next.
> > Should cp spin until fsync returns non-error, or what should it do.
>
> Exit with an error? Same as if it got EIO or ENOSPC from a write? Then the
> command chain will stop before the mv.

The OP demonstrated a problem. Any solution presented should try to fix the problem, not just "maybe".
Re: Filesystem corruption on OpenBSD routers after power outage?
Theo de Raadt wrote:
> How does sync() fix this? Please explain this. Look at the source
> code.
>
> sync() is an asynchronous call requesting synchronization, and once
> it has marked the blocks that should be pushed, it returns before
> the work has been done.

Ah, indeed.

> > 2. cp could do an fsync call. There was a thread about this a while ago?
>
> How does fsync fix this? What if it returns an error. What do you do next.
> Should cp spin until fsync returns non-error, or what should it do.

Exit with an error? Same as if it got EIO or ENOSPC from a write? Then the command chain will stop before the mv.
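The "exit with an error, so the chain stops before the mv" idea can be sketched in shell. This is only an illustration: install_checked and FLUSH_CMD are hypothetical names, not anything in OpenBSD, and FLUSH_CMD stands in for whatever durability step is eventually chosen. It defaults to sync(8), which, as noted in this thread, may return before the buffers are flushed, so the sketch shows the error propagation only, not a durability guarantee.

```sh
#!/bin/sh
# Sketch only: install_checked and FLUSH_CMD are hypothetical names.
# FLUSH_CMD is a placeholder for the "make it durable" step; the default,
# sync, does not guarantee the data is on disk when it returns.
install_checked() {
    src=$1
    dst=$2
    tmp="$dst.new"
    # Each step runs only if the previous one succeeded. On any failure
    # the old $dst is left untouched and a non-zero status is returned.
    cp "$src" "$tmp" &&
        ${FLUSH_CMD:-sync} &&
        mv "$tmp" "$dst"
}
```

If the flush step reports an error, the rename never happens and the previous file stays in place, which is the behaviour Ted describes for EIO or ENOSPC from a write.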
Re: Filesystem corruption on OpenBSD routers after power outage?
Ted Unangst wrote:
> Mogens Jensen wrote:
> > Even after many tries, I have not yet been able to corrupt the
> > filesystem so fsck cannot repair it without manual intervention.
> > However, if power is removed while the 'reorder_kernel' script runs,
> > the system will become completely unbootable. I could do this multiple
> > times.
>
> The new kernel is installed like this:
> umask 077 && cp bsd /nbsd && mv /nbsd /bsd
>
> So a crash during compilation or linking shouldn't affect reboot. However,
> there's a window between cp and mv where some of the new kernel may reside in
> unwritten dirty buffers. Then mv will rewrite the directory entry. A crash at
> this point will leave you with a broken kernel.

I agree, dirty buffers are being pushed too slowly.

> A few possible fixes.
>
> 1. Insert a sync call in there. Kinda heavyweight, but works.

How does sync() fix this? Please explain this. Look at the source code.

sync() is an asynchronous call requesting synchronization, and once it has marked the blocks that should be pushed, it returns before the work has been done.

BUGS
     sync() may return before the buffers are completely flushed.

This has been marked as a bug for over half your life, to the point where it isn't actually a bug. It is the design of the system call. So it doesn't solve what you want to solve.

Who will be the first person to say sync(); sync(); sync();

> 2. cp could do an fsync call. There was a thread about this a while ago?

How does fsync fix this? What if it returns an error. What do you do next. Should cp spin until fsync returns non-error, or what should it do.

> 3. mv could do the fsync instead, to make sure it doesn't move incomplete
> files.

Same question.
Re: Filesystem corruption on OpenBSD routers after power outage?
Mogens Jensen wrote:
> Even after many tries, I have not yet been able to corrupt the
> filesystem so fsck cannot repair it without manual intervention.
> However, if power is removed while the 'reorder_kernel' script runs,
> the system will become completely unbootable. I could do this multiple
> times.

The new kernel is installed like this:

umask 077 && cp bsd /nbsd && mv /nbsd /bsd

So a crash during compilation or linking shouldn't affect reboot. However, there's a window between cp and mv where some of the new kernel may reside in unwritten dirty buffers. Then mv will rewrite the directory entry. A crash at this point will leave you with a broken kernel.

A few possible fixes.

1. Insert a sync call in there. Kinda heavyweight, but works.
2. cp could do an fsync call. There was a thread about this a while ago?
3. mv could do the fsync instead, to make sure it doesn't move incomplete files.
Re: Filesystem corruption on OpenBSD routers after power outage?
Since posting this question I have been trying to intentionally corrupt the router filesystem by simulating power outages while writing files and doing various other things. Even after many tries, I have not yet been able to corrupt the filesystem so fsck cannot repair it without manual intervention. However, if power is removed while the 'reorder_kernel' script runs, the system will become completely unbootable. I could do this multiple times.

One realistic scenario is an unstable electric grid: power returns and a new outage occurs shortly after. If the second outage happens during the boot process at the exact time 'reorder_kernel' is running, the system will break because of a corrupt kernel, and repair is not possible remotely.

Is there a way to avoid 'reorder_kernel' during the boot process and run it manually instead?

Thanks in advance.

Mogens Jensen
Re: Filesystem corruption on OpenBSD routers after power outage?
On 19:30 Tue 04 Jun, Mogens Jensen wrote:
> I'm going to build a router for use in a remote location, and I have
> chosen OpenBSD 6.5 for the task. Unfortunately, it's not possible to
> protect the router with an UPS, so it will have to be resilient enough
> to survive sudden power outages and still boot without manual
> intervention.
>
> In the past I have built a few Linux based routers and they were
> configured to run from RAM. I have made some research to see if this is
> also possible on OpenBSD and found that, while there are solutions to
> have / read-only, none of this is officially supported.
>
> Can anyone with experience running OpenBSD routers without UPS, tell if
> filesystem corruption is going to be a problem after power outages, or
> if there are any officially supported ways to make the system resilient
> enough to not break after a power outage?
>
> I'm using an mSATA disk with MLC flash in the router.
>
> Thanks in advance.

I've had a couple of issues with my APU2-based router on 6.4. After the power outage the newly linked kernel was corrupted, and some files ended up in lost+found.
Re: Filesystem corruption on OpenBSD routers after power outage?
Yeah Marko, this blog did help me when I was researching the issue ...

Cheers,

On Thu, 6 Jun 2019 at 10:07, Marko Cupać wrote:
> On Tue, 04 Jun 2019 19:30:08 +
> Mogens Jensen wrote:
>
> > Can anyone with experience running OpenBSD routers without UPS, tell
> > if filesystem corruption is going to be a problem after power
> > outages, or if there are any officially supported ways to make the
> > system resilient enough to not break after a power outage?
>
> I have described my !!!UNSUPPORTED!!! setup !!!WARNING, BLATANT
> SELF-PROMOTION!!! here:
>
> https://www.mimar.rs/blog/how-to-increase-openbsds-resilience-to-power-outages
>
> So far I have two 6.5's on PCengine's apu2d4 (~20 6.2-6.4's). The only
> "problem" I have since 6.4 is that I have to mount / rw when tcpdumping
> because unveil does not like ro /etc.
>
> HTH,
> --
> Before enlightenment - chop wood, draw water.
> After enlightenment - chop wood, draw water.
>
> Marko Cupać
> https://www.mimar.rs/

--
Kindest regards,
Tom Smyth.
Re: Filesystem corruption on OpenBSD routers after power outage?
On Tue, 04 Jun 2019 19:30:08 + Mogens Jensen wrote:
> Can anyone with experience running OpenBSD routers without UPS, tell
> if filesystem corruption is going to be a problem after power
> outages, or if there are any officially supported ways to make the
> system resilient enough to not break after a power outage?

I have described my !!!UNSUPPORTED!!! setup !!!WARNING, BLATANT SELF-PROMOTION!!! here:

https://www.mimar.rs/blog/how-to-increase-openbsds-resilience-to-power-outages

So far I have two 6.5's on PCengine's apu2d4 (~20 6.2-6.4's). The only "problem" I have since 6.4 is that I have to mount / rw when tcpdumping because unveil does not like ro /etc.

HTH,
--
Before enlightenment - chop wood, draw water.
After enlightenment - chop wood, draw water.

Marko Cupać
https://www.mimar.rs/
Re: Filesystem corruption on OpenBSD routers after power outage?
On Jun 04 19:30:08, mogens-jen...@protonmail.com wrote:
> Can anyone with experience running OpenBSD routers without UPS, tell if
> filesystem corruption is going to be a problem after power outages

I have been using various ALIXes with a CF card as storage, and in the 10+ years, I had to do a manual fsck on them about four times after a power outage. (Most often, the only indication of an outage is the SMS the router sends me upon reboot.)

Jan
Re: Filesystem corruption on OpenBSD routers after power outage?
On Tue, Jun 4, 2019 at 3:34 PM Mogens Jensen wrote:
> Can anyone with experience running OpenBSD routers without UPS, tell if
> filesystem corruption is going to be a problem after power outages, or
> if there are any officially supported ways to make the system resilient
> enough to not break after a power outage?
>
> I'm using an mSATA disk with MLC flash in the router.

I have some OpenBSD routers without UPS protection (Soekris net6501 devices) and after using them for some years, I think it's not possible to have absolute 100% protection from filesystem corruption due to power problems, without causing other problems such as making the system overly fragile to upgrade or maintain.

However, it works reasonably well to put /var/log on an MFS file system, and set up a cron job (as well as a line in /etc/rc.shutdown) to periodically rsync /var/log to another directory (so that logs will be preserved after a reboot). This works fairly well, and the system comes up properly after power failures easily over 98% of the time. Very rarely (i.e. I have seen it happen twice in a decade) you will get unlucky and have corruption anyway that requires you to run "fsck -y" manually. This is rare enough that I haven't bothered trying to automate it away.

To accomplish this, I installed OpenBSD with /var/log being a separate filesystem, then edited /etc/fstab to rename /var/log to /mfs/log, and add a new entry for /var/log:

swap /var/log mfs rw,nodev,nosuid,-s=128M,-P=/mfs/log 0 0

Initializing the MFS /var/log by loading from /mfs/log, combined with an rsync command in /etc/rc.shutdown, is what gives the illusion of /var/log being preserved across reboots.

-ken
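The copy-back step Ken describes might look roughly like the sketch below. save_logs is a hypothetical helper name, and the cp fallback is my own assumption for machines without rsync; Ken's setup uses rsync. The same function could be called from both the cron job and /etc/rc.shutdown.

```sh
#!/bin/sh
# Sketch only: save_logs is a hypothetical helper, not part of OpenBSD.
# It copies the MFS-backed log directory to the on-disk directory that
# the -P option reloads at the next boot.
save_logs() {
    src=$1   # MFS-backed directory, e.g. /var/log
    dst=$2   # on-disk copy, e.g. /mfs/log (must already exist)
    if command -v rsync >/dev/null 2>&1; then
        # -a preserves ownership and times; --delete drops rotated-away files.
        rsync -a --delete "$src/" "$dst/"
    else
        # Fallback assumption: plain recursive copy, nothing is deleted.
        cp -Rp "$src/." "$dst/"
    fi
}
```

With the paths from the message above, the call would be save_logs /var/log /mfs/log. Anything logged after the last run is still lost on a power cut, which is the trade-off this setup accepts.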
Re: Filesystem corruption on OpenBSD routers after power outage?
Is there any way to tell the boot script to use the "-y" flag in fsck? If something goes wrong with simple fsck, I always simply do a "fsck -y". There is no other option for me. So, it would be VERY useful if this could be done automatically instead of interrupting the router startup.

Thanks.

On 6/5/19 1:30 AM, Nick Holland wrote:
> On 6/4/19 1:29 PM, Mogens Jensen wrote:
> > I'm going to build a router for use in a remote location, and I have chosen OpenBSD 6.5 for the task. Unfortunately, it's not possible to protect the router with an UPS, so it will have to be resilient enough to survive sudden power outages and still boot without manual intervention.
> >
> > In the past I have built a few Linux based routers and they were configured to run from RAM. I have made some research to see if this is also possible on OpenBSD and found that, while there are solutions to have / read-only, none of this is officially supported.
> >
> > Can anyone with experience running OpenBSD routers without UPS, tell if filesystem corruption is going to be a problem after power outages, or if there are any officially supported ways to make the system resilient enough to not break after a power outage?
> >
> > I'm using an mSATA disk with MLC flash in the router.
>
> I realized a few decades ago that consumer UPSs are a bad investment. Industrial UPSs are a dubious idea in business unless you have a dual-power supply machine and can hook each PS to a DIFFERENT UPS -- in my area, grid power is more reliable than cheap UPSes (your mileage may vary). And you have to MAINTAIN your UPSs, otherwise after a few years, UPSs turn minor glitches into power outages (thank you very much).
>
> I'm also fond of proving my own claims, so I very often just yank the cord on my systems rather than doing orderly shutdowns.
>
> Yes, if you drop power on an OpenBSD system, you will get an fsck on reboot. Solution: Make your partitions as small as reasonable. Just because you got a 500G disk for cheap, no reason to allocate all 500G. For a router, 10G is PLENTY, and will fsck quickly. If you have slow media (i.e., flash drives), you might want to aim for 1G.
>
> Every once in a long while, you might catch a really bad time for the power to go out, and have to manually say "Fix it!" to fsck, but for the most part, the system will just come back up after the power comes back on.
>
> The less you write to disk, the less risk you have of having to manually intervene in your system's reboot. IF you want to do some fancy logging, keep the logging partition out of the fstab file, and have a script that brings it up with a "fsck -y" AFTER the system comes up, and start the fancy logging AFTER the big logging partition successfully mounts.
>
> But don't do stupid games to try to improve your chances, just make sure there's a monitor and keyboard available to fix any problems that might happen. Simple systems have simple problems. Complex systems break in complex ways.
>
> You want me to swear you'll never have to manually intervene in boot after an "event"? Nope. But I've walked non-technical people through single-user fsck's over the phone; when your bastardized system breaks, you will be down for a lot longer and you will be going on-site to fix.
>
> Nick.
Re: Filesystem corruption on OpenBSD routers after power outage?
On Wed, Jun 05, 2019 at 05:12:20AM +, Roderick wrote:
>
> "-o union" was last in 3.7, disappeared in 3.8. Was there a reason?
>
> https://man.openbsd.org/OpenBSD-3.7/mount

Yes, the developers felt we couldn't make it work without bugs in a sane way. Locks over locks is insanely hard to get right.
Re: Filesystem corruption on OpenBSD routers after power outage?
On Wed, Jun 05, 2019 at 05:12:20AM +, Roderick wrote:
>
> "-o union" was last in 3.7, disappeared in 3.8. Was there a reason?

It was broken and complicated the filesystem code beyond measure.

-Otto

> https://man.openbsd.org/OpenBSD-3.7/mount
>
> Rodrigo
>
> > What also would be practical is a "mount -o union" like in FreeBSD,
> > but unfortunately I do not see it in OpenBSD.
> >
> > Then one could mount a mfs system over a normal one, only to be read.
> >
> > Rodrigo
Re: Filesystem corruption on OpenBSD routers after power outage?
"-o union" was last in 3.7, disappeared in 3.8. Was there a reason? https://man.openbsd.org/OpenBSD-3.7/mount Rodrigo What also would be practical is a "mount -o union" like in FreeBSD, but unfortunately I do not see it in OpenBSD. Then one could mount a mfs system over a normal one, only to be read. Rodrigo
Re: Filesystem corruption on OpenBSD routers after power outage?
What also would be practical is a "mount -o union" like in FreeBSD, but unfortunately I do not see it in OpenBSD. Then one could mount a mfs system over a normal one, only to be read. Rodrigo
Re: Filesystem corruption on OpenBSD routers after power outage?
Look at -P option in mount_mfs. Rodrigo
Re: Filesystem corruption on OpenBSD routers after power outage?
On 6/4/19 3:30 PM, Mogens Jensen wrote:
> I'm going to build a router for use in a remote location, and I have chosen OpenBSD 6.5 for the task. Unfortunately, it's not possible to protect the router with an UPS, so it will have to be resilient enough to survive sudden power outages and still boot without manual intervention.
>
> In the past I have built a few Linux based routers and they were configured to run from RAM. I have made some research to see if this is also possible on OpenBSD and found that, while there are solutions to have / read-only, none of this is officially supported.
>
> Can anyone with experience running OpenBSD routers without UPS, tell if filesystem corruption is going to be a problem after power outages, or if there are any officially supported ways to make the system resilient enough to not break after a power outage?
>
> I'm using an mSATA disk with MLC flash in the router.
>
> Thanks in advance.
>
> Mogens Jensen

As Mr. Holland points out, a UPS doesn't really help overall reliability.

In practice, /, /bin, and /usr are effectively read-only except for kernel and shared library randomization at boot time. /var gets written infrequently for logs, etc. /tmp, of course, is frequently written but its contents are irrelevant after a reboot.

An important way to reduce disk activity is to mount all filesystems "noatime". This suppresses effectively all writes to /, /bin, and /usr after boot. Changes to /var get pushed to disk fairly quickly. The likelihood of significant corruption is very small.

In practice, I knock my router off-line once or twice a month by messing with power cables nearby. The only way I find out is by looking at the logs. I've never had to manually fsck any of my routers except after electrical storms - and only then after moving the disk to a non-smoking chassis.

Physical access to a console by a trusted person or remote console access is required. Not for any failings of OpenBSD in particular, but for the guaranteed perversity of electronic devices and unforeseeable acts of nature and man messing up the local environment.

You will [should] access the system twice a year to install the latest release.

[ insert standard disclaimers here ]

Geoff Steckel
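One way to check that the "noatime" advice has actually been applied everywhere is a small audit of the fstab. The sketch below is only an illustration: missing_noatime is a hypothetical helper name, and it assumes the usual fstab field order (device, mount point, type, options).

```sh
#!/bin/sh
# Sketch only: missing_noatime is a hypothetical helper.
# Prints the mount point of every ffs entry in an fstab-format file
# whose option field does not contain "noatime".
missing_noatime() {
    awk '$1 !~ /^#/ && $3 == "ffs" && $4 !~ /(^|,)noatime(,|$)/ { print $2 }' "$1"
}
```

Run as missing_noatime /etc/fstab; no output means every ffs filesystem listed there already has the option. MFS and swap entries are ignored on purpose.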
Re: Filesystem corruption on OpenBSD routers after power outage?
On 6/4/19 1:29 PM, Mogens Jensen wrote:
> I'm going to build a router for use in a remote location, and I have
> chosen OpenBSD 6.5 for the task. Unfortunately, it's not possible to
> protect the router with an UPS, so it will have to be resilient enough
> to survive sudden power outages and still boot without manual
> intervention.
>
> In the past I have built a few Linux based routers and they were
> configured to run from RAM. I have made some research to see if this is
> also possible on OpenBSD and found that, while there are solutions to
> have / read-only, none of this is officially supported.
>
> Can anyone with experience running OpenBSD routers without UPS, tell if
> filesystem corruption is going to be a problem after power outages, or
> if there are any officially supported ways to make the system resilient
> enough to not break after a power outage?
>
> I'm using an mSATA disk with MLC flash in the router.

I realized a few decades ago that consumer UPSs are a bad investment. Industrial UPSs are a dubious idea in business unless you have a dual-power supply machine and can hook each PS to a DIFFERENT UPS -- in my area, grid power is more reliable than cheap UPSes (your mileage may vary). And you have to MAINTAIN your UPSs, otherwise after a few years, UPSs turn minor glitches into power outages (thank you very much).

I'm also fond of proving my own claims, so I very often just yank the cord on my systems rather than doing orderly shutdowns.

Yes, if you drop power on an OpenBSD system, you will get an fsck on reboot. Solution: Make your partitions as small as reasonable. Just because you got a 500G disk for cheap, no reason to allocate all 500G. For a router, 10G is PLENTY, and will fsck quickly. If you have slow media (i.e., flash drives), you might want to aim for 1G.

Every once in a long while, you might catch a really bad time for the power to go out, and have to manually say "Fix it!" to fsck, but for the most part, the system will just come back up after the power comes back on.

The less you write to disk, the less risk you have of having to manually intervene in your system's reboot. IF you want to do some fancy logging, keep the logging partition out of the fstab file, and have a script that brings it up with a "fsck -y" AFTER the system comes up, and start the fancy logging AFTER the big logging partition successfully mounts.

But don't do stupid games to try to improve your chances, just make sure there's a monitor and keyboard available to fix any problems that might happen. Simple systems have simple problems. Complex systems break in complex ways.

You want me to swear you'll never have to manually intervene in boot after an "event"? Nope. But I've walked non-technical people through single-user fsck's over the phone; when your bastardized system breaks, you will be down for a lot longer and you will be going on-site to fix.

Nick.
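The "keep the logging partition out of fstab and bring it up afterwards" idea could be scripted along these lines. This is a sketch under assumptions: bring_up_logs, FSCK and MOUNT are hypothetical names (the two variables exist only so the commands can be swapped out for a dry run), and the device and mount point are whatever the local setup uses.

```sh
#!/bin/sh
# Sketch only: bring_up_logs, FSCK and MOUNT are hypothetical names.
FSCK=${FSCK:-fsck}
MOUNT=${MOUNT:-mount}

bring_up_logs() {
    dev=$1   # logging partition, deliberately not listed in /etc/fstab
    mnt=$2   # where the logging daemon expects to write
    # Repair without prompting; give up if fsck itself reports failure.
    "$FSCK" -y "$dev" || return 1
    # Mount only after a successful check.
    "$MOUNT" "$dev" "$mnt" || return 1
    # The caller starts the fancy logging only when this returns 0.
    return 0
}
```

Because the partition is not in fstab, a damaged log filesystem cannot stop the base system from booting; at worst the router comes up without its extra logging.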
Re: Filesystem corruption on OpenBSD routers after power outage?
There is also an option for setting fsck to approve fixes without a prompt ... but I can't think of it off the top of my head ... and this would be useful to set on your routers also.

On Tue, 4 Jun 2019 at 21:05, Tom Smyth wrote:
> Hi Mogens,
>
> there are a number of threads on this if you search the misc archives on
> marc.info,
>
> but setting softdep,noatime mount options on /etc/fstab is advisable
>
> for routers I tend to use mfs for partitions that tend to get written to
> a lot
>
> the following entries (/etc/fstab) show How I use mfs on my routers...
> swap /tmp mfs rw,nosuid,noexec,nodev,-s=512000,-P=/directorythatcontainsfilesthatwillbecopiedtomemoryatbootup/tmp 0 0
> swap /var mfs rw,nosuid,noexec,nodev,-s=1024000,-P=/directorythatcontainsfilesthatwillbecopiedtomemoryatbootup/var 0 0
> swap /dev mfs rw,nosuid,noexec,-P=/directorythatcontainsfilesthatwillbecopiedtomemoryatbootup/dev,-i=2048,-s=102400 0 0
>
> but bear in mind that that uses up to 1.6GB of ram ... so you might
> want to tweak. to what suits your needs...
>
> check out conway's resflash and cappucios flashrd also
>
> https://www.packetmischief.ca/openbsd-compact-flash-firewall/
>
> I hope this helps
> Tom Smyth
>
> On Tue, 4 Jun 2019 at 20:31, Mogens Jensen wrote:
>
>> I'm going to build a router for use in a remote location, and I have
>> chosen OpenBSD 6.5 for the task. Unfortunately, it's not possible to
>> protect the router with an UPS, so it will have to be resilient enough
>> to survive sudden power outages and still boot without manual
>> intervention.
>>
>> In the past I have built a few Linux based routers and they were
>> configured to run from RAM. I have made some research to see if this is
>> also possible on OpenBSD and found that, while there are solutions to
>> have / read-only, none of this is officially supported.
>>
>> Can anyone with experience running OpenBSD routers without UPS, tell if
>> filesystem corruption is going to be a problem after power outages, or
>> if there are any officially supported ways to make the system resilient
>> enough to not break after a power outage?
>>
>> I'm using an mSATA disk with MLC flash in the router.
>>
>> Thanks in advance.
>>
>> Mogens Jensen
>
> --
> Kindest regards,
> Tom Smyth.

--
Kindest regards,
Tom Smyth.
Re: Filesystem corruption on OpenBSD routers after power outage?
Hi Mogens,

there are a number of threads on this if you search the misc archives on marc.info,

but setting softdep,noatime mount options on /etc/fstab is advisable

for routers I tend to use mfs for partitions that tend to get written to a lot

the following entries (/etc/fstab) show How I use mfs on my routers...

swap /tmp mfs rw,nosuid,noexec,nodev,-s=512000,-P=/directorythatcontainsfilesthatwillbecopiedtomemoryatbootup/tmp 0 0
swap /var mfs rw,nosuid,noexec,nodev,-s=1024000,-P=/directorythatcontainsfilesthatwillbecopiedtomemoryatbootup/var 0 0
swap /dev mfs rw,nosuid,noexec,-P=/directorythatcontainsfilesthatwillbecopiedtomemoryatbootup/dev,-i=2048,-s=102400 0 0

but bear in mind that that uses up to 1.6GB of ram ... so you might want to tweak. to what suits your needs...

check out conway's resflash and cappucios flashrd also

https://www.packetmischief.ca/openbsd-compact-flash-firewall/

I hope this helps
Tom Smyth

On Tue, 4 Jun 2019 at 20:31, Mogens Jensen wrote:
> I'm going to build a router for use in a remote location, and I have
> chosen OpenBSD 6.5 for the task. Unfortunately, it's not possible to
> protect the router with an UPS, so it will have to be resilient enough
> to survive sudden power outages and still boot without manual
> intervention.
>
> In the past I have built a few Linux based routers and they were
> configured to run from RAM. I have made some research to see if this is
> also possible on OpenBSD and found that, while there are solutions to
> have / read-only, none of this is officially supported.
>
> Can anyone with experience running OpenBSD routers without UPS, tell if
> filesystem corruption is going to be a problem after power outages, or
> if there are any officially supported ways to make the system resilient
> enough to not break after a power outage?
>
> I'm using an mSATA disk with MLC flash in the router.
>
> Thanks in advance.
>
> Mogens Jensen

--
Kindest regards,
Tom Smyth.
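For sizing, the three -s values in the entries above can simply be added up. How much RAM that is depends on the unit of -s; as far as I recall, newfs(8) and mount_mfs(8) count 512-byte sectors when no suffix is given, but treat that as an assumption and check the man pages on your release. The sketch below (mfs_total_mib is a hypothetical helper) prints the total under both readings.

```sh
#!/bin/sh
# Sketch only: mfs_total_mib is a hypothetical helper. It sums the three
# -s values from the fstab entries above and converts the total to MiB
# for a given unit size in bytes.
mfs_total_mib() {
    unit_bytes=$1
    echo $(( (512000 + 1024000 + 102400) * unit_bytes / 1048576 ))
}

mfs_total_mib 512    # reading -s as 512-byte sectors: prints 800
mfs_total_mib 1024   # reading -s as kilobytes: prints 1600
```

The 1.6GB figure in the message matches the kilobyte reading; under the 512-byte-sector reading the same entries would cap out at roughly half that.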
Filesystem corruption on OpenBSD routers after power outage?
I'm going to build a router for use in a remote location, and I have chosen OpenBSD 6.5 for the task. Unfortunately, it's not possible to protect the router with an UPS, so it will have to be resilient enough to survive sudden power outages and still boot without manual intervention. In the past I have built a few Linux based routers and they were configured to run from RAM. I have made some research to see if this is also possible on OpenBSD and found that, while there are solutions to have / read-only, none of this is officially supported. Can anyone with experience running OpenBSD routers without UPS, tell if filesystem corruption is going to be a problem after power outages, or if there are any officially supported ways to make the system resilient enough to not break after a power outage? I'm using an mSATA disk with MLC flash in the router. Thanks in advance. Mogens Jensen