Jan Setje-Eilers writes:
> You seem to be arguing that we can come up with old data without
> rolling/committing logs. If that's acceptable, then we can come up
> with the old archive as well and can just eliminate the check.

I don't think so. The effects of the boot archive are quite surprising
to administrators because they have a much longer delay time.

If I edit /etc/system (or, worse, perform some action that has the
side-effect of editing that file), then my expectation is that unless
the system goes down unexpectedly *right away*, those bits will be on
the disk "soon." I have an expectation that after a few seconds (and
perhaps a few mumbled superstitious 'sync' invocations), everything
that I've changed administratively is stable again. User data in
flight may be discarded if the system crashes while it's in flight, but
the OS itself is stable.

That's true on the current SPARC systems, but not true once we have a
boot archive containing volatile files. In that case, I've got an
extra step to perform: regenerating the boot archive after doing some
edits.

On x86, because I can't easily know when these sorts of changes happen,
I've taken to uttering "bootadm update-archive -v" every now and then,
just on spec. Every once in a while, it catches something surprising.
(Particularly so in the first reboot after an upgrade -- something
about the boot process almost always tweaks /etc/system or some famous
file, meaning that right after upgrade, my system is just _always_ in
an unstable state.)

I suspect that some customers are using cron jobs for similar effect.
It's like the bad old days with "sync ; sync ; sync" ... wait for it
... "reboot." It causes users to distrust the system.

> I get the impression that you're placing some value on how old the
> data is. However in the case of a non-interfaced binary kernel
> component that really doesn't matter. It only matters if it's
> compatible with the rest of the bits or not. So either way we'd need
> the check.

Yes, part of it is a concern over how old the data in the archive are.
The other part is the effect of failure: when this happens, the machine
is stuck in boot.
Unless you've got access to the console, and realize what's happened,
the machine is just a warm brick.

Moving the volatile files out of the archive limits the scope of the
problem. At that point, it's _only_ intentional packaging changes that
could affect the consistency of the boot archive, and we could devise
some simple way to make sure that those changes get committed to the
archive. With volatile files in the archive, it's much more wide-open,
and more exotic (and I think unlikely) schemes such as FEM would be
needed.

-- 
James Carlson, Solaris Networking          <james.d.carlson at sun.com>
Sun Microsystems / 1 Network Drive         71.232W   Vox +1 781 442 2084
MS UBUR02-212 / Burlington MA 01803-2757   42.496N   Fax +1 781 442 1677
