>I'm real curious about this.  For example, would upgrading to SQLite3
>help?  Or is there a fundamental problem with SQLite2 that is not
>changed in 3?  Or can SMF recover more intelligently?  Or is this more
>of a UFS reliability issue that ZFS boot will help with?

I have no idea what the underlying issues are. It is a rare enough
occurrence, and it always tends to happen at the moment I really need
the machine, so I have not done my bit to improve this bit of quality
in Solaris, which I admit I should have.

Note that this only happens if the repository actually needs to be
changed during boot.  (Hm, now that I think about it, I think it was
triggered by a hang which just happened to coincide with manifest
import.)

And I may not have power cycled but rather triggered a panic (I'm sure
I did, now that I think about it).  So it's not a power cycle, but a
"panic & sync disk".  That should give some form of on-disk consistency.

>I recently accidentally pulled the power cord on my laptop while
>shutting down.  The battery was out (I leave it out when at home to
>extend battery life).  When the system came up a number of files were
>corrupted, like /etc/inet/hosts, and fsck didn't catch that -- the
>system just reached multi-user as if nothing had happened.  I had to
>manually fsck -y the filesystems, and that found and fixed lots of
>problems, but not the corruption of /etc/inet/hosts.

Yep, unfortunately the way UFS works is that it only cares about
metadata, and that's as far as consistency checking goes.  Writing a
file will typically go as follows:

	- allocate new blocks
	- update inode to point to new blocks
	- write new data to disk.

That's why it's so easy to get files with bogus content with UFS.
(The second step is synchronous; the third depends on the willingness
of the I/O scheduler and on whether the disk actually completes all
scheduled writes before the system resets.)

>How can SMF protect itself against UFS damage?

I'd assume that's a task of the database engine, but there is more to
it than that because SMF transactions also need to be grouped.  (E.g.,
a manifest import is likely to consist of many database updates but,
taken as a whole, is only a single transaction.)
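
To make the grouping point concrete, here is a minimal sketch using
Python's bundled sqlite3 module (which wraps SQLite 3, not the SQLite 2
that the repository uses).  The table "props", the property names and
the function import_manifest() are all made up for illustration; this
is not how svc.configd actually does it.  It only shows that many
updates wrapped in one transaction either all land or all roll back.

```python
import sqlite3


def import_manifest(conn, properties):
    """Apply all property updates of one (hypothetical) manifest atomically."""
    # "with conn" commits if the block finishes and rolls back if an
    # exception escapes, so a failure halfway leaves no partial import.
    with conn:
        for name, value in properties:
            if value is None:
                # Stand-in for any failure in the middle of an import.
                raise ValueError("bad property: " + name)
            conn.execute(
                "INSERT OR REPLACE INTO props (name, value) VALUES (?, ?)",
                (name, value),
            )


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE props (name TEXT PRIMARY KEY, value TEXT)")

    # First import succeeds: both updates are committed together.
    import_manifest(conn, [("ssh/port", "22"), ("ssh/enabled", "true")])

    # Second import fails on its second update; the first update
    # (ssh/port = 2222) must be rolled back along with it.
    try:
        import_manifest(conn, [("ssh/port", "2222"), ("ssh/banner", None)])
    except ValueError:
        pass

    return dict(conn.execute("SELECT name, value FROM props"))
```

After demo() runs, ssh/port should still hold the value from the first
import, because the failed second import was undone as a unit.  Whether
that guarantee survives a crash still depends on the journal actually
reaching the disk, which is exactly where the filesystem comes in.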

Coincidentally, the recent putback which makes all manifest imports
happen in memory may make this problem largely go away.

>The way SQLite works it keeps a journal during transactions.  The
>journal provides both locking and a way to roll back transactions that
>are in progress when the writer dies (e.g., reboot).  Perhaps this
>approach + UFS unreliability just don't mix.  But even if SQLite had a
>ZFS-like COW approach to transactions internally it might still not mix
>with UFS unreliability.
>
>> this is quite an important point given the fact that sshd is the
>> only remote (login) service started by default.
>
>Putting sshd config into SMF won't make that worse.  If the SMF
>configuration repository is corrupted then sshd won't start, and that's
>_today_.

But putting it in there will make certain things worse, such as
familiarity for admins coming from other OSes.  The changes we have
made to OpenSSH are by and large to make it fit well with Solaris
crypto, PAM and auditing.  Those features are largely transparent to
admins.

Removing /etc/ssh/sshd_config seems counter-productive, especially
considering that we do not offer a way to make SMF changes during
install.  (And I would suggest that installing a custom sshd_config
file is something extremely likely to occur in custom jumpstarts.)

Casper