First of all, thanks to all of you for contributing to this discussion. From what I hear, it seems to me that the fatal state can be relaxed in this case and that we can resync the files after boot. Sun will not be able to certify some Oracle products until we pass this test, so this is *very* urgent. To focus the discussion a bit more, what I would like to ask the alias is:
1) What is the easiest way to resync the files in single-user mode so that the systems can boot again? (I'd hate to reinstall the OS.)
2) If I move back to S10GA, would I still see this problem? In other words, was this introduced in Update 1?
3) What is the fastest/easiest way to relax this fatal state in a supported way? A patch? Changes to SMF configuration parameters in S10U1?

Again, thanks to all of you for helping,
Fernando

Shudong Zhou wrote on 02/15/06 09:16:
>The basic idea of booting from an archive solves more problems than
>it introduces. It simplifies Xen, GRID deployment, HoneyComb, etc.
>As far as resyncing the archive on shutdown, it's the minimal
>approach. We can always add more syncing points where it makes
>sense.
>
>I do think the boot-archive is overly aggressive in reporting a fatal
>failure if *any* file is out of date. The original intention was
>to prevent potential data corruption related to mismatched kernel
>modules. We should relax the failure mode such that fatal failure
>is only reported if a selected subset of files are out of date.
>
>Shudong
>
>> <1139968260.3664.67.camel at thunk> Bill Sommerfeld writes:
>>
>>>On Tue, 2006-02-14 at 20:34, Jan Setje-Eilers wrote:
>>>
>>>> The basic issue seems to be that the archive wasn't synced on that
>>>>node. This sync normally happens automatically during any sort of
>>>>orderly shutdown or reboot.
>>>>
>>>so, this aspect of the newboot design has always troubled me.
>>>
>> It's not a design issue. There are some remaining gaps in the
>>implementation which are being dealt with relative to their impact.
>>Fernando's case may present new data here; if it does, we'll react to
>>it.
>>
>> FWIW, it took me a very long time to finally understand and accept
>>the design as sound. The fundamental bit is that these aren't files
>>that are updated randomly or even regularly.
>>The ones that are, config files like path_to_inst, actually don't need
>>to be in sync; they just need to be re-synced after boot.
>>
>>>seems that, in production, many systems will be installed and configured
>>>and will then stay up without reboot until something goes bump, either
>>>with power or hardware; on reboot they'll fail to come back
>>>automatically; as nothing except a clean shutdown updates the boot
>>>archive,
>>>
>> What you're assuming here is that the archive was out of date to
>>begin with. It shouldn't have been unless you bounced the machine halfway
>>through updating files that live in the archive. In that case
>>what's in the archive is likely a more consistent bet than what might
>>be in the filesystem.
>>
>> Things like the devid cache causing the check to fail for no good
>>reason should be fixed. Most of the config files that grow (like
>>path_to_inst) can simply be merged and don't need to cause the check
>>to fail unless they actually conflict.
>>
>>>in an intentional-reboots-are-rare environment (which we otherwise
>>>engineer for!) chances are that the on-disk boot archive will be
>>>wrong and approximately every failure-triggered reboot will notice a
>>>boot archive mismatch.
>>>
>> Only if it's not up to date to begin with, which is a solvable
>>problem. To be fair, getting this really right has been a lot of work
>>so far and there are still a small number of (resolvable) open issues.
>>
>>-jan

--
Fernando Castano
Staff engineer, MDE
Sun Microsystems, Inc.
260 Constitution Drive
Menlo Park, CA 94025 US
Phone x88904/+1 650 786 8904
Email Fernando.Castano at Sun.COM
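Shudong's proposal (fatal failure only for a selected subset of files) and Jan's point (growing config files such as path_to_inst can be merged unless they conflict) combine into a three-way policy: fatal, merge, or boot now and resync later. The Python sketch below is hypothetical and is not how the boot-archive service is implemented. The critical path prefixes, the function names, the relative file names, and the merge rules are all assumptions for illustration. It also assumes path_to_inst lines of the form "<physical path>" <instance> "<driver>".

```python
import shlex

# Hypothetical "selected subset": a stale file under these prefixes is fatal.
# These prefixes are illustrative assumptions, not the real archive file list.
CRITICAL_PREFIXES = ("kernel/", "platform/", "usr/kernel/")


def parse_path_to_inst(text):
    """Parse lines of the form: "<physical path>" <instance> "<driver>"."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        path, instance, driver = shlex.split(line)
        entries.append((path, int(instance), driver))
    return entries


def merge_path_to_inst(archived, on_disk):
    """Union both copies; return (merged_lines, conflicting_paths).

    A conflict is the same physical path bound to a different
    (instance, driver), or the same (driver, instance) bound to a
    different physical path.
    """
    by_path = {}
    by_inst = {}
    conflicts = []
    for path, inst, drv in parse_path_to_inst(archived) + parse_path_to_inst(on_disk):
        if path in by_path and by_path[path] != (inst, drv):
            conflicts.append(path)
            continue
        if (drv, inst) in by_inst and by_inst[(drv, inst)] != path:
            conflicts.append(path)
            continue
        by_path[path] = (inst, drv)
        by_inst[(drv, inst)] = path
    merged = ['"%s" %d "%s"' % (p, i, d) for p, (i, d) in sorted(by_path.items())]
    return merged, conflicts


# Hypothetical table of files that only grow and have a merge function.
MERGERS = {"etc/path_to_inst": merge_path_to_inst}


def classify(mismatched, contents):
    """Decide the outcome for a list of out-of-date archive files.

    `contents` maps each mergeable file name to its (archived, on_disk) text.
    Returns "fatal" to stop the boot, or "resync" to boot and resync later.
    """
    for name in mismatched:
        if name.startswith(CRITICAL_PREFIXES):
            return "fatal"
        if name in MERGERS:
            _, conflicts = MERGERS[name](*contents[name])
            if conflicts:
                return "fatal"
    return "resync"
```

Under this policy a path_to_inst that merely gained new entries, or a stale non-critical file such as the devid cache, would no longer stop the boot, while a stale kernel module still would.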