The basic idea of booting from an archive solves more problems than
it introduces. It simplifies Xen, GRID deployment, HoneyComb, etc.
As for resyncing the archive on shutdown, it's the minimal
approach. We can always add more syncing points where it makes
sense.

I do think the boot-archive is overly aggressive in reporting a fatal
failure if *any* file is out of date. The original intention was
to prevent potential data corruption related to mismatched kernel
modules. We should relax the failure mode such that fatal failure
is only reported if a selected subset of files is out of date.
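
To make the relaxed failure mode concrete, here is a rough sketch in
Python. It is not the actual boot-archive check; the pattern list and
the function names (CRITICAL_PATTERNS, classify_stale, archive_check)
are made up for illustration, and the real set of "critical" files
would still have to be decided.

```python
from fnmatch import fnmatch

# Hypothetical patterns for files where a mismatch could be dangerous
# (e.g. kernel modules). The real list is not specified in this thread.
CRITICAL_PATTERNS = ["kernel/*", "platform/*/kernel/*"]


def classify_stale(stale_files, critical_patterns=CRITICAL_PATTERNS):
    """Split out-of-date archive files into (fatal, warn_only) lists."""
    fatal, warn_only = [], []
    for path in stale_files:
        if any(fnmatch(path, pat) for pat in critical_patterns):
            fatal.append(path)
        else:
            warn_only.append(path)
    return fatal, warn_only


def archive_check(stale_files):
    """Return 'fatal' only when a critical file is stale."""
    fatal, warn_only = classify_stale(stale_files)
    if fatal:
        return "fatal"
    return "warn" if warn_only else "ok"
```

With this split, a stale path_to_inst alone would only produce a
warning (and a resync after boot), while a stale kernel module would
still stop the boot.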

Shudong

> <1139968260.3664.67.camel at thunk> Bill Sommerfeld writes:
> > On Tue, 2006-02-14 at 20:34, Jan Setje-Eilers wrote:
> > >  The basic issue seems to be that the archive wasn't synced on that
> > > node. This sync normally happens automatically during any sort of
> > > orderly shutdown or reboot. 
> > 
> > so, this aspect of the newboot design has always troubled me.  
> 
>  It's not a design issue. There are some remaining gaps in the
> implementation which are being dealt with relative to their impact.
> Fernando's case may present new data here; if it does, we'll react to
> it.
> 
>  FWIW, it took me a very long time to finally understand and accept
> the design as sound. The fundamental bit is that these aren't files
> that are updated randomly or even regularly. The ones that are, config
> files like path_to_inst, actually don't need to be in sync, they just
> need to be re-synced after boot.
> 
> > seems that, in production, many systems will be installed and configured
> > and will then stay up without reboot until something goes bump, either
> > with power or hardware; on reboot they'll fail to come back
> > automatically; as nothing except a clean shutdown updates the boot
> > archive,
> 
>  What you're assuming here is that the archive was out of date to
> begin with. It shouldn't have been unless you bounced the machine half
> way through updating files that live in the archive. In that case
> what's in the archive is likely a more consistent bet than what might
> be in the filesystem.
> 
>  Things like the devid cache causing the check to fail for no good
> reason should be fixed. Most of the config files that grow (like
> path_to_inst) can simply be merged and don't need to cause the check
> to fail unless they actually conflict.
> 
> > in an intentional-reboots-are-rare environment (which we otherwise
> > engineer for!) chances are that the on-disk boot archive will be
> > wrong and approximately every failure-triggered reboot will notice a
> > boot archive mismatch.
> 
>  Only if it's not up to date to begin with, which is a solvable
> problem. To be fair, getting this really right has been a lot of work
> so far and there are still a small number of (resolvable) open issues.
> 
> -jan

