<1139968260.3664.67.camel at thunk> Bill Sommerfeld writes:
> On Tue, 2006-02-14 at 20:34, Jan Setje-Eilers wrote:
> >  The basic issue seems to be that the archive wasn't synced on that
> > node. This sync normally happens automatically during any sort of
> > orderly shutdown or reboot. 
> 
> so, this aspect of the newboot design has always troubled me.  

 It's not a design issue. There are some remaining gaps in the
implementation, which are being dealt with in order of their impact.
Fernando's case may present new data here; if it does, we'll react to
it.

 FWIW, it took me a very long time to finally understand and accept
the design as sound. The fundamental bit is that these aren't files
that are updated randomly or even regularly. The ones that are, config
files like path_to_inst, actually don't need to be in sync; they just
need to be re-synced after boot.

> seems that, in production, many systems will be installed and configured
> and will then stay up without reboot until something goes bump, either
> with power or hardware; on reboot they'll fail to come back
> automatically; as nothing except a clean shutdown updates the boot
> archive,

 What you're assuming here is that the archive was out of date to
begin with. It shouldn't have been unless you bounced the machine
halfway through updating files that live in the archive. In that case
what's in the archive is likely a more consistent bet than what might
be in the filesystem.

 Things like the devid cache causing the check to fail for no good
reason should be fixed. Most of the config files that grow (like
path_to_inst) can simply be merged and don't need to cause the check
to fail unless they actually conflict.
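
 To make the merge idea concrete, here is a minimal sketch in Python.
It assumes a much simplified model in which each path_to_inst-style
entry maps a device path to an (instance, driver) pair. The function
name and the data layout are hypothetical and only for illustration;
this is not the actual boot archive check.

```python
def merge_entries(archived, on_disk):
    """Merge two {device_path: (instance, driver)} mappings.

    Hypothetical, simplified model of a config file that only grows.
    Returns (merged, conflicts): entries present in just one copy are
    carried over, and a path bound differently in the two copies is
    reported as a conflict.
    """
    merged = dict(archived)
    conflicts = []
    for path, binding in on_disk.items():
        if path not in merged:
            # The file merely grew: safe to merge.
            merged[path] = binding
        elif merged[path] != binding:
            # The two copies disagree: only this should fail the check.
            conflicts.append(path)
    return merged, conflicts
```

 Under this model, a copy that has only gained entries merges cleanly,
and the check would need to fail only when the conflict list is
non-empty.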

> in a intentional-reboots-are-rare environment (which we otherwise
> engineer for!) chances are that the on-disk boot archive will be
> wrong and approximately every failure-triggered reboot will notice a
> boot archive mismatch..

 Only if it's not up to date to begin with, which is a solvable
problem. To be fair, getting this really right has been a lot of work
so far and there are still a small number of (resolvable) open issues.

-jan
