(earlier messages in this thread should be visible in a day or so at
http://www.opensolaris.org/os/community/arc/caselog/2006/525/mail;
sorry for the confusion).

On Wed, 2007-08-01 at 16:15 -0700, Jan Setje-Eilers wrote:
> So you prefer to load corrupt data in the case of a reboot soon 
> after an update to an archive check warning?

Not corrupt data, slightly old data.

I realize that there are lingering nightmares of boots interrupted by
corruption caused by a long-standing bug in logging UFS, but that bug --
4782952 -- was fixed late in s10.

During s10 development I had to manually recover a bunch of systems,
typically due to munched md.conf files.  Logging UFS before the fix to
4782952 permitted blocks belonging to a stably stored file to be freed
and then overwritten before the transaction which freed the blocks
committed.

(Now, as is typical for changes to Solaris UFS, that fix needed some
followup work, but as best I can tell, things have damped out.)

I saw boot failures on mirrored root sparc systems on a regular basis
before this bug was fixed and haven't seen them since.  I don't want to
start seeing them again if this project integrates.

> > In the case of ZFS root, my understanding is that the worst that can
> > happen if we don't commit the intent log before reading is that we will
> > read /etc/system contents which doesn't contain edits made during the
> > last few seconds before a crash.  
> 
>  I can't confirm or deny that,

You don't need to; Neil Perrin confirmed this recently; see the mail log
for 2007/171 (ZFS Separate Intent Log).  I asked:

> As I understand it, loss of the information in the intent log means that
> the last few seconds of changes to a pool have been lost, but the pool
> is otherwise intact.

His response was "True".

> but even if that's the case you then have no way to know that the old copy 
> was loaded.

I'm not sure that's a problem -- you're booting a point-in-time
consistent config, just not necessarily the up-to-the-millisecond config
at the time of the crash.

Now, that's not good enough if we crash in the middle of a pkgadd -- but
the boot archive doesn't help very much there, either (because once
we come up and discard the boot archive we may still load a mix of old
and new kernel modules).

For cases where there isn't a need for consistent updates to multiple
files we should be fine.

> > If we get /etc/system out of the boot archive, it may be months out of
> > date.
> 
>  In which case we catch this when the archive contents is verified.

And essentially crash/hang until an expert comes along to rescue the
system, which is IMHO unacceptable behavior.

> > >  It's also potentially very unsafe to do so due to the log issue.
> > 
> > Huh?  Not with zfs root -- the on-disk state will be self-consistent as
> > of the last time an uberblock update committed -- at most a few seconds
> > old.
> 
>  But it is with ufs. 

As best I can tell, it's not been unsafe for ufs since 4782952 was
fixed.  I saw lots of problems on pre-FCS s10, but I've never seen
problems of lufs corruption breaking boot on sparc systems running s10
FCS or nevada.

>  If they aren't in the archive, then their state would have to be
> managed to ensure that they aren't unrolled or uncommitted.

But that's not how (working) lufs and zfs actually work.  If the code
which updates these files does the usual copy-edit-fsync-rename dance,
there should never be a window where the on-disk structure even
*without* the log contains something other than either the old or the
new version of the file.
