Re: [zfs-discuss] Re: Transactional RAID-Z?

David Dyer-Bennet Wed, 12 Jul 2006 08:12:53 -0700

Jeff Bonwick <[EMAIL PROTECTED]> writes:

> > Since transactions in ZFS are not committed until the ueberblock is written,
> > this boils down to:
> > 
> >     "How is the ueberblock committed atomically in a RAID-Z configuration?"
> 
> RAID-Z isn't even necessary to have this issue; all you need is a disk
> that doesn't actually guarantee atomicity of single-sector writes.
> Which, of course, we have to cope with.
> 
> The key is that there's actually a ring of 128 uberblocks, indexed
> by transaction group number (mod 128).  When we open a storage pool,
> we read every uberblock; among those that have a valid SHA-256
> checksum, we take the one with the highest transaction group (txg).
> That will be, by definition, the uberblock for the last txg that
> successfully committed to disk.
> 
> If we lose power in the middle of writing an uberblock, then that
> uberblock won't checksum, so we'll use the one from the previous txg,
> i.e. the last one that synced completely.


I *love* information!  

I'm not a DB or filesystem engineer, so much of this is not stuff I
work with every day; and I've always wondered how people got around
issues like power failures and other uncontrollable shutdowns really
reliably and cleanly.  I think this is an approach I haven't read about
before, and it makes perfect sense and seems fairly cheap (you only
have to read all 128 uberblocks when the pool is opened).
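
To check my own understanding, here is a rough Python sketch of the
selection rule Jeff describes: a ring of 128 slots indexed by txg mod
128, where opening the pool means taking the highest txg among the
slots whose SHA-256 checksum verifies.  The record layout (8-byte txg,
payload, trailing digest) and all the function names are my own
assumptions for illustration, not the real ZFS on-disk format or API.

```python
import hashlib
import struct

RING_SIZE = 128   # number of uberblock slots, per the description above
DIGEST_LEN = 32   # length of a SHA-256 digest in bytes


def encode_uberblock(txg: int, payload: bytes) -> bytes:
    """Hypothetical layout: 8-byte txg, payload, then SHA-256 of both."""
    body = struct.pack(">Q", txg) + payload
    return body + hashlib.sha256(body).digest()


def decode_uberblock(raw):
    """Return (txg, payload) if the checksum verifies, else None."""
    if raw is None or len(raw) < 8 + DIGEST_LEN:
        return None
    body, digest = raw[:-DIGEST_LEN], raw[-DIGEST_LEN:]
    if hashlib.sha256(body).digest() != digest:
        return None  # torn or corrupted write: ignore this slot
    (txg,) = struct.unpack(">Q", body[:8])
    return txg, body[8:]


def write_uberblock(ring: list, txg: int, payload: bytes) -> None:
    """Store the uberblock in the slot chosen by txg mod RING_SIZE."""
    ring[txg % RING_SIZE] = encode_uberblock(txg, payload)


def open_pool(ring: list):
    """Scan every slot; return the valid uberblock with the highest txg."""
    valid = [u for u in map(decode_uberblock, ring) if u is not None]
    return max(valid, key=lambda u: u[0]) if valid else None
```

For example, after writing txgs 1 through 200 and then zeroing out the
tail of the slot for txg 201 to mimic a power loss in mid-write,
open_pool() skips the damaged slot and falls back to txg 200.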
-- 
David Dyer-Bennet, <mailto:[EMAIL PROTECTED]>, <http://www.dd-b.net/dd-b/>
RKBA: <http://www.dd-b.net/carry/>
Pics: <http://dd-b.lighthunters.net/> <http://www.dd-b.net/dd-b/SnapshotAlbum/>
Dragaera/Steven Brust: <http://dragaera.info/>
_______________________________________________
zfs-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
