On Tue, Jul 11, 2006 at 11:03:17PM -0400, David Abrahams wrote:
>      How can RAID-Z preserve transactional semantics when a single
>      FS block write requires writing to multiple physical devices?

ZFS uses a technique that's been used in databases for years: phase
trees.  First you write all subtrees that you're updating to disk (to
currently free space - this is the COW part), wait for them to sync,
then update the tree's root (the uberblock) in a 2-phase commit.

It doesn't matter if you're doing it to multiple independent disks,
or to multiple disks in a RAID-Z stripe.  The individual writes don't
need to be atomic; only the update to the root of the tree does.
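
To make the ordering concrete, here's a toy sketch in Python.  It is
not ZFS code: the Store class, update_leaf() and read_leaf() are names
I made up for illustration, a dict stands in for the disks, and a
single attribute stands in for the uberblock.  New blocks only ever go
to unused addresses, and the one in-place change is the root pointer.

```python
class Store:
    """Toy block store (hypothetical): blocks are never overwritten."""

    def __init__(self):
        self.blocks = {}   # address -> contents
        self.root = None   # stands in for the uberblock

    def write_new(self, contents):
        # Always allocate a fresh address; this is the COW part.
        addr = len(self.blocks)
        self.blocks[addr] = contents
        return addr

    def commit(self, new_root):
        # The only update that has to be atomic.
        self.root = new_root


def update_leaf(store, addr, path, data):
    """Write new copies of the leaf at `path` and of all its ancestors.

    Returns the address of the new subtree root; old blocks are untouched.
    """
    if not path:
        return store.write_new(("leaf", data))
    _, children = store.blocks[addr]
    children = list(children)
    children[path[0]] = update_leaf(store, children[path[0]], path[1:], data)
    return store.write_new(("node", children))


def read_leaf(store, path):
    """Follow child indexes from the committed root down to a leaf."""
    addr = store.root
    for i in path:
        addr = store.blocks[addr][1][i]
    return store.blocks[addr][1]


# Build a two-leaf tree and commit it.
store = Store()
a = store.write_new(("leaf", "A"))
b = store.write_new(("leaf", "B"))
store.commit(store.write_new(("node", [a, b])))

# Write the new subtree first; the committed tree is still the old one.
old_root = store.root
new_root = update_leaf(store, store.root, [1], "B2")
before = read_leaf(store, [1])

# Only now switch the root.
store.commit(new_root)
after = read_leaf(store, [1])
```

If the "machine" dies anywhere before the final commit(), the root
still points at the old tree, and every block the old tree references
is intact because nothing was overwritten.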

The other trick is that with RAID-Z, every logical filesystem block
(512B - 128KB) is its own stripe with its own parity.  So by writing a
new block, you're not messing up the parity of any old blocks.  See Jeff
Bonwick's blog post on RAID-Z to learn more:

    http://blogs.sun.com/roller/page/bonwick?entry=raid_z
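
A rough sketch of the per-block parity idea, again in Python and again
not the real thing: make_stripe() and reconstruct() are hypothetical
helpers, the parity is plain single-parity XOR, and the even chunking
with zero padding is my simplification rather than the actual RAID-Z
on-disk layout.

```python
from functools import reduce


def xor_bytes(a, b):
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_stripe(block, data_disks):
    """Split one logical block into per-disk chunks plus its own parity."""
    chunk = -(-len(block) // data_disks)  # ceiling division
    chunks = [
        block[i * chunk:(i + 1) * chunk].ljust(chunk, b"\0")
        for i in range(data_disks)
    ]
    parity = reduce(xor_bytes, chunks)
    return chunks, parity


def reconstruct(chunks, parity, lost):
    """Rebuild the chunk at index `lost` from the survivors and the parity."""
    survivors = [c for i, c in enumerate(chunks) if i != lost]
    return reduce(xor_bytes, survivors, parity)


# Each logical block gets a stripe of its own.
old_chunks, old_parity = make_stripe(b"old block data", 4)
new_chunks, new_parity = make_stripe(b"hello raid-z!", 4)

# Losing one disk's chunk of the new block is recoverable from the rest.
recovered = reconstruct(new_chunks, new_parity, 2)
```

Since the parity is computed only over the chunks of the block being
written, writing the new block never has to read or modify
old_chunks or old_parity.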


--Bill