I apologize if this is the wrong place to ask this. I looked at the archives for both zfs-code and zfs-discuss, and this seemed like the more appropriate list to post my query.
I recently read about ZFS and it seems to be a very cool thing. I've been reading various webpages and looking through the source code, and I think I have a pretty good handle on the basics -- the object directory, the whole vdev mirror/stripe/raidz setup, snapshots and clones, etc. I believe I even have a handle on the metaslab allocator, to a limited degree.

Most of this stuff is apparent from various blogs, and http://www.opensolaris.org/os/community/zfs/source/, but there are a few things that aren't fully clear to me, mostly to do with the ZIO subsystem. I can fully appreciate the 'the source is the documentation' rule, but the lack of comments sometimes makes it really hard to figure out what's going on.

1. zio.c has functions like "zio_rewrite" and "zio_rewrite_gang_members". ZFS is copy-on-write, so it should never be rewriting anything, right? Also, zio_write_compress makes a cryptic reference to spa_sync.

2. Gang Blocks: While not explicitly spelled out anywhere (except maybe the source code), it seems to me that the behavior is this: system needs to write a 128KB block, but can't allocate a contiguous 128KB (in which case, you've got issues), so it allocates two 64KB blocks and a 'gang block' to point to them. When somebody tries to read back the original 128KB block, the ZIO subsystem reads the two 64KB halves and pieces them back together -- and the upper layers of code are none the wiser. Is this correct?

3. Gang Blocks II: Can a gang block point to other gang blocks? My guess is no.

4. Gang Blocks III: If a gang block contains up to 3 pointers (according to the 'on-disk format' doc) and it *cannot* point to other gang blocks, does that mean that ZIO can split a block into at most 3 pieces?

5. spa_sync has a loop with the comment "Iterate to convergence". I was under the impression that the sync operation just made sure all outstanding writes were committed to disk. How is committing that data to disk going to change that data?
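To make the guess in question 2 concrete, here is a toy Python model of the behavior I think I'm seeing. It is only a sketch of my understanding, not ZFS code: ToyPool, write_block and read_block are names I made up, the "largest free region" is a fixed number, and the gang header itself is not stored anywhere. It also bakes in my guess from question 3 that the pieces are never gang blocks themselves.

```python
# Toy model of gang blocks as I understand them. This is NOT ZFS code:
# ToyPool, write_block and read_block are invented for illustration.

class ToyPool:
    """A fake pool whose largest contiguous free region is limited."""

    def __init__(self, largest_free):
        self.largest_free = largest_free  # bytes; assumed fixed for simplicity
        self.store = {}                   # fake address -> stored bytes
        self.next_addr = 0

    def alloc_and_write(self, data):
        """Store data contiguously, or return None if it does not fit."""
        if len(data) > self.largest_free:
            return None
        addr = self.next_addr
        self.next_addr += 1
        self.store[addr] = bytes(data)
        return addr


def write_block(pool, data):
    """Return a ("plain", addr) or ("gang", [addr, addr]) block pointer."""
    addr = pool.alloc_and_write(data)
    if addr is not None:
        return ("plain", addr)
    # No contiguous space: split into two halves and point at both.
    half = len(data) // 2
    children = []
    for piece in (data[:half], data[half:]):
        child = pool.alloc_and_write(piece)
        if child is None:
            # My guess is that pieces cannot be gangs, so the model gives up.
            raise MemoryError("a half does not fit either")
        children.append(child)
    return ("gang", children)


def read_block(pool, bp):
    """Reassemble the original data; the caller never sees the split."""
    kind, ref = bp
    if kind == "plain":
        return pool.store[ref]
    return b"".join(pool.store[addr] for addr in ref)


pool = ToyPool(largest_free=64 * 1024)
data = bytes(range(256)) * 512           # 128KB of test data
bp = write_block(pool, data)
print(bp[0], len(bp[1]))                 # gang 2
print(read_block(pool, bp) == data)      # True
```

If the real code differs from this picture (for example, if the pieces can themselves be gang blocks), that is exactly what I would like to have corrected.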
-- 
Stevie-O
Real programmers use COPY CON PROGRAM.EXE