Hi guys,
There seems to have been some discussion about this before
(http://mail.opensolaris.org/pipermail/zfs-discuss/2006-September/013050.html)
but I don't *quite* understand why the roundup is necessary.
Using Bill's notation, if there isn't a roundup (writing 4k fs blocks
to a 4-device RAID-Z), wouldn't you get something like this:
Disk     0    1    2    3
-------------------------
LBA 0    A.   A    A    A
    1    A.   A    A    A
    2    A.   A    A    B.
    3    B    B    B    B.
    4    B    B    B    B.
    5    B    B    C.   C
etc.
Here the 4k write of B (8 data sectors plus 3 parity sectors, 5.5k in
total) starts at offset 5.5k and finishes at 11k.
i.e. for B (as seen in vdev_raidz_map_alloc):
f = 11 % 4 == column 3 (0 indexed)
q = 8 / (4-1) == 2
r = 8 - (2 * (4-1)) == 2
bc = 2 + 1 == 3
Then, in the for loop, the first 3 columns (starting from column 3 and
wrapping) will be marked as big columns (storing an extra block each),
and the final column, column 2, will only store 2 blocks. The
map_alloc function can then set the asize to 5.5k, and a subsequent
write of C can start at offset 11k. Can anyone explain what I'm
missing?
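
To make the arithmetic concrete, here is a rough sketch of the calculation for B. It only mirrors the numbers in this post and is not the actual vdev_raidz_map_alloc code. It assumes 512-byte sectors and single parity, and the name raidz_layout and its parameters are made up for illustration:

```python
SECTOR = 512  # bytes per sector (assumption)


def raidz_layout(start_sector, data_sectors, ncols=4, nparity=1):
    """Unrounded layout for one block, following the arithmetic in this post."""
    data_cols = ncols - nparity
    f = start_sector % ncols            # first column touched (holds parity)
    q = data_sectors // data_cols       # number of full data rows
    r = data_sectors - q * data_cols    # data sectors left over for a partial row
    bc = 0 if r == 0 else r + nparity   # "big" columns that store q + 1 sectors
    # Column c, counted from f and wrapping, stores q + 1 sectors if c < bc, else q.
    per_col = {(f + c) % ncols: q + (1 if c < bc else 0) for c in range(ncols)}
    asize = sum(per_col.values())       # sectors allocated with no roundup
    return f, q, r, bc, per_col, asize


# Block B: starts at sector 11 (5.5k) and carries 8 data sectors (4k).
f, q, r, bc, per_col, asize = raidz_layout(start_sector=11, data_sectors=8)
print(f, q, r, bc, per_col, asize * SECTOR)
```

With these inputs, columns 3, 0 and 1 each hold 3 sectors and column 2 holds 2, for 11 sectors (5.5k) in total, so the next block would start at sector 22.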
Many thanks,
James