[zfs-discuss] Clarifications wanted for ZFS spec

Jim Klimov Wed, 11 Jan 2012 10:32:09 -0800

I'm reading the "ZFS On-disk Format" PDF (dated 2006 -
are there newer releases?), and have some questions
regarding whether it is outdated:


1) On page 16 it has the following phrase (which I think
is in general invalid):
  The value stored in offset is the offset in terms of
  sectors (512 byte blocks). To find the physical block
  byte offset from the beginning of a slice, the value
  inside offset must be shifted over (<<) by 9 (2^9=512)
  and this value must be added to 0x400000 (size of two
  vdev_labels and boot block).

Is this calculation really done with a hard-coded shift
of 9, or with the VDEV-dependent ashift value (i.e. 2^12
for 4k disks, 2^10 for default raidz, etc.)?
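
For reference, the arithmetic in the quoted paragraph can be
sketched as below. This is only an illustration in Python, not
code taken from ZFS: the name dva_byte_offset and its shift
parameter are hypothetical, and the 256KB label and 3.5MB boot
block sizes are assumptions chosen because they add up to the
0x400000 the spec gives. The shift parameter is there only to
show how the result would change if ashift were used instead
of the fixed 9.

```python
SECTOR_SHIFT = 9                    # 2^9 = 512-byte sectors, as the spec words it
LABEL_SIZE = 256 * 1024             # assumed size of one vdev label
BOOT_BLOCK_SIZE = 3584 * 1024       # assumed size of the boot block (3.5MB)

# Space skipped at the front of the slice: two labels plus the boot block.
DATA_START = 2 * LABEL_SIZE + BOOT_BLOCK_SIZE   # 0x400000


def dva_byte_offset(offset_field: int, shift: int = SECTOR_SHIFT) -> int:
    """Turn the 'offset' field into a byte offset from the start of the slice.

    shift=9 is the literal reading of the spec; any other value is
    hypothetical and only illustrates the ashift question.
    """
    return (offset_field << shift) + DATA_START


if __name__ == "__main__":
    print(hex(dva_byte_offset(1)))       # spec reading: one 512-byte sector in
    print(hex(dva_byte_offset(1, 12)))   # hypothetical ashift=12 reading
```

With the literal reading, an offset field of 1 lands 512 bytes
past 0x400000; with a shift of 12 the same field would land
4096 bytes past it, so the two interpretations cannot both
address the same block.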

2) Likewise, in Section 2.6 (block size entries) the
values of lsize/psize/asize are said to be represented
by the number of 512-byte sectors. Does this statement
hold true for ashift!=9 VDEVs/pools as well?

3) In Section 1.3 they discuss the format of VDEV labels.
As I'm researching this with the intent of repairing my
pool's label (core problem posted yesterday in thread
"Doublefree/doubledelete"), I wondered if the labels
are protected by any checksums. The document does not
state anything about it, so I guess the labels are only
protected by 4-way redundancy - that's it?..

4) As I asked today in thread "How many rollback TXGs
in a ring for 4k drives?", there was an understanding
by our Linux-ZFS comrades that each uberblock takes up
some amount of disk blocks, with minimal allocation
based on ashift value; thus on ashift=12 pools there
are only 32 rollback TXGs.

The PDF spec (section 1.3 overview) states that each
UB entry size is 1KB as part of the label structure;
does this mean that for ashift=12 pools there are 128
entries as well? If this is the case, I think the
Linux guys should be informed, to avoid incompatible
implementations ;)
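
The two readings can be compared with a short sketch. It
assumes the uberblock ring is 128KB in total (128 entries of
1KB, per the spec overview) and that, under the Linux-ZFS
understanding, each entry is padded up to at least one
ashift-sized block. The names UB_RING_SIZE, UB_MIN_SHIFT and
uberblock_slots are hypothetical, made up for this example.

```python
UB_RING_SIZE = 128 * 1024   # assumed total size of the uberblock array
UB_MIN_SHIFT = 10           # 1KB entry size stated in the spec


def uberblock_slots(ashift: int) -> int:
    """Number of uberblock entries if each one occupies
    max(1KB, 2^ashift) bytes of the ring (an assumption)."""
    slot_shift = max(ashift, UB_MIN_SHIFT)
    return UB_RING_SIZE >> slot_shift


if __name__ == "__main__":
    for ashift in (9, 10, 12):
        print(ashift, uberblock_slots(ashift))
```

Under these assumptions ashift=9 and ashift=10 both give 128
entries, while ashift=12 gives 32, which matches the number
quoted for 4k drives. If the entries were always 1KB regardless
of ashift, the count would stay at 128.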

5) The label contains an NVList of "related" VDEVs...
does this effectively limit the number of devices which
can comprise a (top-level) VDEV?

I have seen a blog entry (Eric Schrock's, I think)
where the author discussed the initial graph-based
VDEV indexing, with each VDEV referring to about
3 neighbors; during import/scan it was possible to
either find all devices or deduce that some (and which)
are missing. But due to some drawbacks of that
ASCII-based implementation they moved to NVLists.

I wonder if such required-device interpolation is
still done now, or can there be as many VDEVs as fit
into the 112KB of NVList space?

That's about all I can ask for the first 10 pages
of spec text ;)

Thanks,
//Jim Klimov


