On Mon, 9 Jan 2012, Edward Ned Harvey wrote:
> I don't think that's correct...
But it is! :-)
> Suppose you write a 1G file to disk. It is a database store. Now you start
> running your db server. It starts performing transactions all over the
> place. It overwrites the middle 4k of the file, and it overwrites 512b
> somewhere else, and so on. Since this is COW, each one of these little
> writes in the middle of the file will actually get mapped to unused sectors
> of disk. Depending on how quickly they're happening, they may be aggregated
Oops. I see an error in the above. Except for tail blocks, or when
compression shrinks the block, zfs will not write a COW data block
smaller than the zfs filesystem blocksize. If the blocksize is 128K,
then updating just one byte in that 128K block results in writing a
whole new 128K block.
This is pretty significant write-amplification but the resulting
fragmentation is still limited by the 128K block size. Remember that
any fragmentation calculation needs to be based on the disk's minimum
read (i.e. sector) size.
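To put rough numbers on the write amplification described above, here is
a minimal Python sketch. The function names are made up for
illustration, and the model is deliberately simplified: it assumes every
touched block is rewritten whole and ignores tail blocks, compression,
and metadata writes.

```python
def cow_bytes_written(offset, length, blocksize=128 * 1024):
    """Bytes of data blocks rewritten when `length` bytes at `offset`
    are overwritten, assuming each touched block is rewritten whole.
    Simplified model: no tail blocks, compression, or metadata."""
    if length <= 0:
        return 0
    first_block = offset // blocksize
    last_block = (offset + length - 1) // blocksize
    return (last_block - first_block + 1) * blocksize


def write_amplification(offset, length, blocksize=128 * 1024):
    """Ratio of bytes physically rewritten to bytes logically changed."""
    return cow_bytes_written(offset, length, blocksize) / length
```

Under these assumptions, a one-byte update inside a 128K block gives a
ratio of 131072, the 4k overwrite from the quoted example gives 32, and
the 512b overwrite gives 256. A write that straddles a block boundary
rewrites both blocks.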
However, it is worth remembering that it is common to set the block
size to a much smaller value than default (e.g. 8K) if the filesystem
is going to support a database. In that case it is possible for there
to be fragmentation for every 8K of data. The worst case
fragmentation percentage for 8K blocks (and 512-byte sectors) is 6.25%
(100*1/((8*1024)/512)). That would be a high enough percentage that
Microsoft Windows defrag would recommend defragging the disk.
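The same arithmetic can be tried for other block and sector sizes with a
short Python sketch. The function name is hypothetical, and it only
restates the formula above (one fragment per block, measured in
sectors); it is not how zfs or any defrag tool measures fragmentation.

```python
def worst_case_fragmentation_pct(blocksize, sector_size=512):
    """Worst-case fragmentation percentage when every block can land
    in a different place: 100 * 1 / (sectors per block)."""
    sectors_per_block = blocksize / sector_size
    return 100.0 / sectors_per_block


pct_8k = worst_case_fragmentation_pct(8 * 1024)      # 6.25
pct_128k = worst_case_fragmentation_pct(128 * 1024)  # 0.390625
```

With 512-byte sectors this gives 6.25% for 8K blocks and about 0.39% for
the default 128K blocks, which is the gap the rest of this message is
about.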
Metadata chunks cannot be any smaller than the disk's sector size
(e.g. 512 bytes or 4K bytes). Metadata can be seen as contributing to
fragmentation, which is why it is so valuable to cache it. If the
metadata is not conveniently close to the data, then it may result in
a big ugly disk seek (same impact as data fragmentation) to read it.
In summary, with zfs's default 128K block size, data fragmentation is
not a significant issue. If the zfs filesystem block size is reduced
to a much smaller value (e.g. 8K) then it can become a significant
issue. As Richard Elling points out, a database layered on top of zfs
may already be fragmented by design.
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
zfs-discuss mailing list