On Mon, 9 Jan 2012, Edward Ned Harvey wrote:
> I don't think that's correct...
But it is! :-)
> Suppose you write a 1G file to disk. It is a database store. Now you start running your db server. It starts performing transactions all over the place. It overwrites the middle 4k of the file, and it overwrites 512b somewhere else, and so on. Since this is COW, each one of these little writes in the middle of the file will actually get mapped to unused sectors of disk. Depending on how quickly they're happening, they may be aggregated
Oops. I see an error in the above. Other than tail blocks, or due to compression, zfs will not write a COW data block smaller than the zfs filesystem blocksize. If the blocksize was 128K, then updating just one byte in that 128K block results in writing a whole new 128K block. This is pretty significant write-amplification but the resulting fragmentation is still limited by the 128K block size. Remember that any fragmentation calculation needs to be based on the disk's minimum read (i.e. sector) size.
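To put rough numbers on that write amplification, here is a minimal sketch in Python. The function name cow_write_cost and its parameters are made up for illustration and are not part of zfs. It assumes every block the overwrite touches is rewritten whole, and it ignores tail blocks, compression, metadata updates, and any aggregation of writes.

```python
def cow_write_cost(offset, length, blocksize=128 * 1024):
    """Estimate the cost of overwriting `length` bytes at `offset`.

    Assumption: each filesystem block touched by the overwrite is
    rewritten in full (no tail blocks, no compression, no metadata).
    Returns (blocks_rewritten, bytes_written, amplification).
    """
    if offset < 0 or length <= 0 or blocksize <= 0:
        raise ValueError("offset must be >= 0; length and blocksize must be > 0")
    first = offset // blocksize                  # index of first block touched
    last = (offset + length - 1) // blocksize    # index of last block touched
    blocks = last - first + 1
    written = blocks * blocksize                 # whole blocks are rewritten
    return blocks, written, written / length


if __name__ == "__main__":
    # One byte changed inside a 128K block.
    print(cow_write_cost(0, 1))
    # A 4K overwrite in the middle of a 1G file (block-aligned offset).
    print(cow_write_cost(512 * 1024 * 1024, 4096))
    # A 512-byte overwrite that straddles a block boundary.
    print(cow_write_cost(128 * 1024 - 256, 512))
    # The same kind of update with an 8K block size.
    print(cow_write_cost(0, 8192, blocksize=8 * 1024))
```

Under these assumptions a 4K update costs one full 128K block write, and a small write that happens to straddle a block boundary costs two.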
However, it is worth remembering that it is common to set the block size to a much smaller value than the default (e.g. 8K) if the filesystem is going to support a database. In that case it is possible for there to be fragmentation for every 8K of data. The worst-case fragmentation percentage for 8K blocks (and 512-byte sectors) is 6.25% (100*1/((8*1024)/512)). That would be a high enough percentage that Microsoft Windows defrag would recommend defragging the disk.
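The same arithmetic can be checked for other block and sector sizes with a small Python sketch. The function name worst_case_fragmentation_pct is hypothetical, and the metric is just the formula above: one possible discontinuity per block, measured against the number of sectors in a block.

```python
def worst_case_fragmentation_pct(blocksize, sector_size=512):
    """Worst-case fragmentation percentage: 100 * 1 / (blocksize / sector_size).

    Assumption: at most one fragment boundary per filesystem block, and
    the sector is the disk's minimum read size.
    """
    if blocksize <= 0 or sector_size <= 0:
        raise ValueError("sizes must be positive")
    if blocksize % sector_size != 0:
        raise ValueError("blocksize must be a multiple of sector_size")
    sectors_per_block = blocksize // sector_size
    return 100.0 / sectors_per_block


if __name__ == "__main__":
    for blocksize in (128 * 1024, 8 * 1024):
        for sector in (512, 4096):
            pct = worst_case_fragmentation_pct(blocksize, sector)
            print(f"{blocksize // 1024}K blocks, {sector}-byte sectors: {pct}%")
```

With 512-byte sectors this gives 6.25% for 8K blocks and about 0.39% for 128K blocks. By the same formula, 8K blocks on 4K-sector disks come out at 50%.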
Metadata chunks cannot be any smaller than the disk's sector size (e.g. 512 bytes or 4K bytes). Metadata can be seen as contributing to fragmentation, which is why it is so valuable to cache it. If the metadata is not conveniently close to the data, then reading it may require a big ugly disk seek (the same impact as data fragmentation).
In summary, with zfs's default 128K block size, data fragmentation is not a significant issue. If the zfs filesystem block size is reduced to a much smaller value (e.g. 8K), then it can become a significant issue. As Richard Elling points out, a database layered on top of zfs may already be fragmented by design.
Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/