On Thu, 6 Aug 2009, Hua wrote:

1. Due to the COW nature of zfs, files on zfs are more prone to fragmentation than files on a traditional file system. Is this statement correct?

Yes and no.  Fragmentation is a complex issue.

ZFS uses 128K data blocks by default, whereas other filesystems typically use 4K or 8K blocks. Compared with 4K blocks, this naturally reduces the potential for fragmentation by a factor of 32.
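
To make the arithmetic concrete, here is a minimal Python sketch. It only counts blocks: in the worst case every block of a file lands somewhere different, so the block count is an upper bound on the number of fragments. The 1 MiB file size and the function name are assumptions chosen for illustration.

```python
KIB = 1024


def max_fragments(file_size: int, block_size: int) -> int:
    """Worst-case fragment count: one fragment per block (ceiling division)."""
    return -(-file_size // block_size)


file_size = 1024 * KIB  # hypothetical 1 MiB file

small_blocks = max_fragments(file_size, 4 * KIB)    # 4K blocks
large_blocks = max_fragments(file_size, 128 * KIB)  # 128K blocks (zfs default)
ratio = small_blocks // large_blocks

print(small_blocks, large_blocks, ratio)
```

The ratio depends only on the two block sizes (128K / 4K), not on the file size picked for the example.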

ZFS storage pools typically comprise multiple "vdevs", and writes are distributed over these vdevs. This means that the first 128K of a file may go to the first vdev and the second 128K may go to the second vdev. It could be argued that this is a type of fragmentation. However, since all of the vdevs can be read at once (if zfs prefetch chooses to do so), the seek time for single-user sequential access is essentially zero, because the seeks occur while the application is still busy processing other data. When mirror vdevs are used, any device in the mirror may be used to read the data.
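
The distribution described above can be pictured with a small Python sketch. It assumes a strict round-robin placement of 128K blocks, which is a simplification for illustration only; the real allocator weighs other factors, and the function names here are hypothetical.

```python
BLOCK = 128 * 1024  # default zfs data block size in bytes


def vdev_for_offset(offset: int, n_vdevs: int) -> int:
    """Pick a vdev for the block containing `offset` (assumed round-robin)."""
    return (offset // BLOCK) % n_vdevs


def layout(file_size: int, n_vdevs: int) -> dict:
    """Map each vdev index to the list of file block numbers placed on it."""
    placement = {v: [] for v in range(n_vdevs)}
    for offset in range(0, file_size, BLOCK):
        placement[vdev_for_offset(offset, n_vdevs)].append(offset // BLOCK)
    return placement


# A hypothetical 1 MiB file (8 blocks) on a pool with 4 vdevs.
plan = layout(8 * BLOCK, 4)
for vdev, blocks in plan.items():
    print(f"vdev {vdev}: file blocks {blocks}")
```

Each vdev holds every fourth block, so a sequential reader with prefetch can keep all four vdevs busy at once.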

ZFS uses a slab allocator and allocates large contiguous chunks from the vdev storage, and then carves the 128K blocks from those large chunks. This dramatically increases the probability that related data will be very close together on the same disk.

ZFS delays ordinary writes to the very last minute according to these rules (my understanding): 7/8ths of total memory is consumed, 5 seconds of 100% write I/O has been collected, or 30 seconds have elapsed. Since quite a lot of data is written at once, zfs is able to write that data in the best possible order.
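
The three rules above can be written down as a single predicate. This Python sketch models only the rules as stated in this message; it is not taken from the zfs source, and the function and parameter names are hypothetical.

```python
def should_flush(memory_used: int, total_memory: int,
                 write_busy_seconds: float,
                 seconds_since_flush: float) -> bool:
    """Return True when any of the three stated triggers fires."""
    # Rule 1: 7/8ths of total memory consumed (integer math avoids rounding).
    memory_pressure = memory_used * 8 >= total_memory * 7
    # Rule 2: 5 seconds' worth of 100% write I/O has been collected.
    sustained_writes = write_busy_seconds >= 5
    # Rule 3: 30 seconds have elapsed since the last flush.
    timed_out = seconds_since_flush >= 30
    return memory_pressure or sustained_writes or timed_out


GIB = 1024 ** 3
print(should_flush(1 * GIB, 8 * GIB, 0.5, 3))   # no trigger yet
print(should_flush(7 * GIB, 8 * GIB, 0.5, 3))   # memory rule
print(should_flush(1 * GIB, 8 * GIB, 5.0, 3))   # sustained write rule
print(should_flush(1 * GIB, 8 * GIB, 0.5, 30))  # timeout rule
```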

ZFS uses a copy-on-write model. Copy-on-write tends to cause fragmentation if portions of existing files are updated. If a large portion of a file is overwritten in a short period of time, the result should be reasonably fragment-free, but if parts of the file are updated over a long period of time (like a database) then the file is certain to be fragmented. This is not as big a problem as it appears to be, since such files are typically accessed using random access anyway.
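
A toy Python model shows the difference between the two update patterns. It assumes the simplest possible allocator (each rewritten block goes to the next free physical block), which is far cruder than what zfs actually does; the names and block numbers are illustrative only.

```python
def count_extents(block_map: list) -> int:
    """Count runs of physically consecutive blocks in logical order."""
    if not block_map:
        return 0
    extents = 1
    for prev, cur in zip(block_map, block_map[1:]):
        if cur != prev + 1:
            extents += 1
    return extents


def cow_overwrite(block_map: list, logical_indexes, next_free: int):
    """Copy-on-write: each rewritten logical block gets a new physical block."""
    new_map = list(block_map)
    for i in logical_indexes:
        new_map[i] = next_free
        next_free += 1
    return new_map, next_free


original = list(range(8))  # 8 logical blocks stored at physical blocks 0..7

# Whole file rewritten in one burst: the new copy is contiguous again.
bulk, _ = cow_overwrite(original, range(8), next_free=8)

# Two scattered updates (database style): the file is now in pieces.
scattered, _ = cow_overwrite(original, [2, 5], next_free=8)

print(count_extents(original), count_extents(bulk), count_extents(scattered))
```

In this model two small in-place updates turn one extent into five, while rewriting the whole file at once leaves a single extent.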

ZFS absolutely observes synchronous write requests (e.g. by NFS or a database). Synchronous write requests do not benefit from the long write aggregation delay, so the result may not be written as ideally as ordinary write requests. ZFS recently added support for using an SSD as a synchronous write log, and this allows zfs to turn synchronous writes into more ordinary writes which can be written more intelligently while returning to the user with minimal latency.

Perhaps the most significant fragmentation concern for zfs is if the pool is allowed to become close to 100% full. Similar to other filesystems, the quality of the storage allocations goes downhill fast when the pool is almost 100% full, so even files written contiguously may be written in fragments.

3. Being a relatively new file system, has it been adopted in many large deployments?

There are indeed some sites which heavily use zfs. One very large site using zfs is archive.org.

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
