Re: [zfs-discuss] zfs fragmentation

Bob Friesenhahn Fri, 07 Aug 2009 09:09:43 -0700

On Thu, 6 Aug 2009, Hua wrote:

1. Due to the COW nature of zfs, files on zfs tend to be more fragmented compared to a traditional file system. Is this statement correct?


Yes and no.  Fragmentation is a complex issue.

ZFS uses 128K data blocks by default whereas other filesystems typically use 4K or 8K blocks. This naturally reduces the potential for fragmentation by 32X over 4K blocks (or 16X over 8K blocks).
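
The 32X figure is simple block-count arithmetic. A minimal sketch, assuming the worst case where every block of a file lands in a different place (the 1 GiB file size and the function name are only illustrative):

```python
def max_fragments(file_size, block_size):
    """Worst case: every block of the file is a separate fragment."""
    return -(-file_size // block_size)  # ceiling division

FILE_SIZE = 1024 ** 3  # hypothetical 1 GiB file

frags_4k = max_fragments(FILE_SIZE, 4 * 1024)
frags_8k = max_fragments(FILE_SIZE, 8 * 1024)
frags_128k = max_fragments(FILE_SIZE, 128 * 1024)

print("4K blocks:  ", frags_4k)
print("8K blocks:  ", frags_8k)
print("128K blocks:", frags_128k)
print("reduction vs 4K:", frags_4k // frags_128k)
print("reduction vs 8K:", frags_8k // frags_128k)
```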

ZFS storage pools are typically composed of multiple "vdevs" and writes are distributed over these vdevs. This means that the first 128K of a file may go to the first vdev and the second 128K may go to the second vdev. It could be argued that this is a type of fragmentation but since all of the vdevs can be read at once (if zfs prefetch chooses to do so) the seek time for single-user contiguous access is essentially zero since the seeks occur while the application is already busy processing other data. When mirror vdevs are used, any device in the mirror may be used to read the data.
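
To picture the distribution over vdevs, here is a toy model. It assumes strict per-block round-robin placement, which is a simplification made for illustration and not a description of the real allocator; `place_blocks` is a hypothetical helper.

```python
BLOCK_SIZE = 128 * 1024  # default data block size mentioned above

def place_blocks(file_size, vdev_count, block_size=BLOCK_SIZE):
    """Map each vdev index to the file offsets it would hold.

    Assumption: strict round-robin, one block per vdev in turn.
    """
    placement = {vdev: [] for vdev in range(vdev_count)}
    block_count = -(-file_size // block_size)  # ceiling division
    for block in range(block_count):
        placement[block % vdev_count].append(block * block_size)
    return placement

layout = place_blocks(file_size=512 * 1024, vdev_count=2)
for vdev, offsets in layout.items():
    print(f"vdev {vdev}: file offsets (KiB) {[o // 1024 for o in offsets]}")
```

In this model a sequential reader can ask both vdevs for their next block at the same time, which is why the split does not cost seek time in the way classic fragmentation does.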

ZFS uses a slab allocator and allocates large contiguous chunks from the vdev storage, and then carves the 128K blocks from those large chunks. This dramatically increases the probability that related data will be very close on the same disk.
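
The carve-from-a-large-chunk idea can be sketched as follows. This is not the ZFS allocator; the 8 MiB chunk size and the `ChunkAllocator` class are assumptions made purely for illustration.

```python
BLOCK_SIZE = 128 * 1024
CHUNK_SIZE = 8 * 1024 * 1024  # hypothetical chunk size, not taken from ZFS

class ChunkAllocator:
    """Reserve a large contiguous chunk, then hand out blocks from it."""

    def __init__(self, chunk_size=CHUNK_SIZE, block_size=BLOCK_SIZE):
        self.chunk_size = chunk_size
        self.block_size = block_size
        self.next_chunk_start = 0  # where the next chunk would be reserved
        self.cursor = 0            # next free byte in the current chunk
        self.chunk_end = 0         # end of the current chunk (none yet)

    def allocate_block(self):
        if self.cursor + self.block_size > self.chunk_end:
            # Current chunk is exhausted: reserve a fresh contiguous chunk.
            self.cursor = self.next_chunk_start
            self.chunk_end = self.cursor + self.chunk_size
            self.next_chunk_start = self.chunk_end
        offset = self.cursor
        self.cursor += self.block_size
        return offset

allocator = ChunkAllocator()
offsets = [allocator.allocate_block() for _ in range(4)]
print([o // 1024 for o in offsets])  # block offsets in KiB
```

Blocks allocated one after another come out physically adjacent, which is the property that keeps related data close together on disk.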

ZFS delays ordinary writes to the very last minute according to these rules (my understanding): 7/8ths of total memory is consumed, 5 seconds of 100% write I/O is collected, or 30 seconds has elapsed. Since quite a lot of data is written at once, zfs is able to write that data in the best possible order.
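
Those three rules can be written as a single predicate. This is a sketch of the rules as stated above, not of the actual ZFS code. It interprets "5 seconds of 100% write I/O" as enough pending data to keep the pool writing at full bandwidth for 5 seconds, which is an assumption, and all parameter names are hypothetical.

```python
GiB = 1024 ** 3
MiB = 1024 ** 2

def should_flush(memory_used, total_memory, dirty_bytes,
                 write_bandwidth, seconds_since_flush):
    """Return True when any of the three stated triggers fires."""
    memory_pressure = memory_used * 8 >= total_memory * 7   # 7/8ths consumed
    enough_io = dirty_bytes >= 5 * write_bandwidth          # 5 s of full-rate writes
    timed_out = seconds_since_flush >= 30                   # 30 s elapsed
    return memory_pressure or enough_io or timed_out

# A lightly loaded system two seconds after the last flush: keep collecting.
print(should_flush(1 * GiB, 8 * GiB, 100 * MiB, 100 * MiB, 2))
# Same system once 600 MiB is pending against 100 MiB/s of bandwidth: flush.
print(should_flush(1 * GiB, 8 * GiB, 600 * MiB, 100 * MiB, 2))
```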

ZFS uses a copy-on-write model. Copy-on-write tends to cause fragmentation if portions of existing files are updated. If a large portion of a file is overwritten in a short period of time, the result should be reasonably fragment-free but if parts of the file are updated over a long period of time (like a database) then the file is certain to be fragmented. This is not such a big problem as it appears to be since such files were already typically accessed using random access.
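
The difference between rewriting a file in one go and updating it piecemeal can be shown with a toy layout model. It assumes each rewritten block is simply placed at the next free physical address, which is a simplification; `cow_update` and `count_extents` are hypothetical helpers.

```python
def count_extents(layout):
    """Count runs of physically consecutive blocks, in logical file order."""
    if not layout:
        return 0
    extents = 1
    for prev, cur in zip(layout, layout[1:]):
        if cur != prev + 1:
            extents += 1
    return extents

def cow_update(layout, logical_blocks, next_free):
    """Copy-on-write: each rewritten block moves to a new physical address."""
    new_layout = list(layout)
    for block in logical_blocks:
        new_layout[block] = next_free
        next_free += 1
    return new_layout, next_free

original = list(range(16))  # 16-block file, initially contiguous

# Whole file overwritten in one batch: the new copy is contiguous again.
bulk, _ = cow_update(original, range(16), next_free=16)

# Database-style updates: single blocks rewritten at different times.
piecemeal, free = original, 16
for block in (3, 9, 12):
    piecemeal, free = cow_update(piecemeal, [block], free)

print("bulk rewrite extents:", count_extents(bulk))
print("piecemeal extents:   ", count_extents(piecemeal))
```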

ZFS absolutely observes synchronous write requests (e.g. by NFS or a database). The synchronous write requests do not benefit from the long write aggregation delay so the result may not be written as ideally as ordinary write requests. Recently zfs has added support for using an SSD as a synchronous write log, and this allows zfs to turn synchronous writes into more ordinary writes which can be written more intelligently while returning to the user with minimal latency.
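
A toy model of the write-log idea: the synchronous write is acknowledged as soon as it is recorded on the log, and the data reaches the main disks later as part of an ordered batch. The `ToyPool` class and its methods are hypothetical and only illustrate the ordering; they do not reflect ZFS internals.

```python
class ToyPool:
    def __init__(self):
        self.log_device = []    # stand-in for the SSD log, in arrival order
        self.pending = {}       # in-memory data awaiting the next batch
        self.main_storage = []  # offsets in the order written to main disks

    def sync_write(self, offset, data):
        """Record the write on the log and acknowledge immediately."""
        self.log_device.append((offset, data))
        self.pending[offset] = data
        return "acknowledged"

    def commit_batch(self):
        """Write pending data to main storage in offset order, then drop the log."""
        for offset in sorted(self.pending):
            self.main_storage.append(offset)
        self.pending.clear()
        self.log_device.clear()

pool = ToyPool()
for offset in (300, 100, 200):
    pool.sync_write(offset, b"x")

arrival_order = [offset for offset, _ in pool.log_device]
pool.commit_batch()
print("arrival order:", arrival_order)
print("main storage order:", pool.main_storage)
```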

Perhaps the most significant fragmentation concern for zfs is if the pool is allowed to become close to 100% full. Similar to other filesystems, the quality of the storage allocations goes downhill fast when the pool is almost 100% full, so even files written contiguously may be written in fragments.
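
Why a nearly full pool fragments even contiguous writes can be seen with a small free-space model. This is a generic sketch, not the ZFS allocator: it takes one free run if a large enough one exists and otherwise stitches the file together from whatever holes remain. The pool size and hole positions are made up for the example.

```python
def free_runs(free):
    """Return (start, length) for each run of free blocks."""
    runs, start = [], None
    for i, is_free in enumerate(free):
        if is_free and start is None:
            start = i
        elif not is_free and start is not None:
            runs.append((start, i - start))
            start = None
    if start is not None:
        runs.append((start, len(free) - start))
    return runs

def allocate(free, n_blocks):
    """Return the extents a file of n_blocks would occupy."""
    runs = free_runs(free)
    for start, length in runs:
        if length >= n_blocks:
            return [(start, n_blocks)]  # fits in one contiguous run
    extents, remaining = [], n_blocks
    for start, length in sorted(runs, key=lambda run: -run[1]):
        take = min(length, remaining)
        extents.append((start, take))
        remaining -= take
        if remaining == 0:
            return extents
    raise MemoryError("pool is out of space")

empty_pool = [True] * 100
nearly_full = [False] * 100          # 90% full, five 2-block holes left
for hole in (10, 30, 50, 70, 90):
    nearly_full[hole] = nearly_full[hole + 1] = True

print("empty pool: ", allocate(empty_pool, 8))
print("nearly full:", allocate(nearly_full, 8))
```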

3. Being a relatively new file system, has there been much adoption in large implementations?

There are indeed some sites which heavily use zfs. One very large site using zfs is archive.org.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
