On Thu, 6 Aug 2009, Hua wrote:

1. Due to the COW nature of zfs, files on zfs are more prone to fragmentation compared to traditional file systems. Is this statement correct?

Yes and no.  Fragmentation is a complex issue.

ZFS uses 128K data blocks by default, whereas other filesystems typically use 4K or 8K blocks. This naturally reduces the potential for fragmentation by 32X compared with 4K blocks.
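To make the arithmetic concrete (a back-of-the-envelope sketch, not ZFS code — the sizes are just the defaults mentioned above):

```python
# Number of allocation units for a 10 MiB file under 4K blocks
# vs. ZFS's default 128K records. Fewer pieces means fewer
# opportunities for those pieces to land in different places.
file_size = 10 * 1024 * 1024              # 10 MiB

blocks_4k = file_size // (4 * 1024)       # 2560 pieces
records_128k = file_size // (128 * 1024)  # 80 pieces

print(blocks_4k, records_128k, blocks_4k // records_128k)  # 2560 80 32
```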

ZFS storage pools are typically composed of multiple "vdevs", and writes are distributed over these vdevs. This means that the first 128K of a file may go to the first vdev and the second 128K may go to the second vdev. It could be argued that this is a type of fragmentation, but since all of the vdevs can be read at once (if zfs prefetch chooses to do so), the seek time for single-user contiguous access is essentially zero: the seeks occur while the application is already busy processing previously read data. When mirror vdevs are used, any device in the mirror may be used to read the data.
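A much-simplified model of that distribution (real ZFS allocation is more dynamic — it also weighs how full each vdev is — but round-robin captures the idea; `distribute` is my illustrative name, not a ZFS function):

```python
# Toy model: spread a file's 128K records across vdevs round-robin,
# so consecutive records land on different devices and can be read
# back in parallel.
def distribute(num_records, num_vdevs):
    """Return, for each record index, which vdev receives it."""
    return [i % num_vdevs for i in range(num_records)]

# 8 records of one file over a 3-vdev pool:
placement = distribute(num_records=8, num_vdevs=3)
print(placement)  # [0, 1, 2, 0, 1, 2, 0, 1]
```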

ZFS uses a slab allocator: it allocates large contiguous chunks from the vdev storage and then carves the 128K blocks out of those large chunks. This dramatically increases the probability that related data will be very close on the same disk.
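A toy version of that carving (the `Slab` class and its names are illustrative, not ZFS internals):

```python
# Reserve one large contiguous chunk up front, then hand out fixed
# 128K blocks from it sequentially -- so consecutive blocks of a file
# come out adjacent on disk.
class Slab:
    def __init__(self, start, size):
        self.start = start   # disk offset where the chunk begins
        self.size = size     # total bytes reserved up front
        self.used = 0        # bytes already carved out

    def alloc(self, block_size=128 * 1024):
        """Carve the next block; returned offsets are contiguous."""
        if self.used + block_size > self.size:
            raise MemoryError("slab exhausted")
        offset = self.start + self.used
        self.used += block_size
        return offset

slab = Slab(start=0, size=1024 * 1024)   # one 1 MiB contiguous chunk
offsets = [slab.alloc() for _ in range(4)]
print(offsets)  # [0, 131072, 262144, 393216] -- adjacent on disk
```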

ZFS delays ordinary writes until the very last minute according to these rules (my understanding): a flush is triggered when 7/8ths of total memory is consumed, when 5 seconds of 100% write I/O has been collected, or when 30 seconds have elapsed. Since quite a lot of data is written at once, zfs is able to write that data in the best possible order.
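The shape of that policy can be sketched as a buffer with size and age limits (the `WriteBatcher` class is a model of the idea, not the actual ZFS transaction-group logic; the thresholds are parameters, not hard-coded ZFS values):

```python
import time

# Buffer writes and flush when either a size limit or an age limit is
# hit; with many writes in hand, they can be reordered before hitting
# the disk.
class WriteBatcher:
    def __init__(self, max_bytes, max_age_seconds):
        self.max_bytes = max_bytes
        self.max_age = max_age_seconds
        self.buffer = []
        self.buffered_bytes = 0
        self.first_write_time = None

    def write(self, data):
        if self.first_write_time is None:
            self.first_write_time = time.monotonic()
        self.buffer.append(data)
        self.buffered_bytes += len(data)
        if self.should_flush():
            return self.flush()
        return None  # still buffering

    def should_flush(self):
        too_big = self.buffered_bytes >= self.max_bytes
        too_old = (time.monotonic() - self.first_write_time) >= self.max_age
        return too_big or too_old

    def flush(self):
        # Sort the batch (here: by length, as a stand-in for sorting
        # into an efficient on-disk order) before issuing it.
        batch = sorted(self.buffer, key=len)
        self.buffer, self.buffered_bytes, self.first_write_time = [], 0, None
        return batch

batcher = WriteBatcher(max_bytes=256 * 1024, max_age_seconds=30)
batcher.write(b"x" * (128 * 1024))            # buffered, not yet "on disk"
flushed = batcher.write(b"y" * (128 * 1024))  # hits the size limit -> flush
print(len(flushed))  # 2
```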

ZFS uses a copy-on-write model. Copy-on-write tends to cause fragmentation when portions of existing files are updated. If a large portion of a file is overwritten in a short period of time, the result should be reasonably fragment-free, but if parts of the file are updated over a long period of time (like a database) then the file is certain to be fragmented. This is not as big a problem as it appears, since such files are typically accessed randomly anyway.
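The effect can be seen in a toy copy-on-write model (the `cow_write` helper and its simple bump allocator are illustrative inventions):

```python
# Copy-on-write never overwrites a block in place; every write gets a
# fresh on-disk offset. A sequential initial write is contiguous, but
# a later update moves the touched block away from its neighbors.
next_free = 0

def cow_write(block_map, block_index, block_size=128 * 1024):
    """Assign block_index a brand-new offset (never in place)."""
    global next_free
    block_map[block_index] = next_free
    next_free += block_size

file_map = {}
for i in range(4):        # initial sequential write: offsets 0..393216
    cow_write(file_map, i)
cow_write(file_map, 1)    # later update: block 1 moves past block 3
print(sorted(file_map.items()))
# [(0, 0), (1, 524288), (2, 262144), (3, 393216)]
# -> logical order no longer matches disk order
```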

ZFS absolutely honors synchronous write requests (e.g. from NFS or a database). Synchronous writes do not benefit from the long write-aggregation delay, so the result may not be laid out as ideally as ordinary writes. Recently zfs has added support for using an SSD as a synchronous write log, which allows zfs to turn synchronous writes into more ordinary writes that can be written more intelligently, while still returning to the caller with minimal latency.
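The idea, in miniature (the `IntentLog` class is purely illustrative — a sketch of the separate-log concept, not the actual ZIL implementation):

```python
# A synchronous write is committed to a fast log device first, so the
# caller can be acknowledged immediately; the data then joins the
# normal delayed write stream and is written out later in a batch.
class IntentLog:
    def __init__(self):
        self.log_device = []   # stands in for the fast SSD log
        self.pending = []      # writes still waiting for the main pool

    def sync_write(self, data):
        self.log_device.append(data)  # fast, durable commit
        self.pending.append(data)     # batched for the main pool
        return "ack"                  # caller unblocks here

    def commit_to_pool(self):
        """Later, flush the batched writes in an efficient order."""
        batch, self.pending = self.pending, []
        self.log_device.clear()       # log entries no longer needed
        return batch

log = IntentLog()
print(log.sync_write(b"row-1"))  # ack
print(log.sync_write(b"row-2"))  # ack
print(log.commit_to_pool())      # [b'row-1', b'row-2']
```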

Perhaps the most significant fragmentation concern for zfs is if the pool is allowed to become close to 100% full. As with other filesystems, the quality of the storage allocations goes downhill fast when the pool is almost full, so even files written contiguously may end up written in fragments.
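A toy free-space map shows why (the `largest_free_run` helper and the 20-slot "disk" are illustrative, not ZFS accounting):

```python
# As free space dwindles, the largest contiguous free run shrinks, so
# even a sequential file must be split across whatever holes remain.
def largest_free_run(bitmap):
    """Length of the longest run of free (False) slots."""
    best = run = 0
    for used in bitmap:
        run = 0 if used else run + 1
        best = max(best, run)
    return best

# 20-slot "disk": mostly empty vs. almost full with scattered holes.
mostly_empty = [True] * 2 + [False] * 18
almost_full = [i % 5 != 0 for i in range(20)]  # only every 5th slot free

print(largest_free_run(mostly_empty))  # 18 -- room for a long extent
print(largest_free_run(almost_full))   # 1  -- every write is a fragment
```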

3. Being a relatively new file system, has it seen much adoption in large implementations?

There are indeed some sites which heavily use zfs. One very large site using zfs is archive.org.

Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
zfs-discuss mailing list
