On Thu, 11 Oct 2012, Richard Elling wrote:
On Oct 11, 2012, at 2:58 PM, Phillip Wagstrom <phillip.wagst...@gmail.com>
On Oct 11, 2012, at 4:47 PM, andy thomas wrote:
According to a Sun document called something like 'ZFS best practice' I read
some time ago, best practice was to use the entire disk for ZFS and not to
partition or slice it in any way. Does this advice hold good for FreeBSD as
well?
My understanding of that best practice was that Solaris, prior to ZFS,
disabled the volatile disk write cache.
This is not quite correct. If you use the whole disk ZFS will attempt to enable
write cache. To understand why, remember that UFS (and ext, by default) can die
a horrible death (+fsck) if there is a power outage and cached data has not
been flushed to disk.
So by default, Sun shipped its disks with the write cache disabled. Commodity
disks, by contrast, most often ship with the write cache enabled, and the most
popular file systems (e.g. NTFS) properly issue cache-flush requests as needed
(ZFS issues cache-flush requests for the same reason).
Out of interest, how do you enable the write cache on a disk? I recently
replaced a failing Dell-branded disk on a Dell server with an HP-branded
disk (both disks were the identical Seagate model) and on running the EFI
diagnostics just to check all was well, it reported the write cache was
disabled on the new HP disk but enabled on the remaining Dell disks in the
server. I couldn't see any way of enabling the cache from the EFI diags so
I left it as it was - probably not ideal.
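For what it's worth, the write-cache bit can usually be toggled from the OS
rather than the vendor diagnostics. A sketch of how to do it on FreeBSD (via
camcontrol's mode-page editor) and Solaris (via format's expert mode); the
device name da0 is a placeholder:

```shell
# FreeBSD: inspect the SCSI caching mode page (page 0x08) on da0;
# the WCE field is the write-cache-enable bit.
camcontrol modepage da0 -m 0x08

# Edit the page interactively and set WCE to 1 to enable the cache:
camcontrol modepage da0 -m 0x08 -e

# Solaris: format's expert mode exposes the same bit:
format -e
# -> select disk -> cache -> write_cache -> enable
```

Whether the setting survives a power cycle depends on whether the change is
saved to the drive's non-volatile mode pages.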
With ZFS, the disk cache is used, but after every transaction a cache-flush
command is issued to ensure that the data made it to the platters.
Write cache is flushed after uberblock updates and for ZIL writes. This is
needed for uberblock updates so that the uberblock doesn't point to a garbaged
MOS. It is needed for ZIL writes because they must be guaranteed to be on
media before the write is acknowledged.
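You can confirm that this flushing behaviour hasn't been tuned away on a given
box. The knob names below are assumptions based on the FreeBSD 8/9-era and
contemporary Solaris ZFS code; check your release:

```shell
# FreeBSD: 0 means ZFS issues cache flushes normally.
sysctl vfs.zfs.cache_flush_disable

# Solaris: the equivalent kernel variable is zfs_nocacheflush;
# it should be 0 unless the pool sits on NVRAM-backed storage
# that can safely ignore flushes.
echo "zfs_nocacheflush/D" | mdb -k
```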
Thanks for the explanation, that all makes sense now.
If you slice the disk, enabling the disk cache for the whole disk is dangerous
because other file systems (meaning UFS) wouldn't do the cache flush, so there
was a risk of data loss should the cache contents be lost to, say, a power
outage.
Can't speak to how BSD deals with the disk cache.
I looked at a server earlier this week that was running FreeBSD 8.0 and had 2 x
1 TB SAS disks in a ZFS v13 mirror with a third identical disk as a spare.
Large-file I/O throughput was OK but the mail jail it hosted had periods when
it was very slow accessing lots of small files. All three disks (the two in the
ZFS mirror plus the spare) had been partitioned with gpart so that partition 1
was a 6 GB swap and partition 2, of type 'freebsd-zfs', filled the rest of the
disk. It was these second partitions that were part of the ZFS mirror.
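The layout described could have been created roughly like this (a sketch; the
pool name and da0/da1/da2 device names are assumptions):

```shell
# GPT label, 6 GB swap as partition 1, the remainder as freebsd-zfs
# as partition 2; repeat for da1 and da2.
gpart create -s gpt da0
gpart add -t freebsd-swap -s 6G da0
gpart add -t freebsd-zfs da0

# The pool would then have been built from the second partitions:
zpool create tank mirror da0p2 da1p2 spare da2p2
```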
This doesn't sound like a very good idea to me, as surely disk seeks for swap
and for ZFS file I/O are bound to clash, aren't they?
It surely would make a slow, memory-starved swapping system even slower.
Another point about the Sun ZFS paper - it mentioned optimum performance would
be obtained with RAIDZ pools if the number of disks was between 3 and 9. So
I've always limited my pools to a maximum of 9 active disks plus spares but the
other day someone here was talking of seeing hundreds of disks in a single
pool! So what is the current advice for ZFS in Solaris and FreeBSD?
That number was drives per vdev, not per pool.
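That is, the 3-9 disk guidance applies to each raidz vdev, and a large pool is
striped across many such vdevs. A sketch with placeholder device names:

```shell
# 18 disks as two 9-disk raidz2 vdevs in one pool; ZFS stripes
# writes across the two vdevs. Hundreds of disks just means more
# vdevs of the same shape, not wider ones.
zpool create bigpool \
    raidz2 da0 da1 da2 da3 da4 da5 da6 da7 da8 \
    raidz2 da9 da10 da11 da12 da13 da14 da15 da16 da17
```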
zfs-discuss mailing list
Time Domain Systems
Tel: +44 (0)7866 556626
Fax: +44 (0)20 8372 2582