On Thu, 11 Oct 2012, Richard Elling wrote:

On Oct 11, 2012, at 2:58 PM, Phillip Wagstrom <phillip.wagst...@gmail.com> wrote:

On Oct 11, 2012, at 4:47 PM, andy thomas wrote:

According to a Sun document called something like 'ZFS Best Practices' I read
some time ago, best practice was to use the entire disk for ZFS and not to
partition or slice it in any way. Does this advice hold good for FreeBSD as well?

        My understanding of the best practice was that Solaris, prior to
ZFS, disabled the volatile disk cache.

This is not quite correct. If you use the whole disk, ZFS will attempt to
enable the write cache. To understand why, remember that UFS (and ext, by
default) can die a horrible death (+fsck) if there is a power outage and
cached data is not flushed to disk. So Sun shipped some disks with the
write cache disabled by default. Most disks on the market, though, are
shipped with the write cache enabled, and the most popular file systems
(NTFS) properly issue cache-flush requests as needed (for the same reason
ZFS issues cache-flush requests).

Out of interest, how do you enable the write cache on a disk? I recently
replaced a failing Dell-branded disk in a Dell server with an HP-branded
disk (both were the identical Seagate model) and, on running the EFI
diagnostics just to check all was well, it reported that the write cache
was disabled on the new HP disk but enabled on the remaining Dell disks in
the server. I couldn't see any way of enabling the cache from the EFI
diags, so I left it as it was - probably not ideal.
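For anyone else wondering, the cache can usually be toggled from the OS
rather than the EFI diags. A sketch only - device names here are
examples, and menus and tunables vary by release:

   # Solaris/illumos: format(1M) in expert mode has a cache menu
   format -e
   #   ... select the disk, then: cache -> write_cache -> enable

   # FreeBSD, SCSI/SAS disks: edit mode page 8 (caching), set WCE to 1
   camcontrol modepage da0 -m 8 -e

   # FreeBSD, SATA disks: the ada(4) write cache is governed by a sysctl
   sysctl kern.cam.ada.write_cache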

With ZFS, the disk cache is used, but after every transaction a cache-flush
command is issued to ensure that the data made it to the platters.

The write cache is flushed after uberblock updates and for ZIL writes.
This is important for uberblock updates, so the uberblock doesn't point
to a garbaged MOS. It is required for ZIL writes, because they must be
guaranteed written to media before the ack.
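Both platforms also expose a knob to skip those flushes entirely, which
is only safe when the cache is nonvolatile (e.g. a battery-backed array
controller). A sketch; tunable names vary by release:

   # Solaris: in /etc/system (use with care)
   set zfs:zfs_nocacheflush = 1

   # FreeBSD: loader tunable
   vfs.zfs.cache_flush_disable=1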

Thanks for the explanation, that all makes sense now.


If you slice the disk, enabling the disk cache for the whole disk is
dangerous, because other file systems (meaning UFS) wouldn't do the
cache flush, and there was a risk of data loss should the cache fail due
to, say, a power outage.
        Can't speak to how BSD deals with the disk cache.

I looked at a server earlier this week that was running FreeBSD 8.0 and
had 2 x 1 TB SAS disks in a ZFS v13 mirror with a third identical disk as
a spare. Large-file I/O throughput was OK but the mail jail it hosted had
periods when it was very slow accessing lots of small files. All three
disks (the two in the ZFS mirror plus the spare) had been partitioned
with gpart so that partition 1 was a 6 GB swap and partition 2, of type
'freebsd-zfs', filled the rest of the disk. It was these second
partitions that were part of the mirror.
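For reference, a layout like that would come from something along these
lines (hypothetical device name, repeated per disk):

   gpart create -s gpt da0
   gpart add -t freebsd-swap -s 6G da0
   gpart add -t freebsd-zfs da0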

This doesn't sound like a very good idea to me, as surely disk seeks for
swap and for ZFS file I/O are bound to clash, aren't they?

        It surely would make a slow, memory-starved swapping system even
slower.  :)

Another point about the Sun ZFS paper - it mentioned optimum performance
would be obtained with RAIDz pools if the number of disks was between 3
and 9. So I've always limited my pools to a maximum of 9 active disks
plus spares, but the other day someone here was talking of seeing
hundreds of disks in a single pool! So what is the current advice for ZFS
on Solaris and FreeBSD?

        That number was drives per vdev, not per pool.
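In other words, large pools are built by striping many modest vdevs,
each within that 3-9 disk range. A sketch with hypothetical device
names; hundreds of disks just means many such groups:

   # one pool, two 9-disk RAIDz vdevs; add further vdevs to grow it
   zpool create tank \
       raidz da0 da1 da2 da3 da4 da5 da6 da7 da8 \
       raidz da9 da10 da11 da12 da13 da14 da15 da16 da17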




Andy Thomas,
Time Domain Systems

Tel: +44 (0)7866 556626
Fax: +44 (0)20 8372 2582