Hi Tim,

On Jun 14, 2012, at 12:20 PM, Timothy Coalson wrote:

> Thanks for the script.  Here is some sample output from 'sudo
> ./nfssvrtop -b 512 5' (my disks are 512B-sector emulated and the pool
> is ashift=9, some benchmarking didn't show much difference with
> ashift=12 other than giving up 8% of available space) during a copy
> operation from 37.30 with sync=standard:
> 2012 Jun 14 13:59:13, load: 0.68, read: 0        KB, swrite: 0        KB, awrite: 557056   KB
> Ver Client        NFSOPS Reads SWrites AWrites Commits Rd_bw SWr_bw AWr_bw Rd_t SWr_t AWr_t   Com_t Align%
> 3   xxx.xxx.37.30    108     0       0     108       0     0      0 111206    0     0   396 1917419    100
> a bit later...
> 3   xxx.xxx.37.30    109     0       0     108       0     0      0 111411    0     0   427       0    100
> sample output from the end of 'zpool iostat -v 5 mainpool' concurrently:
> logs                           -      -      -      -      -      -
>  c31t3d0s0                 260M  9.68G      0  1.21K      0  85.3M
>  c31t4d0s0                 260M  9.68G      0  1.21K      0  85.1M
> In case the alignment fails, the nonzero entries are under NFSOPS,
> AWrites, AWr_bw, AWr_t, Com_t and Align%.  The Com_t (average commit
> time?) column alternates between zero and a million or two (the other
> columns stay about the same, the zeros stay zero), while the "Commits"
> column stays zero during the copy.  The write throughput to the logs
> varies quite a bit, that sample is a very high mark, it mainly
> alternates between almost zero and 30M each, which is kind of odd
> considering the copy speed (using gigabit network, copy speed averages
> around 110MB/s).

The client is using async writes, which include commits. Sync writes do not
need commits.

What happens is that the ZFS transaction group commit occurs at more-or-less
regular intervals, likely 5 seconds for more modern ZFS systems. When the
commit occurs, any data that is in the ARC but not committed in a prior
group gets sent to the ZIL. This is why you might see a very different amount of
ZIL activity relative to the expected write workload.
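
As a rough illustration of why the log traffic can look bursty, the sketch
below works out how much async data could accumulate between commits. The
awrite total and the 5-second sample interval come from the quoted nfssvrtop
output; the 5-second transaction group interval is only the "likely" value
mentioned above, and treating KB as 1024 bytes is an assumption.

```python
# Rough arithmetic only: how much async data piles up between txg commits.
# AWRITE_KB and SAMPLE_SECONDS are from the quoted nfssvrtop sample.
# TXG_INTERVAL_SECONDS is an assumed value, not a measurement.
AWRITE_KB = 557056        # "awrite" total reported for the sample
SAMPLE_SECONDS = 5        # interval argument given to nfssvrtop
TXG_INTERVAL_SECONDS = 5  # assumed transaction group commit interval

# Average async write rate over the sample.
awrite_kb_per_s = AWRITE_KB / SAMPLE_SECONDS

# Data that would accumulate in one txg interval at that rate (KB -> MiB,
# assuming 1 KB = 1024 bytes).
pending_mib_per_txg = awrite_kb_per_s * TXG_INTERVAL_SECONDS / 1024

print(f"async write rate: {awrite_kb_per_s:.0f} KB/s")
print(f"data per txg interval: {pending_mib_per_txg:.0f} MiB")
```

The computed rate lines up with the AWr_bw column in the second quoted sample,
which suggests that column is an average over the sample interval.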

> When I 'zfs set sync=disabled', the output of nfssrvtop stays about
> the same, except the Com_t stays 0, and the log devices also stay 0
> for throughput.  Could you enlighten me as to what "Com_t" measures
> when "Commits" stays zero?  Perhaps the nfs server caches asynchronous
> nfs writes how I expect, but flushes its cache with synchronous
> writes?

With sync=disabled, the ZIL is not used, thus the commit response to the client
is a lie, breaking the covenant between the server and client. In other words,
the server is supposed to respond to the commit only when the data is written
to permanent media, but the administrator overruled this action by disabling
the ZIL. If the server were to restart unexpectedly, or other conditions
occurred such that the write could not be completed, then the server and client
would have different views of the data, a form of data loss.

Different applications can react to long commit times differently. In this
case we see 1.9 seconds for the commit versus about 400 microseconds for each
async write. The cause of the latency of the commit is not apparent from any
bandwidth measurements (e.g., zpool iostat) and you should consider looking
more closely at the "iostat -x" latency to see if the log devices are performing
 -- richard
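
For reference, the commit-versus-write comparison above can be reproduced
from the quoted nfssvrtop row. This is only a sketch: it takes the Com_t and
AWr_t columns to be in microseconds, which is what the 1.9 second and 400
microsecond readings imply.

```python
# Values copied from the first quoted nfssvrtop row.
# Both are assumed to be in microseconds.
COM_T_US = 1917419  # Com_t column
AWR_T_US = 396      # AWr_t column

# Convert the commit time to seconds and compare it with one async write.
commit_seconds = COM_T_US / 1_000_000
ratio = COM_T_US / AWR_T_US

print(f"commit time: {commit_seconds:.2f} s")
print(f"commit takes about {ratio:.0f} times as long as one async write")
```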


ZFS and performance consulting

zfs-discuss mailing list