Hi Tim,

On Jun 14, 2012, at 12:20 PM, Timothy Coalson wrote:
> Thanks for the script. Here is some sample output from 'sudo
> ./nfssvrtop -b 512 5' (my disks are 512B-sector emulated and the pool
> is ashift=9, some benchmarking didn't show much difference with
> ashift=12 other than giving up 8% of available space) during a copy
> operation from 37.30 with sync=standard:
>
> 2012 Jun 14 13:59:13, load: 0.68, read: 0 KB, swrite: 0 KB, awrite: 557056 KB
> Ver  Client         NFSOPS  Reads  SWrites  AWrites  Commits  Rd_bw  SWr_bw
> AWr_bw  Rd_t  SWr_t  AWr_t  Com_t    Align%
> 3    xxx.xxx.37.30  108     0      0        108      0        0      0
> 111206  0     0      396    1917419  100
> a bit later...
> 3    xxx.xxx.37.30  109     0      0        108      0        0      0
> 111411  0     0      427    0        100
>
> sample output from the end of 'zpool iostat -v 5 mainpool' concurrently:
> logs       -     -      -  -      -  -
> c31t3d0s0  260M  9.68G  0  1.21K  0  85.3M
> c31t4d0s0  260M  9.68G  0  1.21K  0  85.1M
>
> In case the alignment fails, the nonzero entries are under NFSOPS,
> AWrites, AWr_bw, AWr_t, Com_t and Align%. The Com_t (average commit
> time?) column alternates between zero and a million or two (the other
> columns stay about the same, the zeros stay zero), while the "Commits"
> column stays zero during the copy. The write throughput to the logs
> varies quite a bit, that sample is a very high mark, it mainly
> alternates between almost zero and 30M each, which is kind of odd
> considering the copy speed (using gigabit network, copy speed averages
> around 110MB/s).

The client is using async writes, which include commits. Sync writes do
not need commits.

What happens is that the ZFS transaction group commit occurs at
more-or-less regular intervals, likely 5 seconds for more modern ZFS
systems. When the commit occurs, any data that is in the ARC but not
committed in a prior transaction group gets sent to the ZIL. This is why
you might see a very different amount of ZIL activity relative to the
expected write workload.
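
Since the sample wraps badly in mail, here is a rough Python sketch that pairs each value in the first data row with its column name and lists the nonzero columns. The column names and the row are copied from the quoted output. The helper names (label_row, nonzero_columns) are made up for illustration, and the conversion assumes the *_t columns are reported in microseconds.

```python
# Column names copied from the nfssvrtop header in the quoted output.
COLUMNS = ("Ver Client NFSOPS Reads SWrites AWrites Commits "
           "Rd_bw SWr_bw AWr_bw Rd_t SWr_t AWr_t Com_t Align%").split()

# First data row from the quoted sample, unwrapped onto one line.
ROW = "3 xxx.xxx.37.30 108 0 0 108 0 0 0 111206 0 0 396 1917419 100"


def label_row(row):
    """Pair each whitespace-separated value with its column name."""
    values = row.split()
    if len(values) != len(COLUMNS):
        raise ValueError("expected %d fields, got %d"
                         % (len(COLUMNS), len(values)))
    return dict(zip(COLUMNS, values))


def nonzero_columns(labeled):
    """Names of the numeric columns (everything after Client) that are not zero."""
    return [name for name in COLUMNS[2:] if int(labeled[name]) != 0]


sample = label_row(ROW)

# Assumption: the *_t columns are in microseconds.
com_t_seconds = int(sample["Com_t"]) / 1e6

print(nonzero_columns(sample))
print("Com_t = %.2f s, AWr_t = %s us" % (com_t_seconds, sample["AWr_t"]))
```

Run against the second row ("a bit later..."), the same helper would show Com_t dropping out of the nonzero list, which is the alternation described in the quoted text.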
> When I 'zfs set sync=disabled', the output of nfssvrtop stays about
> the same, except the Com_t stays 0, and the log devices also stay 0
> for throughput. Could you enlighten me as to what "Com_t" measures
> when "Commits" stays zero? Perhaps the nfs server caches asynchronous
> nfs writes how I expect, but flushes its cache with synchronous
> writes?
>

With sync=disabled, the ZIL is not used, thus the commit response to the
client is a lie, breaking the covenant between the server and client. In
other words, the server is supposed to respond to the commit only when
the data is written to permanent media, but the administrator overruled
this action by disabling the ZIL. If the server were to unexpectedly
restart, or other conditions occur such that the write cannot be
completed, then the server and client will have different views of the
data, a form of data loss.

Different applications can react to long commit times differently. In
this example, we see 1.9 seconds for the commit versus about 400
microseconds for each async write. The cause of the commit latency is
not apparent from any bandwidth measurements (e.g., zpool iostat), and
you should consider looking more closely at the "iostat -x" latency to
see if the log devices are performing well.
 -- richard

--
ZFS and performance consulting
http://www.RichardElling.com

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss