On 26/09/2009, at 1:14 AM, Ross Walker wrote:

By any chance do you have copies=2 set?

No, only 1. So the doubled data going to the slog (as reported by iostat) is still confusing me, and it is potentially doing significant harm to my performance.
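
For reference, I confirmed that with something along these lines (the dataset name is a placeholder rather than my actual pool):

        # check the copies property on the filesystems being written over NFS
        zfs get copies tank/export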

Also, try setting zfs_write_limit_override equal to the size of the
NVRAM cache (or half depending on how long it takes to flush):

echo zfs_write_limit_override/W0t268435456 | mdb -kw

That’s an interesting concept. All data still appears to go via the slog device; however, under heavy load my response time for a new write is typically below 2 s (with a few outliers at about 3.5 s), and a read (a directory listing of a non-cached entry) takes about 2 s.

What will this do once it hits the limit? Will streaming writes then be sent directly into a txg and streamed to the primary storage devices? (That is what I would like to see happen.)
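
For my own notes (and anyone following along), reading the value back and making the override survive a reboot looks roughly like this; the 0x10000000 figure is just the 256 MB example from above, and the /etc/system line follows the usual tuning-guide form rather than anything I have verified myself:

        # read the current value back (prints in hex, 64-bit)
        echo zfs_write_limit_override/J | mdb -k

and, to make it persistent, in /etc/system:

        set zfs:zfs_write_limit_override = 0x10000000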

As an aside, a slog device will not be of much benefit for large
sequential writes, because those are throughput bound, not latency
bound. slog devices really help when you have lots of small sync
writes. A RAIDZ2 with the ZIL spread across it will provide much
higher throughput than an SSD. An example of a workload that benefits
from a slog device is ESX over NFS, which issues a COMMIT for each
block written, so it benefits from a slog, but a standard media
server will not (though an L2ARC would be beneficial).

Better workload analysis is really what it is about.


It seems that it doesn’t matter what the workload is if the NFS pipe can sustain more continuous throughput than the slog chain can support.
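
On the workload analysis point: a rough way to quantify how much sync/ZIL traffic a given workload generates would be to count zil_commit() calls with DTrace. Just a sketch, assuming the fbt probe for zil_commit is available on this build:

        # count zil_commit() calls per second as a crude proxy for sync write activity
        dtrace -n 'fbt::zil_commit:entry { @c = count(); } tick-1sec { printa(@c); clear(@c); }'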

I suppose some creative use of the logbias setting might assist in this situation and force all potentially heavy writers directly to the primary storage. This would, however, negate any benefit of having a fast, low-latency device on those filesystems for the times when it is desirable (any large batch of small writes, for example).

Is there a way to have a dynamic, automatic logbias-type setting that depends on the transaction currently presented to the server, such that a clearly large streaming write is treated as logbias=throughput while a small transaction is treated as logbias=latency? (i.e. so that NFS transactions can effectively be treated as if they were local storage, with only a minor impact on the benefits of txg scheduling).
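
For completeness, the static version of what I am describing would look something like the following (the dataset names are made up for illustration):

        # large streaming writers: bypass the slog and go straight to the pool
        zfs set logbias=throughput tank/media

        # small, latency-sensitive sync writers: keep using the slog
        zfs set logbias=latency tank/home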

On 26/09/2009, at 3:39 AM, Richard Elling wrote:

Back of the envelope math says:
        10 GbE = ~1 GByte/sec of I/O capacity

If the SSD can only sink 70 MByte/s, then you will need:
        int(1000/70) + 1 = 15 SSDs for the slog

For capacity, you need:
        1 GByte/sec * 30 sec = 30 GBytes

Ross' idea has merit, if the size of the NVRAM in the array is 30 GBytes
or so.

At this point, enter the fusionIO cards or similar devices. Unfortunately, there does not seem to be anything on the market with memory-speed write throughput that is also supported under OpenSolaris as a slog device.

I think this is precisely what I (and anybody running a general purpose NFS server) need for a general purpose slog device.

Both of the above assume there is lots of memory in the server.
This is increasingly becoming easier to do as the memory costs
come down and you can physically fit 512 GBytes in a 4u server.
By default, the txg commit will occur when 1/8 of memory is used
for writes. For 30 GBytes, that would mean a main memory of only
240 Gbytes... feasible for modern servers.

However, most folks won't stomach 15 SSDs for slog or 30 GBytes of
NVRAM in their arrays. So Bob's recommendation of reducing the
txg commit interval below 30 seconds also has merit.  Or, to put it
another way, the dynamic sizing of the txg commit interval isn't
quite perfect yet. [Cue for Neil to chime in... :-)]

How does reducing the txg commit interval really help? Will data no longer go via the slog once it is streaming to disk, or will all data still be pushed through the slog regardless?
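
(If it does help, my understanding is that the interval in question is the zfs_txg_timeout tunable, which defaults to 30 seconds here; per the usual tuning guides it can apparently be lowered on a live system, e.g.

        # drop the txg commit interval from 30s to 10s (reverts on reboot)
        echo zfs_txg_timeout/W0t10 | mdb -kw

or persistently with "set zfs:zfs_txg_timeout = 10" in /etc/system. I have not yet tested whether that changes how much data goes via the slog.)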

For a predominantly NFS-serving workload, it really looks like the primary criterion is that the slog has to outperform the main pool in continuous write throughput, as well as offering near-instant response times. That might as well be a fast SSD (or a group of them), or 15k RPM drives with some NVRAM in front of them.
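
(The comparison itself is easy enough to watch, since zpool iostat with the verbose flag breaks out the log device separately from the main vdevs; the pool name below is a placeholder.)

        # per-vdev bandwidth, including the separate log device, sampled every 5 seconds
        zpool iostat -v tank 5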

Is there also a way to throttle synchronous writes to the slog device, much like the ZFS write throttling that is already implemented, so that there is a gap for new writers to enter when writing to the slog device? (Or is that throttling already the norm, and does it include slog writes?)

cheers,
James
