On 25/09/2009, at 2:58 AM, Richard Elling wrote:

On Sep 23, 2009, at 10:00 PM, James Lever wrote:

So it turns out that the problem is that all writes coming via NFS are going through the slog. When that happens, the transfer speed to the device drops to ~70MB/s (the write speed of the SLC SSD), and until the load drops all new write requests are blocked, causing a noticeable delay (observed to be up to 20s, but generally only 2-4s).

Thank you sir, can I have another?
If you add (not attach) more slogs, the workload will be spread across them. But...

My log configuration is:

        logs
          c7t2d0s0   ONLINE       0     0     0
          c7t3d0s0   OFFLINE      0     0     0

I’m going to test the now-removed SSD and see if I can get it to perform significantly worse than the first one, but my recollection from pre-production testing is that they were both equally slow and not significantly different from each other.

On a related note, I had 2 of these devices (both using just 10GB partitions) connected as log devices (so the pool had 2 separate log devices) and the second one was consistently running significantly slower than the first. Removing the second device improved performance, but did not eliminate the occasional observed pauses.

...this is not surprising, when you add a slow slog device. This is the weakest link rule.

So, in theory, even if one of the two SSDs was only slightly slower than the other, it would just appear to be more heavily affected?

Here is part of what I’m not understanding - unless one SSD is significantly worse than the other, how can the following scenario be true? Here is some iostat output from the two slog devices at 1s intervals when it gets a large series of write requests.

Idle at start.

                            extended device statistics              ---- errors ----
   r/s    w/s   kr/s     kw/s wait actv wsvc_t asvc_t  %w  %b s/w h/w trn tot device
   0.0 1462.0    0.0 187010.2  0.0 28.6    0.0   19.6   2  83   0   0   0   0 c7t2d0
   0.0  233.0    0.0  29823.7  0.0 28.7    0.0  123.3   0  83   0   0   0   0 c7t3d0

NVRAM cache close to full. (256MB BBC)

   0.0   84.0    0.0  10622.0  0.0  3.5    0.0   41.2   0  12   0   0   0   0 c7t2d0
   0.0    0.0    0.0      0.0  0.0 35.0    0.0    0.0   0 100   0   0   0   0 c7t3d0

   0.0    0.0    0.0      0.0  0.0  0.0    0.0    0.0   0   0   0   0   0   0 c7t2d0
   0.0  305.0    0.0  39039.3  0.0 35.0    0.0  114.7   0 100   0   0   0   0 c7t3d0

   0.0    0.0    0.0      0.0  0.0  0.0    0.0    0.0   0   0   0   0   0   0 c7t2d0
   0.0  361.0    0.0  46208.1  0.0 35.0    0.0   96.8   0 100   0   0   0   0 c7t3d0

   0.0    0.0    0.0      0.0  0.0  0.0    0.0    0.0   0   0   0   0   0   0 c7t2d0
   0.0  329.0    0.0  42114.0  0.0 35.0    0.0  106.3   0 100   0   0   0   0 c7t3d0

   0.0    0.0    0.0      0.0  0.0  0.0    0.0    0.0   0   0   0   0   0   0 c7t2d0
   0.0  317.0    0.0  40449.6  0.0 27.4    0.0   86.5   0  85   0   0   0   0 c7t3d0

   0.0    4.0    0.0    263.8  0.0  0.0    0.0    0.2   0   0   0   0   0   0 c7t2d0
   0.0    4.0    0.0    367.8  0.0  0.0    0.0    0.3   0   0   0   0   0   0 c7t3d0

What determines the size of the writes or their distribution between slog devices? It looks like ZFS decided to send a large chunk to one slog, which nearly filled the NVRAM, and then continued writing to the other one, which meant it had to go at device speed (whatever that is for the data size/write size). Is there a way to tune the writes to multiple slogs to be (for argument's sake) 10MB slices?

I was of the (mis)understanding that only metadata and writes smaller than 64k went via the slog device in the event of an O_SYNC write request?

The threshold is 32 kBytes, which is unfortunately the same as the default
NFS write size. See CR6686887
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6686887
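On OpenSolaris builds of that era, the 32 kByte cutoff corresponds to the zfs_immediate_write_sz kernel tunable (writes larger than it are logged indirectly, with the data block going to the main pool). A rough sketch of inspecting and persistently changing it; the 128 kB value is purely illustrative, and note that with a dedicated slog and the default logbias the tunable may not change where the data lands:

```shell
# Inspect the current threshold (in bytes) with the kernel debugger; run as root.
echo "zfs_immediate_write_sz/D" | mdb -k

# Persist a different value via /etc/system (takes effect on reboot).
# set zfs:zfs_immediate_write_sz=0x20000    # 128 kB, illustrative only
```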

If you have a slog and logbias=latency (default) then the writes go to the slog. So there is some interaction here that can affect NFS workloads in particular.
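If the build supports the logbias property, setting it to throughput on the affected filesystem should steer ZIL traffic for large synchronous writes to the main pool devices instead of the slog. A minimal sketch, assuming a hypothetical dataset name:

```shell
# Check the current setting (latency is the default).
zfs get logbias tank/nfs-export

# Bias this dataset's ZIL writes away from the dedicated slog.
zfs set logbias=throughput tank/nfs-export
```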

Interesting CR.

nfsstat -m output on one of the linux hosts (ubuntu)

Flags: rw,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,nointr,noacl,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.1.0.17,mountvers=3,mountproto=tcp,addr=10.1.0.17

rsize and wsize are auto-tuned to 1MB. How does this affect the sync request threshold?

The clients are (mostly) RHEL5.

Is there a way to tune this on the NFS server or clients such that when I perform a large synchronous write, the data does not go via the slog device?

You can change the IOP size on the client.


You’re suggesting modifying rsize/wsize?  or something else?
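For what it's worth, if rsize/wsize is what was meant, a client-side remount with a smaller write size would look roughly like this on the Linux hosts (server address matches the nfsstat output above; the mount point and 32 kB sizes are illustrative, and whether that lands writes above or below the server's threshold would need testing):

```shell
# Illustrative only: cap NFS READ/WRITE sizes at 32 kB on an existing mount.
mount -o remount,wsize=32768,rsize=32768 10.1.0.17:/export /mnt/export
```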

cheers,
James

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss