On 25/09/2009, at 2:58 AM, Richard Elling wrote:
On Sep 23, 2009, at 10:00 PM, James Lever wrote:
So it turns out that the problem is that all writes coming via NFS
are going through the slog. When that happens, the transfer speed
to the device drops to ~70MB/s (the write speed of this SLC SSD) and,
until the load drops, all new write requests are blocked, causing a
noticeable delay (observed to be up to 20s, but generally only 2-4s).
Thank you sir, can I have another?
If you add (not attach) more slogs, the workload will be spread
across them. But...
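For anyone following along, "add" versus "attach" matters here: add stripes a second independent slog, attach mirrors the existing one. A minimal sketch, where the pool name "tank" is assumed and the device names are the ones from this thread:

```shell
# Add a second, independent slog; log writes are then spread across both.
# (Pool name "tank" is assumed; adjust for your configuration.)
zpool add tank log c7t3d0s0

# By contrast, "attach" would mirror the existing slog rather than stripe:
#   zpool attach tank c7t2d0s0 c7t3d0s0

# Verify the resulting log vdev layout:
zpool status tank
```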
My log configuration is:
logs
c7t2d0s0 ONLINE 0 0 0
c7t3d0s0 OFFLINE 0 0 0
I’m going to test the now-removed SSD and see if I can get it to
perform significantly worse than the first one, but my memory from
pre-production testing is that they were both equally slow, with no
significant difference between them.
On a related note, I had 2 of these devices (each using just a 10GB
partition) connected as log devices (so the pool had 2 separate
log devices), and the second one was consistently running
significantly slower than the first. Removing the second device
improved performance, but did not eliminate the occasional
observed pauses.
...this is not surprising when you add a slow slog device. This is
the weakest-link rule.
So, in theory, even if one of the two SSDs were only slightly slower
than the other, it would simply appear to be more heavily affected?
Here is part of what I’m not understanding: unless one SSD is
significantly worse than the other, how can the following scenario be
true? Here is some iostat output from the two slog devices at 1s
intervals as a large series of write requests arrives.
Idle at start.
    r/s    w/s  kr/s     kw/s wait  actv wsvc_t asvc_t  %w  %b s/w h/w trn tot device
    0.0 1462.0   0.0 187010.2  0.0  28.6    0.0   19.6   2  83   0   0   0   0 c7t2d0
    0.0  233.0   0.0  29823.7  0.0  28.7    0.0  123.3   0  83   0   0   0   0 c7t3d0
NVRAM cache close to full. (256MB BBC)
    0.0   84.0   0.0  10622.0  0.0   3.5    0.0   41.2   0  12   0   0   0   0 c7t2d0
    0.0    0.0   0.0      0.0  0.0  35.0    0.0    0.0   0 100   0   0   0   0 c7t3d0
    0.0    0.0   0.0      0.0  0.0   0.0    0.0    0.0   0   0   0   0   0   0 c7t2d0
    0.0  305.0   0.0  39039.3  0.0  35.0    0.0  114.7   0 100   0   0   0   0 c7t3d0
    0.0    0.0   0.0      0.0  0.0   0.0    0.0    0.0   0   0   0   0   0   0 c7t2d0
    0.0  361.0   0.0  46208.1  0.0  35.0    0.0   96.8   0 100   0   0   0   0 c7t3d0
    0.0    0.0   0.0      0.0  0.0   0.0    0.0    0.0   0   0   0   0   0   0 c7t2d0
    0.0  329.0   0.0  42114.0  0.0  35.0    0.0  106.3   0 100   0   0   0   0 c7t3d0
    0.0    0.0   0.0      0.0  0.0   0.0    0.0    0.0   0   0   0   0   0   0 c7t2d0
    0.0  317.0   0.0  40449.6  0.0  27.4    0.0   86.5   0  85   0   0   0   0 c7t3d0
    0.0    4.0   0.0    263.8  0.0   0.0    0.0    0.2   0   0   0   0   0   0 c7t2d0
    0.0    4.0   0.0    367.8  0.0   0.0    0.0    0.3   0   0   0   0   0   0 c7t3d0
What determines the size of the writes or the distribution between slog
devices? It looks like ZFS decided to send a large chunk to one slog,
which nearly filled the NVRAM, and then continued writing to the other
one, which meant it had to go at device speed (whatever that is
for the data size/write size). Is there a way to tune the writes to
multiple slogs into (for argument's sake) 10MB slices?
I was under the (mistaken) impression that only metadata and writes
smaller than 64k went via the slog device in the event of an O_SYNC
write request?
The threshold is 32 kBytes, which is unfortunately the same as the
default
NFS write size. See CR6686887
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6686887
If you have a slog and logbias=latency (default) then the writes go
to the slog.
So there is some interaction here that can affect NFS workloads in
particular.
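Given that interaction, one workaround is the logbias property: with logbias=throughput, a dataset's synchronous writes bypass the slog and are committed to the main pool devices. A hedged sketch, where the dataset name "tank/nfs" is hypothetical and the tunable name is taken from the CR-era code (it may vary by release):

```shell
# Send this dataset's ZIL traffic to the main pool instead of the slog.
# ("tank/nfs" is an example dataset name.)
zfs set logbias=throughput tank/nfs

# Confirm the setting:
zfs get logbias tank/nfs

# The 32 kByte cutoff discussed in the CR is the zfs_immediate_write_sz
# tunable; on OpenSolaris it could be adjusted in /etc/system, e.g.:
#   set zfs:zfs_immediate_write_sz=0x8000
```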
Interesting CR.
nfsstat -m output on one of the linux hosts (ubuntu)
Flags: rw,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,nointr,noacl,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.1.0.17,mountvers=3,mountproto=tcp,addr=10.1.0.17
rsize and wsize are auto-tuned to 1MB. How does this affect the sync
request threshold?
The clients are (mostly) RHEL5.
Is there a way to tune this on the NFS server or clients such that
when I perform a large synchronous write, the data does not go via
the slog device?
You can change the IOP size on the client.
You’re suggesting modifying rsize/wsize? or something else?
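If the suggestion is indeed the transfer size, it can be capped per mount on the Linux clients rather than left to autotuning. A sketch, where the server address is the one from the nfsstat output above but the export and mount paths are examples:

```shell
# Force a 32 KB NFS read/write size instead of the autotuned 1 MB.
# (/export/data and /mnt/data are hypothetical paths.)
mount -t nfs -o vers=3,proto=tcp,rsize=32768,wsize=32768 \
    10.1.0.17:/export/data /mnt/data
```

Whether a given wsize keeps data off the slog depends on the threshold interaction described above, so this is something to measure rather than assume.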
cheers,
James
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss