Re: [zfs-discuss] iSCSI access patterns and possible improvements?

2013-01-18 Thread Richard Elling
On Jan 18, 2013, at 4:40 AM, Jim Klimov  wrote:

> On 2013-01-18 06:35, Thomas Nau wrote:
 If almost all of the I/Os are 4K, maybe your ZVOLs should use a 
 volblocksize of 4K?  This seems like the most obvious improvement.
>>> 
>>> 4k might be a little small. 8k will have less metadata overhead. In some cases
>>> we've seen good performance on these workloads up through 32k. Real pain
>>> is felt at 128k :-)
>> 
>> My only pain so far is the time a send/receive takes without really loading
>> the network at all. VM performance is nothing I worry about at all as it's
>> pretty good. So the key question for me is whether going from 8k to 16k or
>> even 32k would have some benefit for that problem?
> 
> I would guess that increasing the block size would on one hand improve
> your reads - due to more userdata being stored contiguously as part of
> one ZFS block - and thus sending of the backup streams should be more
> about reading and sending the data and less about random seeking.

There is too much caching in the datapath to make a broad statement stick.
Empirical measurements with your workload are needed to pick the winner.
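
A rough sketch of such a measurement (the pool name "tank" is only a
placeholder, and this samples the I/O sizes that reach the physical disks
rather than the incoming iSCSI requests, so it complements rather than
replaces the iSCSI-level numbers already collected):

    # histogram of physical I/O sizes, split by read/write; run it under the
    # normal VM workload for a representative period, then Ctrl-C
    dtrace -n 'io:::start {
        @[args[0]->b_flags & B_READ ? "read" : "write"] =
            quantize(args[0]->b_bcount);
    }'

    # pool-level bandwidth and IOPS over the same period
    # ("tank" is a placeholder pool name)
    zpool iostat -v tank 5

Repeating the same run against a test zvol created with a different
volblocksize is the kind of empirical comparison meant here.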

> On the other hand, this is likely to be paid for by more
> read-modify-writes (when larger ZFS blocks are partially updated with
> the smaller clusters in the VM's filesystem) while the overall system
> is running and used for its primary purpose. However, since the guest
> FS is likely to store files of non-minimal size, it is likely that the
> whole larger backend block would be updated anyway...

For many ZFS implementations, RMW for zvols is the norm.

> 
> So, I think, this is something an experiment can show you - whether the
> gain during backup (and primary-job) reads outweighs the possible
> degradation during the primary-job writes.
> 
> As for the experiment, I guess you can always make a ZVOL with a different
> volblocksize, dd data into it from the production dataset's snapshot, and
> attach the VM or its clone to the newly created clone of its disk image.

In my experience, it is very hard to recreate in the lab the environments
found in real life. dd, in particular, will skew the results a bit because it
writes in LBA order for zvols, not the creation order seen in the real world.

That said, trying to get high performance out of HDDs is an exercise like
fighting the tides :-)
 -- richard

--

richard.ell...@richardelling.com
+1-760-896-4422


Re: [zfs-discuss] iSCSI access patterns and possible improvements?

2013-01-18 Thread Richard Elling

On Jan 17, 2013, at 9:35 PM, Thomas Nau  wrote:

> Thanks for all the answers (more inline)
> 
> On 01/18/2013 02:42 AM, Richard Elling wrote:
>> On Jan 17, 2013, at 7:04 AM, Bob Friesenhahn wrote:
>> 
>>> On Wed, 16 Jan 2013, Thomas Nau wrote:
>>> 
 Dear all
 I've a question concerning possible performance tuning for both iSCSI access
 and replicating a ZVOL through zfs send/receive. We export ZVOLs with the
 default volblocksize of 8k to a bunch of Citrix Xen Servers through iSCSI.
 The pool is made of SAS2 disks (11 x 3-way mirrored) plus mirrored STEC RAM
 ZIL SSDs and 128G of main memory

 The iSCSI access pattern (1 hour daytime average) looks like the following
 (Thanks to Richard Elling for the dtrace script)
>>> 
>>> If almost all of the I/Os are 4K, maybe your ZVOLs should use a 
>>> volblocksize of 4K?  This seems like the most obvious improvement.
>> 
>> 4k might be a little small. 8k will have less metadata overhead. In some cases
>> we've seen good performance on these workloads up through 32k. Real pain
>> is felt at 128k :-)
> 
> My only pain so far is the time a send/receive takes without really loading
> the network at all. VM performance is nothing I worry about at all as it's
> pretty good. So the key question for me is whether going from 8k to 16k or
> even 32k would have some benefit for that problem?

send/receive can bottleneck on the receiving side. Take a look at the archives,
searching for "mbuffer", as a way to add buffering on the receive side. In a
well-tuned system, the send will be served from the ARC :-)
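
A minimal sketch of that setup (host, pool, and dataset names as well as the
buffer sizes and port are placeholders; check the mbuffer man page for the
options your build supports):

    # on the receiving host: listen on a TCP port, buffer up to 1 GB in RAM,
    # then feed the stream to zfs receive
    # (backup/vmvol and port 9090 are placeholder names)
    mbuffer -s 128k -m 1G -I 9090 | zfs receive -F backup/vmvol

    # on the sending host: stream the snapshot through mbuffer over TCP
    # (tank/vmvol@backup and backuphost are placeholder names)
    zfs send tank/vmvol@backup | mbuffer -s 128k -m 1G -O backuphost:9090

The idea is that the buffer on the receiving side keeps the network busy even
while zfs receive pauses to commit a transaction group.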
 -- richard

--

richard.ell...@richardelling.com
+1-760-896-4422


Re: [zfs-discuss] Heavy write IO for no apparent reason

2013-01-18 Thread Timothy Coalson
On Fri, Jan 18, 2013 at 4:55 PM, Freddie Cash  wrote:

> On Thu, Jan 17, 2013 at 4:48 PM, Peter Blajev  wrote:
>
>> Right on Tim. Thanks. I didn't know that. I'm sure it's documented
>> somewhere and I should have read it so double thanks for explaining it.
>>
>
> When in doubt, always check the man page first:
> man zpool
>
> It's listed in the section on the "iostat" sub-command:
>  zpool iostat [-T d|u] [-v] [pool] ... [interval [count]]
>
>  Displays I/O statistics for the given pools. When given an interval,
>  the statistics are printed every interval seconds until Ctrl-C is
>  pressed. If no pools are specified, statistics for every pool in the
>  system is shown. If count is specified, the command exits after count
>  reports are printed.
>

To my eye, that doesn't actually explain what the output is, only how to
get it to repeat.  It seems to assume that one is already familiar with iostat
and expects this to work the same way.  So I can't really fault someone for
being confused by the output in this case (perhaps the manpage could use
some clarification).
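
For what it's worth, the part that tends to trip people up is that the first
report is an average since boot; something along these lines (the pool name
is just an example) shows the live numbers:

    # per-vdev statistics every 5 seconds; ignore the first block of output,
    # which is the average since boot ("tank" is a placeholder pool name)
    zpool iostat -v tank 5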

Tim


Re: [zfs-discuss] Heavy write IO for no apparent reason

2013-01-18 Thread Freddie Cash
On Thu, Jan 17, 2013 at 4:48 PM, Peter Blajev  wrote:

> Right on Tim. Thanks. I didn't know that. I'm sure it's documented
> somewhere and I should have read it so double thanks for explaining it.
>

When in doubt, always check the man page first:
man zpool

It's listed in the section on the "iostat" sub-command:
 zpool iostat [-T d|u] [-v] [pool] ... [interval [count]]

 Displays I/O statistics for the given pools. When given an interval,
 the statistics are printed every interval seconds until Ctrl-C is
 pressed. If no pools are specified, statistics for every pool in the
 system is shown. If count is specified, the command exits after count
 reports are printed.

:D

-- 
Freddie Cash
fjwc...@gmail.com


Re: [zfs-discuss] Heavy write IO for no apparent reason

2013-01-18 Thread Peter Blajev
Right on, Tim. Thanks. I didn't know that. I'm sure it's documented
somewhere and I should have read it, so double thanks for explaining it.


--
Peter Blajev
IT Manager, TAAZ Inc.
Office: 858-597-0512 x125


On Thu, Jan 17, 2013 at 4:18 PM, Timothy Coalson  wrote:

> On Thu, Jan 17, 2013 at 5:33 PM, Peter Wood wrote:
>
>>
>> The 'zpool iostat -v' output is uncomfortably static. The values of
>> read/write operations and bandwidth are the same for hours and even days.
>> I'd expect at least some variations between morning and night. The load on
>> the servers is different for sure. Any input?
>>
>>
> Without a repetition time parameter, zpool iostat will print exactly once
> and exit, and the output is an average from kernel boot to "now", just like
> iostat; this is why it seems so static.  If you want to know the activity
> over 5-second intervals, use something like "zpool iostat -v 5" (repeat
> every 5 seconds) and wait for the second and later blocks.  The second and
> later blocks are averages from the previous report until "now".  I generally
> use 5-second intervals to match the 5-second commit interval on my pools.
>
> Tim
>
>


Re: [zfs-discuss] iSCSI access patterns and possible improvements?

2013-01-18 Thread Jim Klimov

On 2013-01-18 06:35, Thomas Nau wrote:

If almost all of the I/Os are 4K, maybe your ZVOLs should use a volblocksize of 
4K?  This seems like the most obvious improvement.


4k might be a little small. 8k will have less metadata overhead. In some cases
we've seen good performance on these workloads up through 32k. Real pain
is felt at 128k :-)


My only pain so far is the time a send/receive takes without really loading
the network at all. VM performance is nothing I worry about at all as it's
pretty good. So the key question for me is whether going from 8k to 16k or
even 32k would have some benefit for that problem?


I would guess that increasing the block size would on one hand improve
your reads - due to more userdata being stored contiguously as part of
one ZFS block - and thus sending of the backup streams should be more
about reading and sending the data and less about random seeking.

On the other hand, this is likely to be paid for by more
read-modify-writes (when larger ZFS blocks are partially updated with
the smaller clusters in the VM's filesystem) while the overall system
is running and used for its primary purpose. However, since the guest
FS is likely to store files of non-minimal size, it is likely that the
whole larger backend block would be updated anyway...

So, I think, this is something an experiment can show you - whether the
gain during backup (and primary-job) reads outweighs the possible
degradation during the primary-job writes.

As for the experiment, I guess you can always make a ZVOL with a different
volblocksize, dd data into it from the production dataset's snapshot, and
attach the VM or its clone to the newly created clone of its disk image.
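
A minimal sketch of that experiment, assuming the pool is "tank", the
production zvol is tank/vmvol, and 16k is the candidate size (all names and
sizes are placeholders; as Richard notes in his reply, dd fills the test
volume in LBA order, so treat the results as approximate):

    # placeholder names throughout: tank/vmvol, 200G and 16k are examples
    # snapshot the production zvol and clone it to get a stable source device
    zfs snapshot tank/vmvol@blocktest
    zfs clone tank/vmvol@blocktest tank/vmvol-src

    # create the test zvol with the candidate volblocksize
    # (volblocksize can only be set at creation time)
    zfs create -V 200G -o volblocksize=16k tank/vmvol-16k

    # copy the data block device to block device
    dd if=/dev/zvol/rdsk/tank/vmvol-src of=/dev/zvol/rdsk/tank/vmvol-16k bs=1024k

    # snapshot the copy and attach a clone of it to a test VM over iSCSI,
    # keeping the copy itself pristine for further runs
    zfs snapshot tank/vmvol-16k@base
    zfs clone tank/vmvol-16k@base tank/vmvol-16k-test

Comparing the duration of a zfs send of tank/vmvol-16k@base against a send of
the original snapshot, plus the VM's behaviour on the clone, should show
whether the larger block size helps the backups more than it hurts the
primary-job writes.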

Good luck, and I hope I got Richard's logic right in that answer ;)
//Jim
