Re: [zfs-discuss] SATA disk perf question

2011-06-03 Thread Eric D. Mudama

On Thu, Jun  2 at 20:49, Erik Trimble wrote:
> Nope. In terms of actual, obtainable IOPS, a 7200RPM drive isn't
> going to be able to do more than 200 under ideal conditions, and
> should be able to manage 50 under anything other than the
> pedantically worst-case situation. That's only about a 50% deviation,
> not like an order of magnitude or so.


Most cache-enabled 7200RPM drives can do 20K+ sequential IOPS at small
block sizes, up close to their peak transfer rate.  


For random IO, I typically see 80 IOPS for unqueued reads, 120 for
queued reads/writes with cache disabled, and maybe 150-200 for cache
enabled writes.  The above are all full-stroke, so the average seek is
1/3 stroke (unqueued).  On a smaller data set where the drive dwarfs
the data set, average seek distance is much shorter and the resulting
IOPS can be quite a bit higher.
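
The "1/3 stroke" figure and the short-stroke effect mentioned above can be checked with a small calculation. This is only a sketch under simplifying assumptions: tracks are treated as equally likely targets, the function name and track counts are made up for illustration, and real seek time is not linear in distance.

```python
from fractions import Fraction


def mean_seek_fraction(total_tracks: int, used_tracks: int) -> Fraction:
    """Mean seek distance, as a fraction of the full stroke, when both the
    start and the target track are uniform over the first `used_tracks`."""
    # There are 2 * (used_tracks - d) ordered (start, target) pairs at distance d.
    total = sum(2 * d * (used_tracks - d) for d in range(1, used_tracks))
    mean_distance = Fraction(total, used_tracks * used_tracks)
    return mean_distance / total_tracks


# Whole drive in use: the average seek is about 1/3 of the stroke.
print(float(mean_seek_fraction(10_000, 10_000)))
# Data confined to 10% of the drive: the average seek shrinks to about 1/30.
print(float(mean_seek_fraction(10_000, 1_000)))
```

Shorter average seeks leave rotational latency as the larger share of each I/O, which is why IOPS rise on a small data set but do not scale without limit.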

--eric

--
Eric D. Mudama
edmud...@bounceswoosh.org

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SATA disk perf question

2011-06-03 Thread Eric Sproul
On Fri, Jun 3, 2011 at 11:22 AM, Paul Kraus  wrote:
> So is there a way to read these real I/Ops numbers ?
>
> iostat is reporting 600-800 I/Ops peak (1 second sample) for these
> 7200 RPM SATA drives. If the drives are doing aggregation, then how to
> tell what is really going on ?

I've always assumed that crazy high IOPS numbers on 7.2k drives mean
I'm seeing the individual drive caches absorbing those writes.  That's
the first place those writes will "land" when coming in from the disk
controller.  As other posters have said, after that the drive may
internally reorder and/or aggregate those writes before sending them
to the platter.

Eric


Re: [zfs-discuss] SATA disk perf question

2011-06-03 Thread Paul Kraus
On Thu, Jun 2, 2011 at 11:49 PM, Erik Trimble  wrote:
> On 6/2/2011 5:12 PM, Jens Elkner wrote:
>>
>> On Wed, Jun 01, 2011 at 06:17:08PM -0700, Erik Trimble wrote:
>>>
>>> On Wed, 2011-06-01 at 12:54 -0400, Paul Kraus wrote:
>>
>>> Here's how you calculate (average) how long a random IOPs takes:
>>> seek time + ((60 / RPMs) / 2)
>>>
>>> A truly sequential IOPs is:
>>> (60 / RPMs) / 2
>>>
>>> For that series of drives, seek time averages 8.5ms (per Seagate).
>>> So, you get
>>>
>>> 1 Random IOPs takes [8.5ms + 4.13ms] = 12.6ms, which translates to 78
>>> IOPS
>>> 1 Sequential IOPs takes 4.13ms, which gives 120 IOPS.
>>>
>>> Note that due to averaging, the above numbers may be slightly higher or
>>> lower for any actual workload.
>>
>> Nahh, shouldn't it read "numbers may be _significantly_ higher or lower"
>> ...? ;-)
>>
>> Regards,
>> jel.
>
> Nope. In terms of actual, obtainable IOPS, a 7200RPM drive isn't going to be
> able to do more than 200 under ideal conditions, and should be able to
> manage 50 under anything other than the pedantically worst-case situation.
> That's only about a 50% deviation, not like an order of magnitude or so.

So is there a way to read these real I/Ops numbers ?

iostat is reporting 600-800 I/Ops peak (1 second sample) for these
7200 RPM SATA drives. If the drives are doing aggregation, then how to
tell what is really going on ?

-- 
{1-2-3-4-5-6-7-}
Paul Kraus
-> Senior Systems Architect, Garnet River ( http://www.garnetriver.com/ )
-> Sound Coordinator, Schenectady Light Opera Company (
http://www.sloctheater.org/ )
-> Technical Advisor, RPI Players


Re: [zfs-discuss] SATA disk perf question

2011-06-02 Thread Erik Trimble

On 6/2/2011 5:12 PM, Jens Elkner wrote:
> On Wed, Jun 01, 2011 at 06:17:08PM -0700, Erik Trimble wrote:
>> On Wed, 2011-06-01 at 12:54 -0400, Paul Kraus wrote:
>>
>> Here's how you calculate (average) how long a random IOPs takes:
>> seek time + ((60 / RPMs) / 2)
>>
>> A truly sequential IOPs is:
>> (60 / RPMs) / 2
>>
>> For that series of drives, seek time averages 8.5ms (per Seagate).
>> So, you get
>>
>> 1 Random IOPs takes [8.5ms + 4.13ms] = 12.6ms, which translates to 78
>> IOPS
>> 1 Sequential IOPs takes 4.13ms, which gives 120 IOPS.
>>
>> Note that due to averaging, the above numbers may be slightly higher or
>> lower for any actual workload.
>
> Nahh, shouldn't it read "numbers may be _significantly_ higher or lower"
> ...? ;-)
>
> Regards,
> jel.


Nope. In terms of actual, obtainable IOPS, a 7200RPM drive isn't going 
to be able to do more than 200 under ideal conditions, and should be 
able to manage 50 under anything other than the pedantically worst-case 
situation. That's only about a 50% deviation, not like an order of 
magnitude or so.


--
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA



Re: [zfs-discuss] SATA disk perf question

2011-06-02 Thread Jens Elkner
On Wed, Jun 01, 2011 at 06:17:08PM -0700, Erik Trimble wrote:
> On Wed, 2011-06-01 at 12:54 -0400, Paul Kraus wrote:
  
> Here's how you calculate (average) how long a random IOPs takes:
> seek time + ((60 / RPMs) / 2)
> 
> A truly sequential IOPs is:
> (60 / RPMs) / 2
> 
> For that series of drives, seek time averages 8.5ms (per Seagate).
> So, you get 
> 
> 1 Random IOPs takes [8.5ms + 4.13ms] = 12.6ms, which translates to 78
> IOPS
> 1 Sequential IOPs takes 4.13ms, which gives 120 IOPS.
> 
> Note that due to averaging, the above numbers may be slightly higher or
> lower for any actual workload.

Nahh, shouldn't it read "numbers may be _significantly_ higher or lower"
...? ;-)

Regards,
jel.
-- 
Otto-von-Guericke University http://www.cs.uni-magdeburg.de/
Department of Computer Science   Geb. 29 R 027, Universitaetsplatz 2
39106 Magdeburg, Germany Tel: +49 391 67 12768


Re: [zfs-discuss] SATA disk perf question

2011-06-02 Thread Edward Ned Harvey
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> boun...@opensolaris.org] On Behalf Of Erik Trimble
> 
> Here's how you calculate (average) how long a random IOPs takes:
> 
> seek time + ((60 / RPMs) / 2)
> 
> 1 Random IOPs takes [8.5ms + 4.13ms] = 12.6ms, which translates to 78
> IOPS

While this is true, all drives nowadays use things like NCQ (Native
Command Queuing) and other hardware optimization techniques.  So even when you
instruct the drive to do a bunch of random IO, the drive will make it less
random in the controller before it instructs the arm to move about and so
on.  Generally speaking, these techniques will approx double the random
IOPS, because with a random distribution of IO requests, on average it will
be able to halve the randomness.
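
The effect of command queueing described above can be illustrated with a toy model. This is a hedged sketch, not how any drive firmware actually schedules: it compares total head travel for requests serviced in arrival order against a greedy nearest-first order, ignores rotational position entirely, and uses made-up parameters (queue depth 32, 100,000 tracks). It shows that reordering cuts travel; it does not confirm the "approx double" figure.

```python
import random


def total_travel(start: int, targets: list[int]) -> int:
    """Total head travel when visiting targets in the given order."""
    travel, pos = 0, start
    for t in targets:
        travel += abs(t - pos)
        pos = t
    return travel


def nearest_first(start: int, targets: list[int]) -> list[int]:
    """Greedy reordering: always service the closest pending request next."""
    pending, pos, order = list(targets), start, []
    while pending:
        nxt = min(pending, key=lambda t: abs(t - pos))
        pending.remove(nxt)
        order.append(nxt)
        pos = nxt
    return order


def compare(queue_depth: int = 32, batches: int = 200,
            tracks: int = 100_000, seed: int = 1) -> tuple[int, int]:
    """Return (arrival-order travel, reordered travel) over random batches."""
    rng = random.Random(seed)
    fifo = reordered = 0
    pos_fifo = pos_reordered = 0
    for _ in range(batches):
        batch = [rng.randrange(tracks) for _ in range(queue_depth)]
        fifo += total_travel(pos_fifo, batch)
        pos_fifo = batch[-1]
        order = nearest_first(pos_reordered, batch)
        reordered += total_travel(pos_reordered, order)
        pos_reordered = order[-1]
    return fifo, reordered


fifo, reordered = compare()
print(f"arrival order: {fifo} tracks, nearest-first: {reordered} tracks")
```

Less travel means less seek time per request, but rotational latency is untouched in this model, so the IOPS gain on a real drive is smaller than the travel reduction alone suggests.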

Consider your nit picked.  ;-)



Re: [zfs-discuss] SATA disk perf question

2011-06-02 Thread Paul Kraus
On Wed, Jun 1, 2011 at 9:17 PM, Erik Trimble  wrote:

> Here's how you calculate (average) how long a random IOPs takes:
>
> seek time + ((60 / RPMs) / 2)
>
> A truly sequential IOPs is:
>
> (60 / RPMs) / 2
>
> For that series of drives, seek time averages 8.5ms (per Seagate).
>
> So, you get
>
> 1 Random IOPs takes [8.5ms + 4.13ms] = 12.6ms, which translates to 78
> IOPS
>
> 1 Sequential IOPs takes 4.13ms, which gives 120 IOPS.

Thank you. I had found the seek specification, but did not know how to
convert it to anything approaching a useful I/Ops limit.

> Note that due to averaging, the above numbers may be slightly higher or
> lower for any actual workload.

> In your case, since ZFS does write aggregation (turning multiple write
> requests into a single larger one), you might see what appears to be
> more than the above number from something like 'iostat', which is
> measuring not the *actual* writes to physical disk, but the *requested*
> write operations.

Hurmmm, I don't think that really explains what I am seeing. iostat
output for the two drives that are resilvering (yes, we had a second
failure before Oracle could get us a replacement drive; the hoops
first-line support is making us jump through are amazing, in a bad way):

iostat -xn c6t5000C5001A452C72d0 c6t5000C5001A406415d0 1
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0  359.7    0.3 1181.1  0.0  1.8    0.0    5.1   0  28 c6t5000C5001A406415d0
    0.1  573.3    6.2 1846.8  0.0  3.0    0.0    5.2   0  45 c6t5000C5001A452C72d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0  629.3    0.0 1859.7  0.0  3.0    0.0    4.7   0  53 c6t5000C5001A406415d0
    0.0  581.1    0.0 1780.8  0.0  2.8    0.0    4.9   0  48 c6t5000C5001A452C72d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0  855.0    0.0 3595.7  0.0  4.9    0.0    5.7   0  70 c6t5000C5001A406415d0
    0.0  785.9    0.0 3487.1  0.0  5.2    0.0    6.7   0  70 c6t5000C5001A452C72d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0  842.3    0.0 2709.8  0.0  4.2    0.0    5.0   0  71 c6t5000C5001A406415d0
    0.0  811.3    0.0 2607.3  0.0  4.1    0.0    5.0   0  68 c6t5000C5001A452C72d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0  567.0    0.0 1946.0  0.0  2.8    0.0    4.9   0  48 c6t5000C5001A406415d0
    0.0  549.0    0.0 1897.0  0.0  2.7    0.0    4.9   0  48 c6t5000C5001A452C72d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0  803.8    0.0 2860.6  0.0  4.7    0.0    5.8   0  72 c6t5000C5001A406415d0
    0.0  798.8    0.0 2756.4  0.0  4.3    0.0    5.4   0  70 c6t5000C5001A452C72d0

and the zpool configuration:

> zpool status
  pool: zpool-53
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-2Q
 scrub: resilver in progress for 16h29m, 19.17% done, 69h28m to go
config:

        NAME                       STATE     READ WRITE CKSUM
        alb-ed-01                  DEGRADED     0     0     0
          raidz2-0                 ONLINE       0     0     0
            c6t5000C5001A67E217d0  ONLINE       0     0     0
            c6t5000C5001A67AF9Dd0  ONLINE       0     0     0
            c6t5000C5001A67AADBd0  ONLINE       0     0     0
            c6t5000C5001A67A539d0  ONLINE       0     0     0
            c6t5000C5001A67A099d0  ONLINE       0     0     0
            c6t5000C5001A679F0Dd0  ONLINE       0     0     0
            c6t5000C5001A679C5Dd0  ONLINE       0     0     0
            c6t5000C5001A679B46d0  ONLINE       0     0     0
            c6t5000C5001A679A09d0  ONLINE       0     0     0
            c6t5000C5001A67104Ed0  ONLINE       0     0     0
            c6t5000C5001A670DBEd0  ONLINE       0     0     0
            c6t5000C5001A66E3DAd0  ONLINE       0     0     0
            c6t5000C5001A66411Ad0  ONLINE       0     0     0
            c6t5000C5001A663D19d0  ONLINE       0     0     0
            c6t5000C5001A663783d0  ONLINE       0     0     0
          raidz2-1                 ONLINE       0     0     0
            c6t5000C5001A663474d0  ONLINE       0     0     0
            c6t5000C5001A65EF79d0  ONLINE       0     0     0
            c6t5000C5001A65D7C0d0  ONLINE       0     0     0
            c6t5000C5001A65D50Ed0  ONLINE       0     0     0
            c6t5000C5001A65D000d0  ONLINE       0     0     0
            c6t5000C5001A65BBD8d0  ONLINE       0
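
Two quantities can be derived from the iostat samples in this message to help interpret the high w/s figures. The sketch below is illustrative only: the helper names are made up, and the sample values are read from the first and third samples for c6t5000C5001A406415d0 (w/s, kw/s, actv, asvc_t). It computes the average write size and the throughput implied by Little's law (operations in flight divided by service time).

```python
def avg_write_kb(w_per_s: float, kw_per_s: float) -> float:
    """Average size of one write in KB: kilobytes written / writes issued."""
    return kw_per_s / w_per_s


def littles_law_throughput(actv: float, asvc_t_ms: float) -> float:
    """Ops per second implied by Little's law: in-flight ops / service time."""
    return actv / (asvc_t_ms / 1000.0)


# (w/s, kw/s, actv, asvc_t in ms) as read from the samples above
samples = [
    (359.7, 1181.1, 1.8, 5.1),
    (855.0, 3595.7, 4.9, 5.7),
]

for w, kw, actv, asvc_t in samples:
    print(f"w/s={w:6.1f}  avg write={avg_write_kb(w, kw):.2f} KB  "
          f"Little's law={littles_law_throughput(actv, asvc_t):.0f} ops/s")
```

On these numbers the writes average roughly 3 to 4 KB, and several are outstanding at once with a service time near 5 ms each. That pattern is what one would expect from small writes being queued and acknowledged quickly, rather than from one seek per write.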

Re: [zfs-discuss] SATA disk perf question

2011-06-01 Thread Erik Trimble
On Wed, 2011-06-01 at 12:54 -0400, Paul Kraus wrote:
> I figure this group will know better than any other I have contact
> with, is 700-800 I/Ops reasonable for a 7200 RPM SATA drive (1 TB Sun
> badged Seagate ST31000N in a J4400) ? I have a resilver running and am
> seeing about 700-800 writes/sec. on the hot spare as it resilvers.
> There is no other I/O activity on this box, as this is a remote
> replication target for production data. I have the replication
> disabled until the resilver completes.
> 
> Solaris 10U9
> zpool version 22
> Server is a T2000
> 

Here's how you calculate (average) how long a random IOPs takes:

seek time + ((60 / RPMs) / 2)


A truly sequential IOPs is:

(60 / RPMs) / 2


For that series of drives, seek time averages 8.5ms (per Seagate).

So, you get 

1 Random IOPs takes [8.5ms + 4.13ms] = 12.6ms, which translates to 78
IOPS

1 Sequential IOPs takes 4.13ms, which gives 120 IOPS.



Note that due to averaging, the above numbers may be slightly higher or
lower for any actual workload.
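
The arithmetic above can be reproduced with a few lines of Python. This is a sketch with made-up function names, using only the 7200 RPM and 8.5 ms figures from this post. Computed exactly, half a revolution at 7200 RPM is about 4.17 ms (the post rounds to 4.13 ms), which gives about 79 random IOPS. Rotational latency alone would allow about 240 operations per second; 120 per second corresponds to one full revolution (8.33 ms) per I/O.

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency: half of one revolution, in milliseconds."""
    return (60.0 / rpm) / 2.0 * 1000.0


def iops(service_time_ms: float) -> float:
    """Operations per second if each operation takes service_time_ms."""
    return 1000.0 / service_time_ms


RPM = 7200
SEEK_MS = 8.5  # average seek time quoted in the post (per Seagate)

rot_ms = avg_rotational_latency_ms(RPM)
random_ms = SEEK_MS + rot_ms
random_iops = iops(random_ms)
rotation_only_iops = iops(rot_ms)
full_revolution_iops = iops(2 * rot_ms)

print(f"rotational latency: {rot_ms:.2f} ms")
print(f"random I/O: {random_ms:.2f} ms -> {random_iops:.0f} IOPS")
print(f"half-revolution bound: {rotation_only_iops:.0f} per second")
print(f"full-revolution bound: {full_revolution_iops:.0f} per second")
```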




In your case, since ZFS does write aggregation (turning multiple write
requests into a single larger one), you might see what appears to be
more than the above number from something like 'iostat', which is
measuring not the *actual* writes to physical disk, but the *requested*
write operations.



-- 
Erik Trimble
Java System Support
Mailstop:  usca22-317
Phone:  x67195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)



Re: [zfs-discuss] SATA disk perf question

2011-06-01 Thread Paul Kraus
On Wed, Jun 1, 2011 at 1:16 PM, Tuomas Leikola  wrote:
>> I have a resilver running and am
>> seeing about 700-800 writes/sec. on the hot spare as it resilvers.
>
> IIRC resilver works in block birth order (write order) which is
> commonly more-or-less sequential unless the fs is fragmented. So it
> might or might not be. I think you cannot get that kind of performance
> for a fully random load, more like 100 IOPS or so.

Since this zpool only receives zfs send streams from the far end,
I would expect the data to be relatively sequential (minus the holes
from deleted snapshots).

-- 
{1-2-3-4-5-6-7-}
Paul Kraus
-> Senior Systems Architect, Garnet River ( http://www.garnetriver.com/ )
-> Sound Coordinator, Schenectady Light Opera Company (
http://www.sloctheater.org/ )
-> Technical Advisor, RPI Players


Re: [zfs-discuss] SATA disk perf question

2011-06-01 Thread Tuomas Leikola
> I have a resilver running and am
> seeing about 700-800 writes/sec. on the hot spare as it resilvers.

IIRC resilver works in block birth order (write order) which is
commonly more-or-less sequential unless the fs is fragmented. So it
might or might not be. I think you cannot get that kind of performance
for a fully random load, more like 100 IOPS or so.

-- 
- Tuomas


Re: [zfs-discuss] SATA disk perf question

2011-06-01 Thread Tomas Ögren
On 01 June, 2011 - Paul Kraus sent me these 0,9K bytes:

> I figure this group will know better than any other I have contact
> with, is 700-800 I/Ops reasonable for a 7200 RPM SATA drive (1 TB Sun
> badged Seagate ST31000N in a J4400) ? I have a resilver running and am
> seeing about 700-800 writes/sec. on the hot spare as it resilvers.
> There is no other I/O activity on this box, as this is a remote
> replication target for production data. I have the replication
> disabled until the resilver completes.

700-800 seq ones perhaps.. for random, you can divide by 10.

/Tomas
-- 
Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se


[zfs-discuss] SATA disk perf question

2011-06-01 Thread Paul Kraus
I figure this group will know better than any other I have contact
with, is 700-800 I/Ops reasonable for a 7200 RPM SATA drive (1 TB Sun
badged Seagate ST31000N in a J4400) ? I have a resilver running and am
seeing about 700-800 writes/sec. on the hot spare as it resilvers.
There is no other I/O activity on this box, as this is a remote
replication target for production data. I have the replication
disabled until the resilver completes.

Solaris 10U9
zpool version 22
Server is a T2000

-- 
{1-2-3-4-5-6-7-}
Paul Kraus
-> Senior Systems Architect, Garnet River ( http://www.garnetriver.com/ )
-> Sound Coordinator, Schenectady Light Opera Company (
http://www.sloctheater.org/ )
-> Technical Advisor, RPI Players