Re: [zfs-discuss] SATA disk perf question
On Thu, Jun 2 at 20:49, Erik Trimble wrote:
> Nope. In terms of actual, obtainable IOPS, a 7200RPM drive isn't going to be
> able to do more than 200 under ideal conditions, and should be able to
> manage 50 under anything other than the pedantically worst-case situation.
> That's only about a 50% deviation, not like an order of magnitude or so.

Most cache-enabled 7200RPM drives can do 20K+ sequential IOPS at small block sizes, up close to their peak transfer rate.

For random IO, I typically see 80 IOPS for unqueued reads, 120 for queued reads/writes with cache disabled, and maybe 150-200 for cache enabled writes.

The above are all full-stroke, so the average seek is 1/3 stroke (unqueued). On a smaller data set where the drive dwarfs the data set, average seek distance is much shorter and the resulting IOPS can be quite a bit higher.

--eric

--
Eric D. Mudama
edmud...@bounceswoosh.org

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] SATA disk perf question
On Fri, Jun 3, 2011 at 11:22 AM, Paul Kraus wrote:
> So is there a way to read these real I/Ops numbers?
>
> iostat is reporting 600-800 I/Ops peak (1 second sample) for these
> 7200 RPM SATA drives. If the drives are doing aggregation, then how to
> tell what is really going on?

I've always assumed that crazy high IOPS numbers on 7.2k drives mean I'm seeing the individual drive caches absorbing those writes. That's the first place those writes will "land" when coming in from the disk controller. As other posters have said, after that the drive may internally reorder and/or aggregate those writes before sending them to the platter.

Eric
Re: [zfs-discuss] SATA disk perf question
On Thu, Jun 2, 2011 at 11:49 PM, Erik Trimble wrote:
> On 6/2/2011 5:12 PM, Jens Elkner wrote:
>>
>> On Wed, Jun 01, 2011 at 06:17:08PM -0700, Erik Trimble wrote:
>>>
>>> On Wed, 2011-06-01 at 12:54 -0400, Paul Kraus wrote:
>>
>>> Here's how you calculate (average) how long a random IOPs takes:
>>> [seek time + ((60 / RPMs) / 2)]
>>>
>>> A truly sequential IOPs is:
>>> ((60 / RPMs) / 2)
>>>
>>> For that series of drives, seek time averages 8.5ms (per Seagate).
>>> So, you get
>>>
>>> 1 Random IOPs takes [8.5ms + 4.13ms] = 12.6ms, which translates to 78
>>> IOPS
>>> 1 Sequential IOPs takes 4.13ms, which gives 120 IOPS.
>>>
>>> Note that due to averaging, the above numbers may be slightly higher or
>>> lower for any actual workload.
>>
>> Nahh, shouldn't it read "numbers may be _significantly_ higher or lower"
>> ...? ;-)
>>
>> Regards,
>> jel.
>
> Nope. In terms of actual, obtainable IOPS, a 7200RPM drive isn't going to be
> able to do more than 200 under ideal conditions, and should be able to
> manage 50 under anything other than the pedantically worst-case situation.
> That's only about a 50% deviation, not like an order of magnitude or so.

So is there a way to read these real I/Ops numbers?

iostat is reporting 600-800 I/Ops peak (1 second sample) for these 7200 RPM SATA drives. If the drives are doing aggregation, then how to tell what is really going on?

--
{1-2-3-4-5-6-7-}
Paul Kraus
-> Senior Systems Architect, Garnet River ( http://www.garnetriver.com/ )
-> Sound Coordinator, Schenectady Light Opera Company ( http://www.sloctheater.org/ )
-> Technical Advisor, RPI Players
Re: [zfs-discuss] SATA disk perf question
On 6/2/2011 5:12 PM, Jens Elkner wrote:
> On Wed, Jun 01, 2011 at 06:17:08PM -0700, Erik Trimble wrote:
>> On Wed, 2011-06-01 at 12:54 -0400, Paul Kraus wrote:
>>
>> Here's how you calculate (average) how long a random IOPs takes:
>> [seek time + ((60 / RPMs) / 2)]
>>
>> A truly sequential IOPs is:
>> ((60 / RPMs) / 2)
>>
>> For that series of drives, seek time averages 8.5ms (per Seagate).
>> So, you get
>>
>> 1 Random IOPs takes [8.5ms + 4.13ms] = 12.6ms, which translates to 78
>> IOPS
>> 1 Sequential IOPs takes 4.13ms, which gives 120 IOPS.
>>
>> Note that due to averaging, the above numbers may be slightly higher or
>> lower for any actual workload.
>
> Nahh, shouldn't it read "numbers may be _significantly_ higher or lower"
> ...? ;-)
>
> Regards,
> jel.

Nope. In terms of actual, obtainable IOPS, a 7200RPM drive isn't going to be able to do more than 200 under ideal conditions, and should be able to manage 50 under anything other than the pedantically worst-case situation. That's only about a 50% deviation, not like an order of magnitude or so.

--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
Re: [zfs-discuss] SATA disk perf question
On Wed, Jun 01, 2011 at 06:17:08PM -0700, Erik Trimble wrote:
> On Wed, 2011-06-01 at 12:54 -0400, Paul Kraus wrote:

> Here's how you calculate (average) how long a random IOPs takes:
> [seek time + ((60 / RPMs) / 2)]
>
> A truly sequential IOPs is:
> ((60 / RPMs) / 2)
>
> For that series of drives, seek time averages 8.5ms (per Seagate).
> So, you get
>
> 1 Random IOPs takes [8.5ms + 4.13ms] = 12.6ms, which translates to 78
> IOPS
> 1 Sequential IOPs takes 4.13ms, which gives 120 IOPS.
>
> Note that due to averaging, the above numbers may be slightly higher or
> lower for any actual workload.

Nahh, shouldn't it read "numbers may be _significantly_ higher or lower" ...? ;-)

Regards,
jel.
--
Otto-von-Guericke University     http://www.cs.uni-magdeburg.de/
Department of Computer Science   Geb. 29 R 027, Universitaetsplatz 2
39106 Magdeburg, Germany         Tel: +49 391 67 12768
Re: [zfs-discuss] SATA disk perf question
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org]
> On Behalf Of Erik Trimble
>
> Here's how you calculate (average) how long a random IOPs takes:
>
> [seek time + ((60 / RPMs) / 2)]
>
> 1 Random IOPs takes [8.5ms + 4.13ms] = 12.6ms, which translates to 78
> IOPS

While this is true, all drives nowadays use things like command queueing (NCQ on SATA) and other hardware optimization techniques. So even when you instruct the drive to do a bunch of random IO, the drive will make it less random in the controller before it instructs the arm to move about and so on. Generally speaking, these techniques will approximately double the random IOPS, because with a random distribution of IO requests, on average the drive will be able to halve the randomness.

Consider your nit picked. ;-)
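The reordering effect described in the message above can be illustrated with a toy simulation. This is only a sketch under loud assumptions, none of which come from the thread: request positions are uniform over the full stroke, the drive always services the pending request nearest the head, and only head travel distance is measured (seek time is not proportional to distance, and rotational latency is ignored). The function name `mean_seek_distance` and the queue depths are made up for illustration; 32 is chosen because it is the maximum number of outstanding commands SATA NCQ allows.

```python
import random


def mean_seek_distance(queue_depth: int, n_requests: int = 20000, seed: int = 0) -> float:
    """Mean head travel per request, as a fraction of full stroke.

    Assumption (not from the thread): targets are uniform in [0, 1) and the
    drive services whichever pending request is nearest the current head
    position. queue_depth=1 degenerates to first-come, first-served.
    """
    rng = random.Random(seed)
    head = rng.random()
    pending = [rng.random() for _ in range(queue_depth)]
    total = 0.0
    for _ in range(n_requests):
        # Index of the pending request closest to the head.
        i = min(range(queue_depth), key=lambda k: abs(pending[k] - head))
        total += abs(pending[i] - head)
        head = pending[i]
        # Keep the queue full by replacing the serviced request with a new one.
        pending[i] = rng.random()
    return total / n_requests


unqueued = mean_seek_distance(1)
queued_4 = mean_seek_distance(4)
queued_32 = mean_seek_distance(32)

print(f"queue depth  1: mean travel {unqueued:.3f} of full stroke")
print(f"queue depth  4: mean travel {queued_4:.3f} of full stroke")
print(f"queue depth 32: mean travel {queued_32:.3f} of full stroke")
```

With a queue depth of 1 the mean travel comes out near the 1/3 stroke figure mentioned elsewhere in the thread, and it shrinks as the queue gets deeper. How much of that reduction turns into extra IOPS depends on the drive's seek curve and rotational positioning, which this sketch does not model.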
Re: [zfs-discuss] SATA disk perf question
On Wed, Jun 1, 2011 at 9:17 PM, Erik Trimble wrote:

> Here's how you calculate (average) how long a random IOPs takes:
>
> [seek time + ((60 / RPMs) / 2)]
>
> A truly sequential IOPs is:
>
> ((60 / RPMs) / 2)
>
> For that series of drives, seek time averages 8.5ms (per Seagate).
>
> So, you get
>
> 1 Random IOPs takes [8.5ms + 4.13ms] = 12.6ms, which translates to 78
> IOPS
>
> 1 Sequential IOPs takes 4.13ms, which gives 120 IOPS.

Thank you. I had found the seek specification, but did not know how to convert it to anything approaching a useful I/Ops limit.

> Note that due to averaging, the above numbers may be slightly higher or
> lower for any actual workload.

> In your case, since ZFS does write aggregation (turning multiple write
> requests into a single larger one), you might see what appears to be
> more than the above number from something like 'iostat', which is
> measuring not the *actual* writes to physical disk, but the *requested*
> write operations.

Hurmmm, I don't think that really explains what I am seeing.
iostat output for the two drives that are resilvering (yes, we had a second failure before Oracle could get us a replacement drive; the hoops first line support is making us hop through are amazing, in a bad way):

iostat -xn c6t5000C5001A452C72d0 c6t5000C5001A406415d0 1

                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0  359.7    0.3 1181.1  0.0  1.8    0.0    5.1   0  28 c6t5000C5001A406415d0
    0.1  573.3    6.2 1846.8  0.0  3.0    0.0    5.2   0  45 c6t5000C5001A452C72d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0  629.3    0.0 1859.7  0.0  3.0    0.0    4.7   0  53 c6t5000C5001A406415d0
    0.0  581.1    0.0 1780.8  0.0  2.8    0.0    4.9   0  48 c6t5000C5001A452C72d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0  855.0    0.0 3595.7  0.0  4.9    0.0    5.7   0  70 c6t5000C5001A406415d0
    0.0  785.9    0.0 3487.1  0.0  5.2    0.0    6.7   0  70 c6t5000C5001A452C72d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0  842.3    0.0 2709.8  0.0  4.2    0.0    5.0   0  71 c6t5000C5001A406415d0
    0.0  811.3    0.0 2607.3  0.0  4.1    0.0    5.0   0  68 c6t5000C5001A452C72d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0  567.0    0.0 1946.0  0.0  2.8    0.0    4.9   0  48 c6t5000C5001A406415d0
    0.0  549.0    0.0 1897.0  0.0  2.7    0.0    4.9   0  48 c6t5000C5001A452C72d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0  803.8    0.0 2860.6  0.0  4.7    0.0    5.8   0  72 c6t5000C5001A406415d0
    0.0  798.8    0.0 2756.4  0.0  4.3    0.0    5.4   0  70 c6t5000C5001A452C72d0

and the zpool configuration:

> zpool status
  pool: zpool-53
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-2Q
 scrub: resilver in progress for 16h29m, 19.17% done, 69h28m to go
config:

        NAME                         STATE     READ WRITE CKSUM
        alb-ed-01                    DEGRADED     0     0     0
          raidz2-0                   ONLINE       0     0     0
            c6t5000C5001A67E217d0    ONLINE       0     0     0
            c6t5000C5001A67AF9Dd0    ONLINE       0     0     0
            c6t5000C5001A67AADBd0    ONLINE       0     0     0
            c6t5000C5001A67A539d0    ONLINE       0     0     0
            c6t5000C5001A67A099d0    ONLINE       0     0     0
            c6t5000C5001A679F0Dd0    ONLINE       0     0     0
            c6t5000C5001A679C5Dd0    ONLINE       0     0     0
            c6t5000C5001A679B46d0    ONLINE       0     0     0
            c6t5000C5001A679A09d0    ONLINE       0     0     0
            c6t5000C5001A67104Ed0    ONLINE       0     0     0
            c6t5000C5001A670DBEd0    ONLINE       0     0     0
            c6t5000C5001A66E3DAd0    ONLINE       0     0     0
            c6t5000C5001A66411Ad0    ONLINE       0     0     0
            c6t5000C5001A663D19d0    ONLINE       0     0     0
            c6t5000C5001A663783d0    ONLINE       0     0     0
          raidz2-1                   ONLINE       0     0     0
            c6t5000C5001A663474d0    ONLINE       0     0     0
            c6t5000C5001A65EF79d0    ONLINE       0     0     0
            c6t5000C5001A65D7C0d0    ONLINE       0     0     0
            c6t5000C5001A65D50Ed0    ONLINE       0     0     0
            c6t5000C5001A65D000d0    ONLINE       0     0     0
            c6t5000C5001A65BBD8d0    ONLINE       0
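The iostat samples in this message can be cross-checked with a little arithmetic. The sketch below is a minimal illustration, not part of the original post: it takes the w/s, kw/s, actv and asvc_t columns from the twelve rows above, computes the average write size (kw/s divided by w/s), and applies Little's law (requests in flight is roughly arrival rate times service time) to see whether actv is consistent with w/s and asvc_t. The helper names are hypothetical.

```python
# (w/s, kw/s, actv, asvc_t in ms) copied from the twelve iostat rows above.
samples = [
    (359.7, 1181.1, 1.8, 5.1),
    (573.3, 1846.8, 3.0, 5.2),
    (629.3, 1859.7, 3.0, 4.7),
    (581.1, 1780.8, 2.8, 4.9),
    (855.0, 3595.7, 4.9, 5.7),
    (785.9, 3487.1, 5.2, 6.7),
    (842.3, 2709.8, 4.2, 5.0),
    (811.3, 2607.3, 4.1, 5.0),
    (567.0, 1946.0, 2.8, 4.9),
    (549.0, 1897.0, 2.7, 4.9),
    (803.8, 2860.6, 4.7, 5.8),
    (798.8, 2756.4, 4.3, 5.4),
]


def avg_write_kb(writes_per_s: float, kb_written_per_s: float) -> float:
    """Average size of one write in KB."""
    return kb_written_per_s / writes_per_s


def in_flight_by_littles_law(writes_per_s: float, asvc_ms: float) -> float:
    """Expected number of active requests: rate times time in service."""
    return writes_per_s * asvc_ms / 1000.0


write_sizes = [avg_write_kb(w, kw) for w, kw, _, _ in samples]
actv_gaps = [abs(in_flight_by_littles_law(w, t) - actv) for w, _, actv, t in samples]

for (w, kw, actv, t), size in zip(samples, write_sizes):
    predicted = in_flight_by_littles_law(w, t)
    print(f"{w:6.1f} w/s  {size:4.2f} KB/write  actv {actv:.1f}  predicted {predicted:.2f}")
```

On these numbers each write averages roughly 3 to 4.5 KB, and the product of w/s and asvc_t lands within 0.1 of the reported actv in every row. In other words, the 600-800 writes per second are small writes with several outstanding at once, each taking about 5 ms, rather than one fully serialized seek-plus-rotation operation at a time. The arithmetic does not say where the overlap happens (drive cache, command queue, or elsewhere in the stack).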
Re: [zfs-discuss] SATA disk perf question
On Wed, 2011-06-01 at 12:54 -0400, Paul Kraus wrote:
> I figure this group will know better than any other I have contact
> with: is 700-800 I/Ops reasonable for a 7200 RPM SATA drive (1 TB Sun
> badged Seagate ST31000N in a J4400)? I have a resilver running and am
> seeing about 700-800 writes/sec. on the hot spare as it resilvers.
> There is no other I/O activity on this box, as this is a remote
> replication target for production data. I have the replication
> disabled until the resilver completes.
>
> Solaris 10U9
> zpool version 22
> Server is a T2000
>

Here's how you calculate (average) how long a random IOPs takes:

[seek time + ((60 / RPMs) / 2)]

A truly sequential IOPs is:

((60 / RPMs) / 2)

For that series of drives, seek time averages 8.5ms (per Seagate).

So, you get

1 Random IOPs takes [8.5ms + 4.13ms] = 12.6ms, which translates to 78 IOPS

1 Sequential IOPs takes 4.13ms, which gives 120 IOPS.

Note that due to averaging, the above numbers may be slightly higher or lower for any actual workload.

In your case, since ZFS does write aggregation (turning multiple write requests into a single larger one), you might see what appears to be more than the above number from something like 'iostat', which is measuring not the *actual* writes to physical disk, but the *requested* write operations.

--
Erik Trimble
Java System Support
Mailstop: usca22-317
Phone: x67195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)
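The formula in the message above is easy to evaluate directly. The following is a minimal sketch that plugs in the thread's own inputs (7200 RPM, 8.5 ms average seek); the function and variable names are made up for illustration.

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency in ms: half of one revolution."""
    return (60.0 / rpm) / 2.0 * 1000.0


def iops_from_service_time(ms_per_io: float) -> float:
    """Operations per second if each one takes ms_per_io and none overlap."""
    return 1000.0 / ms_per_io


RPM = 7200
AVG_SEEK_MS = 8.5  # average seek time quoted in the thread (per Seagate)

rot_ms = avg_rotational_latency_ms(RPM)                 # half a revolution
random_ms = AVG_SEEK_MS + rot_ms                        # seek + rotational latency
random_iops = iops_from_service_time(random_ms)
half_rev_iops = iops_from_service_time(rot_ms)          # rotational latency only
full_rev_iops = RPM / 60.0                              # one I/O per full revolution

print(f"average rotational latency: {rot_ms:.2f} ms")
print(f"random I/O: {random_ms:.2f} ms each, about {random_iops:.0f} IOPS")
print(f"half-revolution bound: about {half_rev_iops:.0f} IOPS")
print(f"full-revolution bound: about {full_rev_iops:.0f} IOPS")
```

Half a revolution at 7200 RPM works out to about 4.17 ms, close to the 4.13 ms quoted, and the random figure comes to roughly 79 IOPS, in line with the 78 above. For the rotation-only case, 1000 / 4.17 is about 240 per second, whereas 120 per second corresponds to one operation per full 8.33 ms revolution; the thread does not say which of the two was intended.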
Re: [zfs-discuss] SATA disk perf question
On Wed, Jun 1, 2011 at 1:16 PM, Tuomas Leikola wrote:

>> I have a resilver running and am
>> seeing about 700-800 writes/sec. on the hot spare as it resilvers.
>
> IIRC resilver works in block birth order (write order) which is
> commonly more-or-less sequential unless the fs is fragmented. So it
> might or might not be. I think you cannot get that kind of performance
> for a fully random load, more like 100 IOPS or so.

Since this zpool only receives zfs send streams from the far end, I would expect the data to be relatively sequential (minus the holes from deleted snapshots).

--
{1-2-3-4-5-6-7-}
Paul Kraus
-> Senior Systems Architect, Garnet River ( http://www.garnetriver.com/ )
-> Sound Coordinator, Schenectady Light Opera Company ( http://www.sloctheater.org/ )
-> Technical Advisor, RPI Players
Re: [zfs-discuss] SATA disk perf question
> I have a resilver running and am
> seeing about 700-800 writes/sec. on the hot spare as it resilvers.

IIRC resilver works in block birth order (write order) which is commonly more-or-less sequential unless the fs is fragmented. So it might or might not be. I think you cannot get that kind of performance for a fully random load, more like 100 IOPS or so.

--
- Tuomas
Re: [zfs-discuss] SATA disk perf question
On 01 June, 2011 - Paul Kraus sent me these 0,9K bytes:

> I figure this group will know better than any other I have contact
> with: is 700-800 I/Ops reasonable for a 7200 RPM SATA drive (1 TB Sun
> badged Seagate ST31000N in a J4400)? I have a resilver running and am
> seeing about 700-800 writes/sec. on the hot spare as it resilvers.
> There is no other I/O activity on this box, as this is a remote
> replication target for production data. I have the replication
> disabled until the resilver completes.

700-800 sequential ones, perhaps. For random, you can divide by 10.

/Tomas
--
Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
[zfs-discuss] SATA disk perf question
I figure this group will know better than any other I have contact with: is 700-800 I/Ops reasonable for a 7200 RPM SATA drive (1 TB Sun badged Seagate ST31000N in a J4400)? I have a resilver running and am seeing about 700-800 writes/sec. on the hot spare as it resilvers. There is no other I/O activity on this box, as this is a remote replication target for production data. I have the replication disabled until the resilver completes.

Solaris 10U9
zpool version 22
Server is a T2000

--
{1-2-3-4-5-6-7-}
Paul Kraus
-> Senior Systems Architect, Garnet River ( http://www.garnetriver.com/ )
-> Sound Coordinator, Schenectady Light Opera Company ( http://www.sloctheater.org/ )
-> Technical Advisor, RPI Players