Re: [zfs-discuss] ZFS Random Read Performance
Richard, First, thank you for the detailed reply ... (comments in line below) On Tue, Nov 24, 2009 at 6:31 PM, Richard Elling richard.ell...@gmail.com wrote: more below... On Nov 24, 2009, at 9:29 AM, Paul Kraus wrote: On Tue, Nov 24, 2009 at 11:03 AM, Richard Elling richard.ell...@gmail.com wrote: Try disabling prefetch. Just tried it... no change in random read (still 17-18 MB/sec for a single thread), but sequential read performance dropped from about 200 MB/sec. to 100 MB/sec. (as expected). Test case is a 3 GB file accessed in 256 KB records. ARC is set to a max of 1 GB for testing. arcstat.pl shows that the vast majority (95%) of reads are missing the cache. hmmm... more testing needed. The question is whether the low I/O rate is because of zfs itself, or the application? Disabling prefetch will expose the application, because zfs is not creating additional and perhaps unnecessary read I/O. The values reported by iozone are in pretty close agreement with what we are seeing with iostat during the test runs. Compression is off on zfs (the iozone test data compresses very well and yields bogus results). I am looking for a good alternative to iozone for random testing, I did put together a crude script to spawn many dd processes accessing the block device itself, each with a different seek over the range of the disk and saw results much greater than the iozone single threaded random performance. Your data which shows the sequential write, random write, and sequential read driving actv to 35 is because prefetching is enabled for the read. We expect the writes to drive to 35 with a sustained write workload of any flavor. Understood. I tried tuning the queue size to 50 and observed that the actv went to 50 (with very little difference in performance), so returned it to the default of 35. The random read (with cache misses) will stall the application, so it takes a lot of threads (16?) to keep 35 concurrent I/Os in the pipeline without prefetching. 
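The "crude script to spawn many dd processes" mentioned above might look roughly like the sketch below. This is a reconstruction, not the actual script: TARGET, THREADS, and the sizes are placeholders, and for a real run TARGET should point at the raw device (e.g. /dev/rdsk/cXtYdZs0) rather than a file.

```shell
#!/bin/sh
# Rough sketch of a multi-dd random-read test: spawn several dd readers,
# each starting at a random offset in the target. All names and sizes
# here are assumptions for illustration.
TARGET=${TARGET:-/tmp/dd-test-file}
THREADS=${THREADS:-8}
BS=262144                       # 256 KB records, matching the iozone tests
COUNT=16                        # blocks read per dd process

# For demonstration only: create a scratch file if no device was given.
[ -f "$TARGET" ] || dd if=/dev/zero of="$TARGET" bs="$BS" count=64 2>/dev/null

BLOCKS=$(( $(wc -c < "$TARGET") / BS ))
i=0
while [ "$i" -lt "$THREADS" ]; do
    # pick a random starting block for each reader
    SEEK=$(( $(od -An -N2 -tu2 /dev/urandom) % (BLOCKS - COUNT) ))
    dd if="$TARGET" of=/dev/null bs="$BS" skip="$SEEK" count="$COUNT" 2>/dev/null &
    i=$(( i + 1 ))
done
wait
echo "all readers done"
```

Because each dd issues its own I/O stream, this keeps several reads in flight concurrently, which is exactly what a single iozone thread cannot do.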
The ZFS prefetching algorithm is intelligent so it actually complicates the interpretation of the data. What bothers me is that iostat is showing the 'disk' device as not being saturated during the random read test. I'll post iostat output that I captured yesterday to http://www.ilk.org/~ppk/Geek/ You can clearly see the various test phases (sequential write, rewrite, sequential read, reread, random read, then random write).

You're peaking at 658 256KB random IOPS for the 3511, or ~66 IOPS per drive. Since ZFS will max out at 128KB per I/O, the disks see something more than 66 IOPS each. The IOPS data from iostat would be a better metric to observe than bandwidth. These drives are good for about 80 random IOPS each, so you may be close to disk saturation. The iostat data for IOPS and svc_t will confirm.

But ... if I am saturating the 3511 with one thread, then why do I get many times that performance with multiple threads ?

The T2000 data (sheet 3) shows pretty consistently around 90 256KB IOPS per drive. Like the 3511 case, this is perhaps 20% less than I would expect, perhaps due to the measurement.

I ran the T2000 test to see if 10U8 behaved better and to make sure I wasn't seeing an oddity of the 480 / 3511 case. I wanted to see if the random read behavior was similar, and it was (in relative terms).

Also, the 3511 RAID-5 configuration will perform random reads at around 1/2 IOPS capacity if the partition offset is 34. This was the default long ago. The new default is 256.

Our 3511's have been running 421F (latest) for a long time :-) We are religious about keeping all the 3511 FW current and matched.

The reason is that with a 34 block offset, you are almost guaranteed that a larger I/O will stride 2 disks. You won't notice this as easily with a single thread, but it will be measurable with more threads. Double check the offset with prtvtoc or format.

How do I check offset ...
format - verify output from one of the partitions is below:

    format> verify
    Volume name        = <              >
    ascii name         = <SUN-StorEdge 3511-421F-517.23GB>
    bytes/sector       = 512
    sectors            = 1084710911
    accessible sectors = 1084710878

    Part       Tag   Flag    First Sector       Size    Last Sector
      0        usr    wm              256   517.22GB     1084694494
      1 unassigned    wm                0          0              0
      2 unassigned    wm                0          0              0
      3 unassigned    wm                0          0              0
      4 unassigned    wm                0          0              0
      5 unassigned    wm                0          0              0
      6 unassigned    wm                0          0              0
      8   reserved    wm       1084694495     8.00MB     1084710878
    format>

Writes are a completely different matter. ZFS has a tendency to turn random writes into sequential writes, so it is pretty much useless to look at random write data.
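To answer the "how do I check offset" question without walking the format(1M) menus, the prtvtoc output can be parsed directly. The sample text below is a made-up stand-in for real prtvtoc output (on a live system you would pipe `prtvtoc /dev/rdsk/cXtYdZs2` instead); the device path and numbers are assumptions.

```shell
# Find the starting sector of slice 0 from a prtvtoc-style partition map.
# prtvtoc comment lines start with '*', so awk only matches the slice-0 row.
prtvtoc_sample='*                          First     Sector    Last
*   Partition  Tag  Flags    Sector      Count   Sector
       0        2    00         256 1084694239 1084694494'
echo "$prtvtoc_sample" | awk '$1 == "0" { print "slice 0 first sector: " $4 }'
```

A first sector of 34 would indicate the old default alignment Richard describes; 256 keeps a 128 KB I/O from striding two disks.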
Re: [zfs-discuss] ZFS Random Read Performance
I posted baseline stats at http://www.ilk.org/~ppk/Geek/ baseline test was 1 thread, 3 GiB file, 64 KiB to 512 KiB record size. 480-3511-baseline.xls is an iozone output file; iostat-baseline.txt is the iostat output for the device in use (annotated).

I also noted an odd behavior yesterday and have not had a chance to better qualify it. I was testing various combinations of vdev quantities and mirror quantities. As I changed the number of vdevs (stripes) from 1 through 8 (all backed by partitions on the same logical disk on the 3511) there was no real change in sequential write, random write, or random read performance. Sequential read performance did show a drop from 216 MiB/sec at 1 vdev to 180 MiB/sec. at 8 vdevs. This was about as expected.

As I changed the number of mirror components things got interesting. Keep in mind that I only have one 3511 for testing right now; I had to use partitions from two other production 3511's to get three mirror components on different arrays. As expected, as I went from 1 to 2 to 3 mirror components the write performance did not change, but the read performance was interesting... see below:

read performance
mirrors   sequential       random
1         174 MiB/sec.      23 MiB/sec.
2         229 MiB/sec.      30 MiB/sec.
3         223 MiB/sec.     125 MiB/sec.

What the heck happened here ? Going from 1 to 2 mirrors saw a large increase in sequential read performance, and from 2 to 3 mirrors showed a HUGE increase in random read performance. It feels like the behavior of the zfs code changed between 2 and 3 mirrors for the random read data.

Now to investigate further, I tried multiple mirror components on the same array (my test 3511), not that you would do this in production, but I was curious what would happen. In this case the throughput degraded across the board as I added mirror components, as one would expect. In the random read case the array was delivering less overall performance than it was when it was one part of the earlier test (16 MiB/sec. combined vs. 1/3 of 125 MiB/sec.)
See sheet 7 of http://www.ilk.org/~ppk/Geek/throughput-summary.ods for these test results. Sheet 8 is the last test I did last night, using the NRAID logical disk type to try to get the 3511 to pass a disk through to zfs, but get the advantage of the cache on the 3511. I'm not sure what to read into those numbers.

--
Paul Kraus
- Senior Systems Architect, Garnet River ( http://www.garnetriver.com/ )
- Sound Coordinator, Schenectady Light Opera Company ( http://www.sloctheater.org/ )
- Technical Advisor, Lunacon 2010 (http://www.lunacon.org/)
- Technical Advisor, RPI Players

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS Random Read Performance
If you are using (3) 3511's, then won't it be possible that your 3 GB workload will be largely or entirely served out of RAID controller cache? Also, I had a question about your production backups (millions of small files): do you have atime=off set for the filesystems? That might be helpful.

-- This message posted from opensolaris.org
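For reference, turning off access-time updates is a one-line change per filesystem. The dataset name below is a placeholder, not one from Paul's servers:

```shell
# Hypothetical dataset name -- substitute your own filesystem.
zfs set atime=off tank/export/samba
zfs get atime tank/export/samba     # confirm the new value
```

With millions of small files, skipping the atime metadata update on every read during a backup pass can remove a surprising amount of write traffic.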
Re: [zfs-discuss] ZFS Random Read Performance
On Wed, Nov 25, 2009 at 7:54 AM, Paul Kraus pk1...@gmail.com wrote: You're peaking at 658 256KB random IOPS for the 3511, or ~66 IOPS per drive. Since ZFS will max out at 128KB per I/O, the disks see something more than 66 IOPS each. The IOPS data from iostat would be a better metric to observe than bandwidth. These drives are good for about 80 random IOPS each, so you may be close to disk saturation. The iostat data for IOPS and svc_t will confirm. But ... if I am saturating the 3511 with one thread, then why do I get many times that performance with multiple threads ?

I'm having trouble making sense of the iostat data (I can't tell how many threads at any given point), but I do see lots of times where asvc_t * reads is in the range 850 ms to 950 ms. That is, this is as fast as a single threaded app with a little bit of think time can issue reads (100 reads * 9 ms svc_t + 100 reads * 1 ms think_time = 1 sec). The %busy shows that 90+% of the time there is an I/O in flight (100 reads * 9 ms = 900/1000 = 90%). However, %busy isn't aware of how many I/O's could be in flight simultaneously. When you fire up more threads, you are able to have more I/O's in flight concurrently.

I don't believe that the I/O's per drive is really a limiting factor in the single threaded case, as the spec sheet for the 3511 says that it has 1 GB of cache per controller. Your working set is small enough that it is somewhat likely that many of those random reads will be served from cache. A dtrace analysis of just how random the reads are would be interesting. I think that hotspot.d from the DTrace Toolkit would be a good starting place.

--
Mike Gerdts
http://mgerdts.blogspot.com/
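Mike's back-of-the-envelope numbers above can be checked mechanically. The 9 ms service time and 1 ms think time are his assumed figures, not measured constants:

```shell
# Reproduce the single-thread latency arithmetic: 100 reads/sec at
# ~9 ms service time plus ~1 ms think time consumes the whole second,
# and leaves the device "busy" 90% of the time with queue depth ~1.
awk 'BEGIN {
    reads = 100; svc_ms = 9; think_ms = 1
    total_ms = reads * (svc_ms + think_ms)      # wall time per 100 reads
    busy_pct = (reads * svc_ms) / 10            # 900 ms busy out of 1000 ms
    printf "total: %d ms, busy: %d%%\n", total_ms, busy_pct
}'
```

This is why a single synchronous reader can show 90%+ %b while the array still has huge headroom: %b measures "at least one I/O in flight", not saturation.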
Re: [zfs-discuss] ZFS Random Read Performance
more below... On Nov 25, 2009, at 5:54 AM, Paul Kraus wrote: Richard, First, thank you for the detailed reply ... (comments in line below) On Tue, Nov 24, 2009 at 6:31 PM, Richard Elling richard.ell...@gmail.com wrote: more below... On Nov 24, 2009, at 9:29 AM, Paul Kraus wrote: On Tue, Nov 24, 2009 at 11:03 AM, Richard Elling richard.ell...@gmail.com wrote: Try disabling prefetch. Just tried it... no change in random read (still 17-18 MB/sec for a single thread), but sequential read performance dropped from about 200 MB/sec. to 100 MB/sec. (as expected). Test case is a 3 GB file accessed in 256 KB records. ARC is set to a max of 1 GB for testing. arcstat.pl shows that the vast majority (95%) of reads are missing the cache. hmmm... more testing needed. The question is whether the low I/O rate is because of zfs itself, or the application? Disabling prefetch will expose the application, because zfs is not creating additional and perhaps unnecessary read I/O. The values reported by iozone are in pretty close agreement with what we are seeing with iostat during the test runs. Compression is off on zfs (the iozone test data compresses very well and yields bogus results). I am looking for a good alternative to iozone for random testing, I did put together a crude script to spawn many dd processes accessing the block device itself, each with a different seek over the range of the disk and saw results much greater than the iozone single threaded random performance. filebench is usually bundled in /usr/benchmarks or as a pkg. vdbench is easy to use and very portable, www.vdbench.org Your data which shows the sequential write, random write, and sequential read driving actv to 35 is because prefetching is enabled for the read. We expect the writes to drive to 35 with a sustained write workload of any flavor. Understood. I tried tuning the queue size to 50 and observed that the actv went to 50 (with very little difference in performance), so returned it to the default of 35. 
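For the vdbench suggestion above, a minimal parameter file for a single-threaded 256 KB random-read run might look like the sketch below. The lun path, run length, and thread count are assumptions; see www.vdbench.org for the authoritative syntax.

```
# Hypothetical vdbench parameter file -- substitute your own device path.
sd=sd1,lun=/dev/rdsk/c2t40d0s0,size=3g,threads=1
wd=wd1,sd=sd1,xfersize=256k,rdpct=100,seekpct=100
rd=rd1,wd=wd1,iorate=max,elapsed=60,interval=5
```

Raising threads= on the sd line would reproduce the multi-threaded iozone runs without iozone's own overheads.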
Yep, bottleneck is on the back end (physical HDDs). For arrays with lots of HDDs, this queue can be deeper, but the 3500 series is way too small to see this. If SSDs are used on the back end, then you can revisit this. From the data, it does look like the random read tests are converging on the media capabilities of the disks in the array. For the array you can see the read-modify-write penalty of RAID-5 as well as the caching and prefetching of reads. Note: the physical I/Os are 128 KB, regardless of the iozone size setting. This is expected, since 128 KB is the default recordsize limit for ZFS.

The random read (with cache misses) will stall the application, so it takes a lot of threads (16?) to keep 35 concurrent I/Os in the pipeline without prefetching. The ZFS prefetching algorithm is intelligent so it actually complicates the interpretation of the data. What bothers me is that iostat is showing the 'disk' device as not being saturated during the random read test. I'll post iostat output that I captured yesterday to http://www.ilk.org/~ppk/Geek/ You can clearly see the various test phases (sequential write, rewrite, sequential read, reread, random read, then random write).

Is this a single thread? Usually this means that you aren't creating enough load. ZFS won't be prefetching (as much) for a random read workload, so iostat will expose client bottlenecks.

You're peaking at 658 256KB random IOPS for the 3511, or ~66 IOPS per drive. Since ZFS will max out at 128KB per I/O, the disks see something more than 66 IOPS each. The IOPS data from iostat would be a better metric to observe than bandwidth. These drives are good for about 80 random IOPS each, so you may be close to disk saturation. The iostat data for IOPS and svc_t will confirm. But ... if I am saturating the 3511 with one thread, then why do I get many times that performance with multiple threads ? The T2000 data (sheet 3) shows pretty consistently around 90 256KB IOPS per drive.
Like the 3511 case, this is perhaps 20% less than I would expect, perhaps due to the measurement. I ran the T2000 test to see if 10U8 behaved better and to make sure I wasn't seeing an oddity of the 480 / 3511 case. I wanted to see if the random read behavior was similar, and it was (in relative terms). Also, the 3511 RAID-5 configuration will perform random reads at around 1/2 IOPS capacity if the partition offset is 34. This was the default long ago. The new default is 256. Our 3511's have been running 421F (latest) for a long time :-) We are religious about keeping all the 3511 FW current and matched. The reason is that with a 34 block offset, you are almost guaranteed that a larger I/O will stride 2 disks. You won't notice this as easily with a single thread, but it will be measurable with more threads. Double check the offset with prtvtoc or format. How do I check offset ... format - verify from one of the
Re: [zfs-discuss] ZFS Random Read Performance
more below... On Nov 25, 2009, at 7:10 AM, Paul Kraus wrote: I posted baseline stats at http://www.ilk.org/~ppk/Geek/ baseline test was 1 thread, 3 GiB file, 64 KiB to 512 KiB record size. 480-3511-baseline.xls is an iozone output file; iostat-baseline.txt is the iostat output for the device in use (annotated).

I also noted an odd behavior yesterday and have not had a chance to better qualify it. I was testing various combinations of vdev quantities and mirror quantities. As I changed the number of vdevs (stripes) from 1 through 8 (all backed by partitions on the same logical disk on the 3511) there was no real change in sequential write, random write, or random read performance. Sequential read performance did show a drop from 216 MiB/sec at 1 vdev to 180 MiB/sec. at 8 vdevs. This was about as expected.

As I changed the number of mirror components things got interesting. Keep in mind that I only have one 3511 for testing right now; I had to use partitions from two other production 3511's to get three mirror components on different arrays. As expected, as I went from 1 to 2 to 3 mirror components the write performance did not change, but the read performance was interesting... see below:

read performance
mirrors   sequential       random
1         174 MiB/sec.      23 MiB/sec.
2         229 MiB/sec.      30 MiB/sec.
3         223 MiB/sec.     125 MiB/sec.

What the heck happened here ? Going from 1 to 2 mirrors saw a large increase in sequential read performance, and from 2 to 3 mirrors showed a HUGE increase in random read performance. It feels like the behavior of the zfs code changed between 2 and 3 mirrors for the random read data.

I can't explain this. It may require a detailed understanding of the hardware configuration to identify the potential bottleneck. The ZFS mirroring code doesn't care how many mirrors there are, it just goes through the list. If the performance is not symmetrical from all sides of the mirror, then YMMV.
Now to investigate further, I tried multiple mirror components on the same array (my test 3511), not that you would do this in production, but I was curious what would happen. In this case the throughput degraded across the board as I added mirror components, as one would expect. In the random read case the array was delivering less overall performance than it was when it was one part of the earlier test (16 MiB/sec. combined vs. 1/3 of 125 MiB/sec.) See sheet 7 of http://www.ilk.org/~ppk/Geek/throughput-summary.ods for these test results. Sheet 8 is the last test I did last night, using the NRAID logical disk type to try to get the 3511 to pass a disk through to zfs, but get the advantage of the cache on the 3511. I'm not sure what to read into those numbers.

I read it as the single array, as configured, with 10+1 RAID-5 can deliver around 130 random read IOPS @ 128 KB.
-- richard
Re: [zfs-discuss] ZFS Random Read Performance
Try disabling prefetch.
-- richard

On Nov 24, 2009, at 6:45 AM, Paul Kraus wrote: I know there have been a bunch of discussions of various ZFS performance issues, but I did not see anything specifically on this. In testing a new configuration of an SE-3511 (SATA) array, I ran into an interesting ZFS performance issue. I do not believe that this is creating a major issue for our end users (but it may), but it is certainly impacting our nightly backups. I am only seeing 10-20 MB/sec per thread for random read throughput using iozone for testing. Here is the full config:

SF-V480 --- 4 x 1.2 GHz III+ --- 16 GB memory --- Solaris 10U6 with ZFS patch and IDR for snapshot / resilver bug
SE-3511 --- 12 x 500 GB SATA drives --- 11 disk R5 --- dual 2 Gbps FC host connection

I have the ARC size limited to 1 GB so that I can test with a rational data set size. The total amount of data that I am testing with is 3 GB and a 256KB record size. I tested with 1 through 20 threads. With 1 thread I got the following results:

sequential write: 112 MB/sec.
sequential read: 221 MB/sec.
random write: 96 MB/sec.
random read: 18 MB/sec.

As I scaled the number of threads (and kept the total data size the same) I got the following (throughput is in MB/sec):

threads  sw   sr   rw   rr
2       105  218   93   34
4       106  219   88   52
8        95  189   69   92
16       71  153   76  128

As the number of threads climbs, the first three values drop once you get above 4 threads (one per CPU), but the fourth (random read) climbs well past 4 threads. It is just about linear through 9 threads and then it starts fluctuating, but continues climbing to at least 20 threads (I did not test past 20). Above 16 threads the random read even exceeds the sequential read values. Looking at iostat output for the LUN I am using for the 1 thread case, for the first three tests (sequential write, sequential read, random write) I see %b at 100 and actv climb to 35 and hang out there.
For the random read test I see %b at 5 to 7, actv at less than 1 (usually around 0.5 to 0.6), wsvc_t is essentially 0, and asvc_t runs about 14. As the number of threads increases, the iostat values don't really change for the first three tests (sequential write, sequential read, random write), but they climb for the random read. The array is close to saturated at about 170 MB/sec. random read (18 threads), so I know that the 18 MB/sec. value for one thread is _not_ limited by the array.

I know the 3511 is not a high performance array, but we needed lots of bulk storage and could not afford better when we bought these 3 years ago. But, it seems to me that there is something wrong with the random read performance of ZFS. To test whether this is an effect of the 3511 I ran some tests on another system we have, as follows:

T2000 --- 32 thread 1 GHz --- 32 GB memory --- Solaris 10U8 --- 4 internal 72 GB SAS drives

We have a zpool built of one slice on each of the 4 internal drives configured as a striped mirror layout (2 vdevs each of 2 slices). So I/O is spread over all 4 spindles. I started with 4 threads and 8 GB each (32 GB total to insure I got past the ARC, it is not tuned down on this system). I saw exactly the same ratio of sequential read to random read (the random read performance was 23% of the sequential read performance in both cases). Based on looking at iostat values during the test, I am saturating all four drives with the write operations with just 1 thread. The sequential read is saturating the drives with anything more than 1 thread, and the random read is not saturating the drives until I get to about 6 threads.

threads  sw   sr   rw   rr
1       100  207   88   30
2       103  370   88   53
4        98  350   90   82
8       101  434   92   95

I confirmed that the problem is not unique to either 10U6 or the IDR; 10U8 has the same behavior. I confirmed that the problem is not unique to a FC attached disk array or the SE-3511 in particular.
Then I went back and took another look at my original data (SF-V480/SE-3511) and looked at throughput per thread. For the sequential operations and the random write, the throughput per thread fell pretty far and pretty fast, but the per thread random read numbers fell very slowly. Per thread throughput in MB/sec.:

threads  sw   sr  rw  rr
1       112  221  96  18
2        53  109  46  17
4        26   55  22  13
8        12   24   9  12
16        5   10   5   8

So this makes me think that the random read performance issue is a limitation per thread. Does anyone have any idea why ZFS is not reading as fast as the underlying storage can handle in the case of random reads ? Or am I seeing an artifact of iozone itself ? Is there another benchmark I should be using ?

P.S. I posted an OpenOffice.org spreadsheet of my test results here: http://www.ilk.org/~ppk/Geek/throughput-summary.ods

--
Paul Kraus
- Senior Systems Architect, Garnet River ( http://www.garnetriver.com/ )
- Sound Coordinator, Schenectady Light Opera Company ( http://www.sloctheater.org/ )
Re: [zfs-discuss] ZFS Random Read Performance
On Tue, Nov 24, 2009 at 11:03 AM, Richard Elling richard.ell...@gmail.com wrote: Try disabling prefetch. Just tried it... no change in random read (still 17-18 MB/sec for a single thread), but sequential read performance dropped from about 200 MB/sec. to 100 MB/sec. (as expected). Test case is a 3 GB file accessed in 256 KB records. ARC is set to a max of 1 GB for testing. arcstat.pl shows that the vast majority (95%) of reads are missing the cache.

The reason I don't think that this is hitting our end users is the cache hit ratio (reported by arc_summary.pl) is 95% on the production system (I am working on our test system and am the only one using it right now, so all the I/O load is iozone). I think my next step (beyond more poking with DTrace) is to try a backup and see what I get for ARC hit ratio ... I expect it to be low, but I may be surprised (then I have to figure out why backups are as slow as they are). We are using NetBackup and it takes about 3 days to do a FULL on a 3.3 TB zfs with about 30 million files. Differential Incrementals take 16-22 hours (and almost no data changes). The production server is an M4000, 4 dual core CPUs, 16 GB memory, and about 25 TB of data overall. A big SAMBA file server.

--
Paul Kraus
- Senior Systems Architect, Garnet River ( http://www.garnetriver.com/ )
- Sound Coordinator, Schenectady Light Opera Company ( http://www.sloctheater.org/ )
- Technical Advisor, Lunacon 2010 (http://www.lunacon.org/)
- Technical Advisor, RPI Players
Re: [zfs-discuss] ZFS Random Read Performance
On Tue, 24 Nov 2009, Paul Kraus wrote: On Tue, Nov 24, 2009 at 11:03 AM, Richard Elling richard.ell...@gmail.com wrote: Try disabling prefetch. Just tried it... no change in random read (still 17-18 MB/sec for a single thread), but sequential read performance dropped from about 200 MB/sec. to 100 MB/sec. (as expected). Test case is a 3 GB file accessed in 256 KB records. ARC is set to a max of 1 GB for testing. arcstat.pl shows that the vast majority (95%) of reads are missing the cache.

You will often see the best random access performance if you access the data using the same record size that zfs uses. For example, if you request data in 256KB records, but zfs is using 128KB records, then zfs needs to access, reconstruct, and concatenate two 128K zfs records before it can return any data to the user. This increases the access latency and decreases opportunity to take advantage of concurrency.

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
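Following Bob's suggestion, one quick experiment would be to match the iozone request size to the zfs recordsize so each application read maps to exactly one zfs record. The flags below are standard iozone options, but the dataset and file paths are placeholders:

```shell
# Confirm the dataset's recordsize (128K is the default limit).
zfs get recordsize testpool/fs

# Hypothetical re-run with 128 KB requests to match: -i 0 writes the
# file, -i 2 is the random read/write test, -r is the request size,
# -s the file size.
iozone -i 0 -i 2 -r 128k -s 3g -f /testpool/fs/iozone.tmp
```

If the 256 KB-request numbers were being penalized by the two-record reconstruct-and-concatenate path Bob describes, the 128 KB run should show measurably better random-read latency.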
Re: [zfs-discuss] ZFS Random Read Performance
more below... On Nov 24, 2009, at 9:29 AM, Paul Kraus wrote: On Tue, Nov 24, 2009 at 11:03 AM, Richard Elling richard.ell...@gmail.com wrote: Try disabling prefetch. Just tried it... no change in random read (still 17-18 MB/sec for a single thread), but sequential read performance dropped from about 200 MB/sec. to 100 MB/sec. (as expected). Test case is a 3 GB file accessed in 256 KB records. ARC is set to a max of 1 GB for testing. arcstat.pl shows that the vast majority (95%) of reads are missing the cache. hmmm... more testing needed. The question is whether the low I/O rate is because of zfs itself, or the application? Disabling prefetch will expose the application, because zfs is not creating additional and perhaps unnecessary read I/O. Your data which shows the sequential write, random write, and sequential read driving actv to 35 is because prefetching is enabled for the read. We expect the writes to drive to 35 with a sustained write workload of any flavor. The random read (with cache misses) will stall the application, so it takes a lot of threads (16?) to keep 35 concurrent I/Os in the pipeline without prefetching. The ZFS prefetching algorithm is intelligent so it actually complicates the interpretation of the data. You're peaking at 658 256KB random IOPS for the 3511, or ~66 IOPS per drive. Since ZFS will max out at 128KB per I/O, the disks see something more than 66 IOPS each. The IOPS data from iostat would be a better metric to observe than bandwidth. These drives are good for about 80 random IOPS each, so you may be close to disk saturation. The iostat data for IOPS and svc_t will confirm. The T2000 data (sheet 3) shows pretty consistently around 90 256KB IOPS per drive. Like the 3511 case, this is perhaps 20% less than I would expect, perhaps due to the measurement. Also, the 3511 RAID-5 configuration will perform random reads at around 1/2 IOPS capacity if the partition offset is 34. This was the default long ago. The new default is 256. 
The reason is that with a 34 block offset, you are almost guaranteed that a larger I/O will stride 2 disks. You won't notice this as easily with a single thread, but it will be measurable with more threads. Double check the offset with prtvtoc or format.

Writes are a completely different matter. ZFS has a tendency to turn random writes into sequential writes, so it is pretty much useless to look at random write data. The sequential writes should easily blow through the cache on the 3511. Squinting my eyes, I would expect the array can do around 70 MB/s writes, or 25 256KB IOPS saturated writes. By contrast, the T2000 JBOD data shows consistent IOPS at the disk level and exposes the track cache effect on the sequential read test. Did I mention that I'm a member of BAARF? www.baarf.com :-)

Hint: for performance work with HDDs, pay close attention to IOPS, then convert to bandwidth for the PHB.

The reason I don't think that this is hitting our end users is the cache hit ratio (reported by arc_summary.pl) is 95% on the production system (I am working on our test system and am the only one using it right now, so all the I/O load is iozone). I think my next step (beyond more poking with DTrace) is to try a backup and see what I get for ARC hit ratio ... I expect it to be low, but I may be surprised (then I have to figure out why backups are as slow as they are). We are using NetBackup and it takes about 3 days to do a FULL on a 3.3 TB zfs with about 30 million files. Differential Incrementals take 16-22 hours (and almost no data changes). The production server is an M4000, 4 dual core CPUs, 16 GB memory, and about 25 TB of data overall. A big SAMBA file server.

b119 has improved stat() performance, which should make a positive improvement of such backups. But eventually you may need to move to a multi-stage backup, depending on your business requirements.
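The "IOPS first, bandwidth for the PHB" conversion is just multiplication. The 80 random IOPS per drive and 11 data disks come from the figures earlier in the thread; the arithmetic below is only a sanity check, not a measurement:

```shell
# Convert per-disk random IOPS into aggregate bandwidth at the 128 KB
# physical I/O size that ZFS issues.
awk 'BEGIN {
    iops_per_disk = 80; disks = 11; io_kb = 128
    total_iops = iops_per_disk * disks          # aggregate random IOPS
    mb_per_sec = total_iops * io_kb / 1024      # convert to MB/sec
    printf "%d IOPS -> %.0f MB/sec\n", total_iops, mb_per_sec
}'
```

That ceiling (~880 IOPS, ~110 MB/sec) is why the multi-threaded random-read numbers keep climbing well past what a single thread achieves: one synchronous reader never generates enough outstanding I/Os to reach it.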
-- richard