Also note that the current prefetching (even in snv_41) still suffers from some major systematic performance problems. This should be fixed by snv_45/s10u3, and is covered by the following bug:
6447377 ZFS prefetch is inconsistant

I'll duplicate Mark's evaluation here, since it doesn't show up on
bugs.opensolaris.org:

---------
The main problem here is that the dmu is not informing the zfetch code
about all IO's.  The zfetch interface is only called when the dmu needs
to go to the ARC to resolve an IO request.  If the dmu finds that the
buffer is already cached (in the dmu) it does not bother to call zfetch.
So here's what can happen:

1 - ARC cache gets loaded up with some portion of a file 'X'
2 - application initiates a sequential read on 'X'
3 - DMU reads first 10 blocks from the file via arc_read()
4 - dmu_zfetch() detects sequential read pattern and starts prefetching
5 - DMU finds blocks 11-15 already cached (does not tell zfetch)
6 - DMU issues read for block 16
7 - dmu_zfetch() sees a gap in the read pattern, and so assumes that we
    are doing a *strided read*, and changes its prefetch algorithm:
    prefetch 10, skip 5, prefetch 10, ... etc.

As the dmu finds other blocks in its cache, the zfetch algorithms can
become even more confused.
---------

With some additional fixes from Jeff, sequential read performance has
been vastly improved.  These fixes are undergoing final testing as we
speak.

- Eric
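[Editor's note: to make the failure mode above concrete, here is a small
stand-alone C sketch of a stride-detecting prefetcher that, like the broken
path Mark describes, is only told about cache misses.  It is purely
illustrative: the names (toy_zstream_t, toy_zfetch) and the single-stream
state machine are invented for this example and are not the actual
dmu_zfetch.c code.]

#include <stdio.h>
#include <stdbool.h>
#include <stdint.h>

/* Minimal stand-in for a prefetch stream: last block seen plus inferred stride. */
typedef struct {
	int64_t last_blkid;	/* last block id this stream was told about */
	int64_t stride;		/* inferred stride; 1 == plain sequential   */
	bool    primed;		/* have we seen at least one block yet?     */
} toy_zstream_t;

/*
 * Called only on a cache MISS, mirroring the bug: the dmu never calls
 * into zfetch for blocks it already has cached.
 */
static void
toy_zfetch(toy_zstream_t *zs, int64_t blkid)
{
	if (zs->primed) {
		int64_t delta = blkid - zs->last_blkid;
		if (delta != zs->stride) {
			printf("    zfetch: delta %lld != stride %lld, "
			    "assuming STRIDED read, new stride %lld\n",
			    (long long)delta, (long long)zs->stride,
			    (long long)delta);
			zs->stride = delta;
		}
	} else {
		zs->primed = true;
	}
	zs->last_blkid = blkid;
	printf("    zfetch: would prefetch blocks %lld .. %lld\n",
	    (long long)(blkid + zs->stride),
	    (long long)(blkid + 10 * zs->stride));
}

int
main(void)
{
	toy_zstream_t zs = { 0, 1, false };

	/*
	 * The application reads blocks 0..20 strictly sequentially, but
	 * blocks 11..15 happen to be cached already, so zfetch never hears
	 * about them and sees a "gap" between block 10 and block 16.
	 */
	for (int64_t blkid = 0; blkid <= 20; blkid++) {
		bool cached = (blkid >= 11 && blkid <= 15);
		printf("app reads block %2lld (%s)\n", (long long)blkid,
		    cached ? "cache hit, zfetch not told" : "miss");
		if (!cached)
			toy_zfetch(&zs, blkid);
	}
	return (0);
}

[Compiled with any C99 compiler and run, the toy detector tracks stride 1
up through block 10, then sees block 16, concludes the stride is now 6, and
starts "prefetching" the wrong blocks; the same kind of confusion step 7
above describes, and it only gets worse as more blocks are found in cache.]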
On Tue, Jul 18, 2006 at 09:45:09AM -0700, Luke Lonergan wrote:
> The prefetch and I/O scheduling of nv41 were responsible for some quirky
> performance.  First time read performance might be good, then subsequent
> reads might be very poor.
>
> With a very recent update to the zfs module that improves I/O scheduling
> and prefetching, I get the following bonnie++ 1.03a results with a 36
> drive RAID10, Solaris 10 U2 on an X4500 with 500GB Hitachi drives (zfs
> checksumming is off):
>
> Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
>                     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
> thumperdw-i-1   32G 120453  99 467814  98 290391  58 109371  99 993344  94  1801   4
>                     ------Sequential Create------ --------Random Create--------
>                     -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
>               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
>                  16 +++++ +++ +++++ +++ +++++ +++ 30850  99 +++++ +++ +++++ +++
>
> Bumping up the number of concurrent processes to 2, we get about 1.5x speed
> reads of RAID10 with a concurrent workload (you have to add the rates
> together):
>
> Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
>                     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
> thumperdw-i-1   32G 111441  95 212536  54 171798  51 106184  98 719472  88  1233   2
>                     ------Sequential Create------ --------Random Create--------
>                     -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
>               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
>                  16 26085  90 +++++ +++  5700  98 21448  97 +++++ +++  4381  97
>
> Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
>                     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
> thumperdw-i-1   32G 116355  99 212509  54 171647  50 106112  98 715030  87  1274   3
>                     ------Sequential Create------ --------Random Create--------
>                     -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
>               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
>                  16 26082  99 +++++ +++  5588  98 21399  88 +++++ +++  4272  97
>
> So that's 2500 seeks per second, 1440MB/s sequential block read, 212MB/s
> per character sequential read.
>
> - Luke
>
>
> On 7/18/06 6:19 AM, "Jonathan Wheeler" <[EMAIL PROTECTED]> wrote:
>
> >> Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
> >>                     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> >> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
> >> zfs0            16G  88937  99 195973  47  95536  29  75279  95 228022  27 433.9   1
> >>                     ------Sequential Create------ --------Random Create--------
> >>                     -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
> >>               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
> >>                  16 31812  99 +++++ +++ +++++ +++ 28761  99 +++++ +++ +++++ +++
> >> zfs0,16G,88937,99,195973,47,95536,29,75279,95,228022,27,433.9,1,16,31812,99,+++++,+++,+++++,+++,28761,99,+++++,+++,+++++,+++
> >
> > Here is my version with 5 disks in a single raidz:
> >
> >               -------Sequential Output-------- ---Sequential Input-- --Random--
> >               -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
> > Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
> > 5 disks 16384 62466 72.5 133768 28.0  97698 21.7 66504 88.1 241481 20.7 118.2  1.4

--
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock