Also note that the current prefetching (even in snv_41) still suffers
from some major systematic performance problems.  This should be fixed
by snv_45/s10u3, and is covered by the following bug:

6447377 ZFS prefetch is inconsistent

I'll duplicate Mark's evaluation here, since it doesn't show up on
bugs.opensolaris.org:

---------
The main problem here is that the dmu is not informing the zfetch code
about all I/Os.  The zfetch interface is only called when the dmu needs
to go to the ARC to resolve an I/O request.  If the dmu finds that the
buffer is already cached (in the dmu), it does not bother to call zfetch.
So here's what can happen:

1 - ARC cache gets loaded up with some portion of a file 'X'
2 - application initiates a sequential read on 'X'
3 - DMU reads first 10 blocks from the file via arc_read()
4 - dmu_zfetch() detects sequential read pattern and starts prefetching
5 - DMU finds blocks 11-15 already cached (does not tell zfetch)
6 - DMU issues read for block 16
7 - dmu_zfetch() sees a gap in the read pattern, and so assumes that we
        are doing a *strided read*, and changes its prefetch algorithm:
        prefetch 10, skip 5, prefetch 10, ... etc.

As the dmu finds other blocks in its cache, the zfetch algorithms can
become even more confused.
---------
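
To make the failure mode concrete, here is a minimal, self-contained C
sketch of a stream detector that, like zfetch in this scenario, only hears
about cache misses.  This is not the actual zfetch code; the names
(stream_t, detect()) and the logic are invented purely for illustration,
but it replays the sequence of events Mark describes above:

/*
 * Toy model of a prefetch stream detector that is only consulted on
 * cache misses.  NOT the real dmu_zfetch() implementation.
 */
#include <stdio.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct stream {
    uint64_t last_blkid;        /* last block we were told about */
} stream_t;

/* Called only when the DMU has to go to the ARC (i.e. on a miss). */
static void
detect(stream_t *s, uint64_t blkid)
{
    if (blkid != s->last_blkid + 1) {
        /*
         * Gap in the reported pattern.  The skipped blocks were
         * really cache hits that the detector was never told about,
         * but from here it looks like an intentional strided read,
         * so the prefetch pattern changes accordingly.
         */
        printf("gap after block %llu: assuming strided read, "
            "skip %llu blocks per run\n",
            (unsigned long long)s->last_blkid,
            (unsigned long long)(blkid - s->last_blkid - 1));
    }
    s->last_blkid = blkid;
}

int
main(void)
{
    stream_t s = { .last_blkid = 0 };
    bool cached[32] = { false };
    uint64_t b;

    /* Step 1: blocks 11-15 of file 'X' are already cached. */
    for (b = 11; b <= 15; b++)
        cached[b] = true;

    /* Steps 2-7: the application reads blocks 1..20 sequentially. */
    for (b = 1; b <= 20; b++) {
        if (cached[b])
            continue;       /* dmu hit: detector never hears of it */
        detect(&s, b);      /* dmu miss: detector is consulted */
    }
    return (0);
}

Running this prints "gap after block 10: assuming strided read, skip 5
blocks per run" -- exactly the misinterpretation in steps 5-7: a purely
sequential read gets treated as "prefetch 10, skip 5" because the cache
hits were invisible to the detector.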

With some additional fixes from Jeff, sequential read performance has
been vastly improved.  These fixes are undergoing final testing as we
speak.

- Eric

On Tue, Jul 18, 2006 at 09:45:09AM -0700, Luke Lonergan wrote:
> The prefetch and I/O scheduling of nv41 were responsible for some quirky
> performance.  First time read performance might be good, then subsequent
> reads might be very poor.
> 
> With a very recent update to the zfs module that improves I/O scheduling and
> prefetching, I get the following bonnie++ 1.03a results with a 36-drive
> RAID10 on Solaris 10 U2 on an X4500 with 500GB Hitachi drives (zfs
> checksumming is off):
> 
> Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
>                     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
> thumperdw-i-1   32G 120453  99 467814  98 290391  58 109371  99 993344  94  1801   4
>                     ------Sequential Create------ --------Random Create--------
>                     -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
>               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
>                  16 +++++ +++ +++++ +++ +++++ +++ 30850  99 +++++ +++ +++++ +++
> 
> Bumping the number of concurrent processes up to 2, we get about 1.5x the
> single-process sequential read speed from the RAID10 under a concurrent
> workload (you have to add the rates from the two runs below together):
> 
> Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
>                     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
> thumperdw-i-1   32G 111441  95 212536  54 171798  51 106184  98 719472  88  1233   2
>                     ------Sequential Create------ --------Random Create--------
>                     -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
>               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
>                  16 26085  90 +++++ +++  5700  98 21448  97 +++++ +++  4381  97
> 
> Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
>                     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
> thumperdw-i-1   32G 116355  99 212509  54 171647  50 106112  98 715030  87  1274   3
>                     ------Sequential Create------ --------Random Create--------
>                     -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
>               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
>                  16 26082  99 +++++ +++  5588  98 21399  88 +++++ +++  4272  97
> 
> So that's 2500 seeks per second, 1440MB/s of sequential block reads, and
> 212MB/s of per-character sequential reads.
> 
> - Luke
> 
> 
> On 7/18/06 6:19 AM, "Jonathan Wheeler" <[EMAIL PROTECTED]> wrote:
> 
> 
> >> Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
> >>                     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> >> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
> >> zfs0            16G 88937  99 195973  47 95536  29 75279  95 228022  27 433.9   1
> >>                     ------Sequential Create------ --------Random Create--------
> >>                     -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
> >>               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
> >>                  16 31812  99 +++++ +++ +++++ +++ 28761  99 +++++ +++ +++++ +++
> >> zfs0,16G,88937,99,195973,47,95536,29,75279,95,228022,27,433.9,1,16,31812,99,+++++,+++,+++++,+++,28761,99,+++++,+++,+++++,+++
> > 
> > Here is my version with 5 disks in a single raidz:
> > 
> >                -------Sequential Output-------- ---Sequential Input-- --Random--
> >                -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
> > Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
> > 5 disks  16384 62466 72.5 133768 28.0 97698 21.7 66504 88.1 241481 20.7 118.2  1.4
> 

--
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock