Sorry to insist, but still no real answer...

On Mon, 16 Jul 2012, Bob Friesenhahn wrote:

On Tue, 17 Jul 2012, Michael Hase wrote:

So only one thing left: mirror should read 2x

I don't think that mirror should necessarily read 2x faster even though the potential is there to do so. Last I heard, zfs did not include a special read scheduler for sequential reads from a mirrored pair. As a result, 50% of the time, a read will be scheduled for a device which already has a read scheduled. If this is indeed true, the typical performance would be 150%. There may be some other scheduling factor (e.g. estimate of busyness) which might still allow zfs to select the right side and do better than that.
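
(Spelling out the arithmetic behind that 150% estimate: if each read has a 50% chance of landing on the side of the mirror that is already busy, the expected speedup is 0.5 * 2 + 0.5 * 1 = 1.5, i.e. about 150% of a single disk.)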

If you were to add a second vdev (i.e. stripe) then you should see very close to 200% due to the default round-robin scheduling of the writes.
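
(For reference, adding a second mirror vdev is a single command -- the device names here are only an example:)

zpool add ptest mirror c5t11d0 c5t12d0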

My expectation would be > 200%, as 4 disks are involved. It may not be perfect 4x scaling, but imho it should be (and, for my scsi system, is) more than half of the theoretical throughput. This is solaris or a solaris derivative, not linux ;-)


It is really difficult to measure zfs read performance due to caching effects. One way to do it is to write a large file (containing random data such as returned from /dev/urandom) to a zfs filesystem, unmount the filesystem, remount the filesystem, and then time how long it takes to read the file once. This works because remounting the filesystem resets the filesystem cache.
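
In other words, something along these lines (pool/filesystem and file names are just placeholders):

# write ~8 GB of uncompressible data, then force a cold-cache read
dd if=/dev/urandom of=/ptest/fs/bigfile bs=1024k count=8192
zfs unmount ptest/fs
zfs mount ptest/fs
time dd if=/ptest/fs/bigfile of=/dev/null bs=1024k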

Ok, I did a zpool export/import cycle between the dd write and read tests.
This really empties the ARC, which I checked with arc_summary.pl. The test even uses two processes in parallel (doesn't make a difference). The result is still the same:

dd write:  2x 58 MB/sec  --> perfect, each disk does > 110 MB/sec
dd read:   2x 68 MB/sec  --> imho too slow, about 68 MB/sec per disk

For writes each disk gets about 900 128k io requests/sec with asvc_t in the 8-9 msec range. For reads each disk only gets about 500 io requests/sec, asvc_t 18-20 msec with the default zfs_vdev_max_pending=10. When reducing zfs_vdev_max_pending the asvc_t drops accordingly, but the i/o rate stays at 500/sec per disk and throughput stays the same. I think the iostat values should be reliable here. These high iops numbers make sense, as the pools are nearly empty, so seek times are short.
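
For reference, checking and changing the tunable on a running system goes roughly like this (a sketch; the variable is zfs_vdev_max_pending on my build):

# show the current value in decimal
echo "zfs_vdev_max_pending/D" | mdb -k

# lower it to 4 on the live kernel
echo "zfs_vdev_max_pending/W0t4" | mdb -kw

# or persistently across reboots, in /etc/system:
#   set zfs:zfs_vdev_max_pending = 4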

All benchmarks (dd, bonnie, will try iozone) lead to the same result: on the sata mirror pair read performance is in the range of a single disk. For the sas disks (only two available for testing) and for the scsi system there is quite good throughput scaling.

Here, for comparison, is a table for 1-4 36gb 15k u320 scsi disks on an old sxde box (nevada b130):

            seq write  factor   seq read  factor
            MB/sec              MB/sec
single        82        1         78       1
mirror        79        1        137       1.75
2x mirror    120        1.5      251       3.2

This is exactly what imho is to be expected from mirrors and striped mirrors. It just doesn't happen for my sata pool. I still have no reference numbers for other sata pools, just one with the 4k/512 byte sector problem, which is even slower than mine. It seems the zfs performance people just use sas disks and are done with it.

Michael


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
old ibm dual opteron intellistation with external hp msa30, 36gb 15k u320 scsi disks

####################

  pool: scsi1
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        scsi1       ONLINE       0     0     0
          c3t4d0    ONLINE       0     0     0

errors: No known data errors

Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
zfssingle       16G   137  99 82739  20 39453   9   314  99 78251   7 856.9   8
Latency               160ms    4799ms    5292ms   43210us    3274ms    2069ms
Version  1.96       ------Sequential Create------ --------Random Create--------
zfssingle           -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16  8819  34 +++++ +++ 26318  68 20390  73 +++++ +++ 26846  72
Latency             16413us     108us     231us   12206us      46us     124us
1.96,1.96,zfssingle,1,1342514790,16G,,137,99,82739,20,39453,9,314,99,78251,7,856.9,8,16,,,,,8819,34,+++++,+++,26318,68,20390,73,+++++,+++,26846,72,160ms,4799ms,5292ms,43210us,3274ms,2069ms,16413us,108us,231us,12206us,46us,124us

######################

  pool: scsi1
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        scsi1       ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            c3t4d0  ONLINE       0     0     0
            c1t5d0  ONLINE       0     0     0

errors: No known data errors

Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
zfsmirror       16G   110  99 79137  19 50591  12   305  99 137244  13  1065  16
Latency               199ms    4932ms    5101ms   50429us    3885ms    1303ms
Version  1.96       ------Sequential Create------ --------Random Create--------
zfsmirror           -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 11337  41 +++++ +++ 26398  66 19797  70 +++++ +++ 26299  68
Latency             14297us     139us     136us   10732us      48us     116us
1.96,1.96,zfsmirror,1,1342515696,16G,,110,99,79137,19,50591,12,305,99,137244,13,1065,16,16,,,,,11337,41,+++++,+++,26398,66,19797,70,+++++,+++,26299,68,199ms,4932ms,5101ms,50429us,3885ms,1303ms,14297us,139us,136us,10732us,48us,116us

########################

  pool: scsi1
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        scsi1       ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            c1t4d0  ONLINE       0     0     0
            c3t4d0  ONLINE       0     0     0
          mirror-1  ONLINE       0     0     0
            c1t5d0  ONLINE       0     0     0
            c3t5d0  ONLINE       0     0     0

Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
zfsraid10       16G   127  99 120319  30 86902  23   300  99 251493  26  1747  27
Latency               105ms    3078ms    5083ms   43082us    3657ms     360ms
Version  1.96       ------Sequential Create------ --------Random Create--------
zfsraid10           -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 12031  46 +++++ +++ 25764  64 21220  75 +++++ +++ 27288  69
Latency             14091us     123us     136us   10823us      49us     117us
1.96,1.96,zfsraid10,1,1342515541,16G,,127,99,120319,30,86902,23,300,99,251493,26,1747,27,16,,,,,12031,46,+++++,+++,25764,64,21220,75,+++++,+++,27288,69,105ms,3078ms,5083ms,43082us,3657ms,360ms,14091us,123us,136us,10823us,49us,117us

####################

dd write
--------

for FILE in bigfile1 bigfile2
do
  time /usr/gnu/bin/dd if=/dev/zero of=$FILE bs=1024k count=8192 &
done

8589934592 bytes (8.6 GB) copied, 108.421 s, 79.2 MB/s
8589934592 bytes (8.6 GB) copied, 112.788 s, 76.2 MB/s

           capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
scsi1       14.1G  53.4G      0  1.11K      0   140M
  mirror    7.03G  26.7G      0    571      0  70.2M
    c1t4d0      -      -      0    571      0  70.2M
    c3t4d0      -      -      0    674      0  82.8M
  mirror    7.02G  26.7G      0    567      0  69.9M
    c1t5d0      -      -      0    567      0  69.9M
    c3t5d0      -      -      0    669      0  82.4M
----------  -----  -----  -----  -----  -----  -----


dd read
-------

for FILE in bigfile1 bigfile2
do
  time /usr/gnu/bin/dd if=$FILE of=/dev/null bs=1024k count=8192 &
done

8589934592 bytes (8.6 GB) copied, 62.2953 s, 138 MB/s
8589934592 bytes (8.6 GB) copied, 62.8319 s, 137 MB/s

               capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
scsi1       16.0G  51.5G  2.08K      0   261M      0
  mirror    7.99G  25.8G  1.06K      0   133M      0
    c1t4d0      -      -    535      0  66.6M      0
    c3t4d0      -      -    544      0  67.6M      0
  mirror    8.01G  25.7G  1.02K      0   128M      0
    c1t5d0      -      -    518      0  64.3M      0
    c3t5d0      -      -    516      0  64.4M      0
----------  -----  -----  -----  -----  -----  -----
dd write
--------

for FILE in bigfile1 bigfile2
do
  time /usr/gnu/bin/dd if=/dev/zero of=$FILE bs=1024k count=16384 &
done

17179869184 bytes (17 GB) copied, 294.442 s, 58.3 MB/s
17179869184 bytes (17 GB) copied, 294.28 s, 58.4 MB/s

                capacity     operations    bandwidth
pool         alloc   free   read  write   read  write
-----------  -----  -----  -----  -----  -----  -----
ptest        40.6G   887G      0   1000      0   113M
  mirror     40.6G   887G      0   1000      0   113M
    c5t9d0       -      -      0    935      0   111M
    c5t10d0      -      -      0    946      0   113M
-----------  -----  -----  -----  -----  -----  -----

                   extended device statistics              
    r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0  907.0    0.0  106.3  0.0  7.7    0.0    8.5   1  79 c5t9d0
    0.0  914.0    0.0  107.3  0.0  7.7    0.0    8.5   1  80 c5t10d0


############

zpool export ptest
zpool import ptest

arc_summary.pl

System Memory:
         Physical RAM:  16375 MB
         Free Memory :  2490 MB
         LotsFree:      255 MB

ZFS Tunables (/etc/system):

ARC Size:
         Current Size:             85 MB (arcsize)
         Target Size (Adaptive):   12690 MB (c)
         Min Size (Hard Limit):    1918 MB (zfs_arc_min)
         Max Size (Hard Limit):    15351 MB (zfs_arc_max)

############

dd read
-------

for FILE in bigfile1 bigfile2
do
  time /usr/gnu/bin/dd if=$FILE of=/dev/null bs=1024k count=16384 &
done

17179869184 bytes (17 GB) copied, 253.017 s, 67.9 MB/s
17179869184 bytes (17 GB) copied, 253.567 s, 67.8 MB/s

                capacity     operations    bandwidth
pool         alloc   free   read  write   read  write
-----------  -----  -----  -----  -----  -----  -----
ptest        71.1G   857G   1008      0   125M      0
  mirror     71.1G   857G   1008      0   125M      0
    c5t9d0       -      -    517      0  64.2M      0
    c5t10d0      -      -    491      0  61.0M      0
-----------  -----  -----  -----  -----  -----  -----

                    extended device statistics              
    r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
  519.0    1.0   64.0    0.0  0.0 10.0    0.0   19.2   1 100 c5t9d0
  521.5    0.5   64.8    0.0  0.0 10.0    0.0   19.1   1 100 c5t10d0