Re: [zfs-discuss] Periodic flush

2008-07-01 Thread Roch - PAE
Robert Milkowski writes:

  Hello Roch,
  
  Saturday, June 28, 2008, 11:25:17 AM, you wrote:
  
  
  RB I suspect,  a single dd is cpu bound.
  
  I don't think so.
  

We're nearly CPU bound, as your numbers show. More below.

  See below: one more run with the stripe of 48 disks. A single dd with 1024k
  block size and 64GB to write.
  
  bash-3.2# zpool iostat 1
                 capacity     operations    bandwidth
  pool used  avail   read  write   read  write
  --  -  -  -  -  -  -
  test 333K  21.7T  1  1   147K   147K
  test 333K  21.7T  0  0  0  0
  test 333K  21.7T  0  0  0  0
  test 333K  21.7T  0  0  0  0
  test 333K  21.7T  0  0  0  0
  test 333K  21.7T  0  0  0  0
  test 333K  21.7T  0  0  0  0
  test 333K  21.7T  0  0  0  0
  test 333K  21.7T  0  1.60K  0   204M
  test 333K  21.7T  0  20.5K  0  2.55G
  test4.00G  21.7T  0  9.19K  0  1.13G
  test4.00G  21.7T  0  0  0  0
  test4.00G  21.7T  0  1.78K  0   228M
  test4.00G  21.7T  0  12.5K  0  1.55G
  test7.99G  21.7T  0  16.2K  0  2.01G
  test7.99G  21.7T  0  0  0  0
  test7.99G  21.7T  0  13.4K  0  1.68G
  test12.0G  21.7T  0  4.31K  0   530M
  test12.0G  21.7T  0  0  0  0
  test12.0G  21.7T  0  6.91K  0   882M
  test12.0G  21.7T  0  21.8K  0  2.72G
  test16.0G  21.7T  0839  0  88.4M
  test16.0G  21.7T  0  0  0  0
  test16.0G  21.7T  0  4.42K  0   565M
  test16.0G  21.7T  0  18.5K  0  2.31G
  test20.0G  21.7T  0  8.87K  0  1.10G
  test20.0G  21.7T  0  0  0  0
  test20.0G  21.7T  0  12.2K  0  1.52G
  test24.0G  21.7T  0  9.28K  0  1.14G
  test24.0G  21.7T  0  0  0  0
  test24.0G  21.7T  0  0  0  0
  test24.0G  21.7T  0  0  0  0
  test24.0G  21.7T  0  14.5K  0  1.81G
  test28.0G  21.7T  0  10.1K  63.6K  1.25G
  test28.0G  21.7T  0  0  0  0
  test28.0G  21.7T  0  10.7K  0  1.34G
  test32.0G  21.7T  0  13.6K  63.2K  1.69G
  test32.0G  21.7T  0  0  0  0
  test32.0G  21.7T  0  0  0  0
  test32.0G  21.7T  0  11.1K  0  1.39G
  test36.0G  21.7T  0  19.9K  0  2.48G
  test36.0G  21.7T  0  0  0  0
  test36.0G  21.7T  0  0  0  0
  test36.0G  21.7T  0  17.7K  0  2.21G
  test40.0G  21.7T  0  5.42K  63.1K   680M
  test40.0G  21.7T  0  0  0  0
  test40.0G  21.7T  0  6.62K  0   844M
  test44.0G  21.7T  1  19.8K   125K  2.46G
  test44.0G  21.7T  0  0  0  0
  test44.0G  21.7T  0  0  0  0
  test44.0G  21.7T  0  18.0K  0  2.24G
  test47.9G  21.7T  1  13.2K   127K  1.63G
  test47.9G  21.7T  0  0  0  0
  test47.9G  21.7T  0  0  0  0
  test47.9G  21.7T  0  15.6K  0  1.94G
  test47.9G  21.7T  1  16.1K   126K  1.99G
  test51.9G  21.7T  0  0  0  0
  test51.9G  21.7T  0  0  0  0
  test51.9G  21.7T  0  14.2K  0  1.77G
  test55.9G  21.7T  0  14.0K  63.2K  1.73G
  test55.9G  21.7T  0  0  0  0
  test55.9G  21.7T  0  0  0  0
  test55.9G  21.7T  0  16.3K  0  2.04G
  test59.9G  21.7T  0  14.5K  63.2K  1.80G
  test59.9G  21.7T  0  0  0  0
  test59.9G  21.7T  0  0  0  0
  test59.9G  21.7T  0  17.7K  0  2.21G
  test63.9G  21.7T  0  4.84K  62.6K   603M
  test63.9G  21.7T  0  0  0  0
  test63.9G  21.7T  0  0  0  0
  test63.9G  21.7T  0  0  0  0
  test63.9G  21.7T  0  0  0  0
  test63.9G  21.7T  0  0  0  0
  test63.9G  21.7T  0  0  0  0
  test63.9G  21.7T  0  0  0  0
  ^C
  bash-3.2#
  
  bash-3.2# ptime dd if=/dev/zero of=/test/q1 bs=1024k count=65536
  65536+0 records in
  65536+0 records out
  
  real     1:06.312
  user        0.074
  sys        54.060
  bash-3.2#
  
  Doesn't look like it's CPU bound.
  

So on sys time alone we're at about 81% of CPU saturation. Even if you
push this to 100%, you will still see zeros in the zpool iostat output.
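
For illustration, the arithmetic behind that figure, taken straight from the
ptime output above (one dd thread can use at most one CPU):

  bash-3.2# echo "scale=1; 54.060 * 100 / 66.312" | bc
  81.5

So the dd thread already spends roughly four fifths of its elapsed time in
the kernel; the remaining headroom is small.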

We 

Re: [zfs-discuss] Periodic flush

2008-06-30 Thread Robert Milkowski
Hello Roch,

Saturday, June 28, 2008, 11:25:17 AM, you wrote:


RB I suspect,  a single dd is cpu bound.

I don't think so.

See below: one more run with the stripe of 48 disks. A single dd with 1024k
block size and 64GB to write.

bash-3.2# zpool iostat 1
                 capacity     operations    bandwidth
pool used  avail   read  write   read  write
--  -  -  -  -  -  -
test 333K  21.7T  1  1   147K   147K
test 333K  21.7T  0  0  0  0
test 333K  21.7T  0  0  0  0
test 333K  21.7T  0  0  0  0
test 333K  21.7T  0  0  0  0
test 333K  21.7T  0  0  0  0
test 333K  21.7T  0  0  0  0
test 333K  21.7T  0  0  0  0
test 333K  21.7T  0  1.60K  0   204M
test 333K  21.7T  0  20.5K  0  2.55G
test4.00G  21.7T  0  9.19K  0  1.13G
test4.00G  21.7T  0  0  0  0
test4.00G  21.7T  0  1.78K  0   228M
test4.00G  21.7T  0  12.5K  0  1.55G
test7.99G  21.7T  0  16.2K  0  2.01G
test7.99G  21.7T  0  0  0  0
test7.99G  21.7T  0  13.4K  0  1.68G
test12.0G  21.7T  0  4.31K  0   530M
test12.0G  21.7T  0  0  0  0
test12.0G  21.7T  0  6.91K  0   882M
test12.0G  21.7T  0  21.8K  0  2.72G
test16.0G  21.7T  0839  0  88.4M
test16.0G  21.7T  0  0  0  0
test16.0G  21.7T  0  4.42K  0   565M
test16.0G  21.7T  0  18.5K  0  2.31G
test20.0G  21.7T  0  8.87K  0  1.10G
test20.0G  21.7T  0  0  0  0
test20.0G  21.7T  0  12.2K  0  1.52G
test24.0G  21.7T  0  9.28K  0  1.14G
test24.0G  21.7T  0  0  0  0
test24.0G  21.7T  0  0  0  0
test24.0G  21.7T  0  0  0  0
test24.0G  21.7T  0  14.5K  0  1.81G
test28.0G  21.7T  0  10.1K  63.6K  1.25G
test28.0G  21.7T  0  0  0  0
test28.0G  21.7T  0  10.7K  0  1.34G
test32.0G  21.7T  0  13.6K  63.2K  1.69G
test32.0G  21.7T  0  0  0  0
test32.0G  21.7T  0  0  0  0
test32.0G  21.7T  0  11.1K  0  1.39G
test36.0G  21.7T  0  19.9K  0  2.48G
test36.0G  21.7T  0  0  0  0
test36.0G  21.7T  0  0  0  0
test36.0G  21.7T  0  17.7K  0  2.21G
test40.0G  21.7T  0  5.42K  63.1K   680M
test40.0G  21.7T  0  0  0  0
test40.0G  21.7T  0  6.62K  0   844M
test44.0G  21.7T  1  19.8K   125K  2.46G
test44.0G  21.7T  0  0  0  0
test44.0G  21.7T  0  0  0  0
test44.0G  21.7T  0  18.0K  0  2.24G
test47.9G  21.7T  1  13.2K   127K  1.63G
test47.9G  21.7T  0  0  0  0
test47.9G  21.7T  0  0  0  0
test47.9G  21.7T  0  15.6K  0  1.94G
test47.9G  21.7T  1  16.1K   126K  1.99G
test51.9G  21.7T  0  0  0  0
test51.9G  21.7T  0  0  0  0
test51.9G  21.7T  0  14.2K  0  1.77G
test55.9G  21.7T  0  14.0K  63.2K  1.73G
test55.9G  21.7T  0  0  0  0
test55.9G  21.7T  0  0  0  0
test55.9G  21.7T  0  16.3K  0  2.04G
test59.9G  21.7T  0  14.5K  63.2K  1.80G
test59.9G  21.7T  0  0  0  0
test59.9G  21.7T  0  0  0  0
test59.9G  21.7T  0  17.7K  0  2.21G
test63.9G  21.7T  0  4.84K  62.6K   603M
test63.9G  21.7T  0  0  0  0
test63.9G  21.7T  0  0  0  0
test63.9G  21.7T  0  0  0  0
test63.9G  21.7T  0  0  0  0
test63.9G  21.7T  0  0  0  0
test63.9G  21.7T  0  0  0  0
test63.9G  21.7T  0  0  0  0
^C
bash-3.2#

bash-3.2# ptime dd if=/dev/zero of=/test/q1 bs=1024k count=65536
65536+0 records in
65536+0 records out

real     1:06.312
user        0.074
sys        54.060
bash-3.2#

Doesn't look like it's CPU bound.



Let's try to read the file after zpool export test; zpool import test

bash-3.2# zpool iostat 1
                 capacity     operations    bandwidth
pool used  avail   read  write   read  write
--  -  -  -  -  -  -
test64.0G  21.7T 15 46  1.22M  1.76M
test64.0G  21.7T  0  0  0  0
test64.0G  21.7T  0 

Re: [zfs-discuss] Periodic flush

2008-06-30 Thread Robert Milkowski
Hello Robert,

Tuesday, July 1, 2008, 12:01:03 AM, you wrote:

RM Nevertheless the main issue is jumpy writing...


I was just wondering how much throughput I can get running multiple
dd's - one per disk drive - and what kind of aggregated throughput I would
get.

So for each of the 48 disks I did:

dd if=/dev/zero of=/dev/rdsk/c6t7d0s0 bs=128k
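
(A sketch of how such a run could be driven - the controller/target numbers
below are assumed to match the c1-c6 / t0-t7 layout used for the pool above;
adjust before use:)

  for c in 1 2 3 4 5 6; do
    for t in 0 1 2 3 4 5 6 7; do
      dd if=/dev/zero of=/dev/rdsk/c${c}t${t}d0s0 bs=128k &
    done
  done
  wait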

The iostat looks like:

bash-3.2# iostat -xnzC 1 | egrep "c[0-6]$|devic"
[skipped the first output]
extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
0.0 5308.00.0 679418.9  0.1  7.20.01.4   0 718 c1
0.0 5264.20.0 673813.1  0.1  7.20.01.4   0 720 c2
0.0 4047.60.0 518095.1  0.1  7.30.01.8   0 725 c3
0.0 5340.10.0 683532.5  0.1  7.20.01.3   0 718 c4
0.0 5325.10.0 681608.0  0.1  7.10.01.3   0 714 c5
0.0 4089.30.0 523434.0  0.1  7.30.01.8   0 727 c6
extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
0.0 5283.10.0 676231.2  0.1  7.20.01.4   0 723 c1
0.0 5215.20.0 667549.5  0.1  7.20.01.4   0 720 c2
0.0 4009.00.0 513152.8  0.1  7.30.01.8   0 725 c3
0.0 5281.90.0 676082.5  0.1  7.20.01.4   0 722 c4
0.0 5316.60.0 680520.9  0.1  7.20.01.4   0 720 c5
0.0 4159.50.0 532420.9  0.1  7.30.01.7   0 726 c6
extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
0.0 5322.00.0 681213.6  0.1  7.20.01.4   0 720 c1
0.0 5292.90.0 677494.0  0.1  7.20.01.4   0 722 c2
0.0 4051.40.0 518573.3  0.1  7.30.01.8   0 727 c3
0.0 5315.00.0 680318.8  0.1  7.20.01.4   0 721 c4
0.0 5313.10.0 680074.3  0.1  7.20.01.4   0 723 c5
0.0 4184.80.0 535648.7  0.1  7.30.01.7   0 730 c6
extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
0.0 5296.40.0 677940.2  0.1  7.10.01.3   0 714 c1
0.0 5236.40.0 670265.3  0.1  7.20.01.4   0 720 c2
0.0 4023.50.0 515011.5  0.1  7.30.01.8   0 728 c3
0.0 5291.40.0 677300.7  0.1  7.20.01.4   0 723 c4
0.0 5297.40.0 678072.8  0.1  7.20.01.4   0 720 c5
0.0 4095.60.0 524236.0  0.1  7.30.01.8   0 726 c6
^C


one full output:
extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
0.0 5302.00.0 678658.6  0.1  7.20.01.4   0 722 c1
0.0  664.00.0 84992.8  0.0  0.90.01.4   1  90 c1t0d0
0.0  657.00.0 84090.5  0.0  0.90.01.3   1  89 c1t1d0
0.0  666.00.0 85251.4  0.0  0.90.01.3   1  89 c1t2d0
0.0  662.00.0 84735.6  0.0  0.90.01.4   1  91 c1t3d0
0.0  669.10.0 85638.4  0.0  0.90.01.4   1  92 c1t4d0
0.0  665.00.0 85122.9  0.0  0.90.01.4   1  91 c1t5d0
0.0  652.90.0 83575.1  0.0  0.90.01.4   1  90 c1t6d0
0.0  666.00.0 85251.8  0.0  0.90.01.4   1  91 c1t7d0
0.0 5293.30.0 677537.5  0.1  7.30.01.4   0 725 c2
0.0  660.00.0 84481.2  0.0  0.90.01.4   1  91 c2t0d0
0.0  661.00.0 84610.3  0.0  0.90.01.4   1  90 c2t1d0
0.0  664.00.0 84997.4  0.0  0.90.01.4   1  90 c2t2d0
0.0  662.00.0 84739.4  0.0  0.90.01.4   1  92 c2t3d0
0.0  655.00.0 83836.6  0.0  0.90.01.4   1  89 c2t4d0
0.0  663.10.0 84871.3  0.0  0.90.01.4   1  90 c2t5d0
0.0  663.10.0 84871.5  0.0  0.90.01.4   1  92 c2t6d0
0.0  665.10.0 85129.7  0.0  0.90.01.4   1  92 c2t7d0
0.0 4072.10.0 521228.9  0.1  7.30.01.8   0 728 c3
0.0  506.90.0 64879.3  0.0  0.90.01.8   1  90 c3t0d0
0.0  513.90.0 65782.4  0.0  0.90.01.8   1  92 c3t1d0
0.0  511.90.0 65524.4  0.0  0.90.01.8   1  91 c3t2d0
0.0  505.90.0 64750.5  0.0  0.90.01.8   1  91 c3t3d0
0.0  502.80.0 64363.6  0.0  0.90.01.8   1  90 c3t4d0
0.0  506.90.0 64879.6  0.0  0.90.01.8   1  91 c3t5d0
0.0  513.90.0 65782.6  0.0  0.90.01.8   1  92 c3t6d0
0.0  509.90.0 65266.6  0.0  0.90.01.8   1  91 c3t7d0
0.0 5298.70.0 678232.6  0.1  7.30.01.4   0 725 c4
0.0  664.10.0 85001.4  0.0  0.90.01.4   1  92 c4t0d0
0.0  662.10.0 84743.4  0.0  0.90.01.4   1  90 c4t1d0
0.0  663.10.0 84872.4  0.0  0.90.01.4   1  92 c4t2d0
0.0  664.10.0 85001.4  0.0  0.90.01.3   1  88 c4t3d0
0.0  657.10.0 84105.4  0.0  0.90.01.4   1  91 c4t4d0
0.0  658.10.0 84234.5  0.0  0.90.01.4   1  91 c4t5d0
0.0  669.20.0 85653.4  0.0  0.9

Re: [zfs-discuss] Periodic flush

2008-06-28 Thread Roch Bourbonnais

On 28 June 2008, at 05:14, Robert Milkowski wrote:

 Hello Mark,

 Tuesday, April 15, 2008, 8:32:32 PM, you wrote:

 MM The new write throttle code put back into build 87 attempts to
 MM smooth out the process.  We now measure the amount of time it takes
 MM to sync each transaction group, and the amount of data in that group.
 MM We dynamically resize our write throttle to try to keep the sync
 MM time constant (at 5secs) under write load.  We also introduce
 MM fairness delays on writers when we near pipeline capacity: each
 MM write is delayed 1/100sec when we are about to fill up.  This
 MM prevents a single heavy writer from starving out occasional
 MM writers.  So instead of coming to an abrupt halt when the pipeline
 MM fills, we slow down our write pace.  The result should be a constant
 MM even IO load.

 snv_91, 48x 500GB sata drives in one large stripe:

 # zpool create -f test c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0  
 c1t6d0 c1t7d0 c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0 c2t6d0  
 c2t7d0 c3t0d0 c3t1d0 c3t2d0 c3t3d0 c3t4d0 c3t5d0 c3t6d0 c3t7d0  
 c4t0d0 c4t1d0 c4t2d0 c4t3d0 c4t4d0 c4t5d0 c4t6d0 c4t7d0 c5t0d0  
 c5t1d0 c5t2d0 c5t3d0 c5t4d0 c5t5d0 c5t6d0 c5t7d0 c6t0d0 c6t1d0  
 c6t2d0 c6t3d0 c6t4d0 c6t5d0 c6t6d0 c6t7d0
 # zfs set atime=off test


 # dd if=/dev/zero of=/test/q1 bs=1024k
 ^C34374+0 records in
 34374+0 records out


 # zpool iostat 1
                 capacity     operations    bandwidth
 pool used  avail   read  write   read  write
 --  -  -  -  -  -  -
 [...]
 test58.9M  21.7T  0  1.19K  0  80.8M
 test 862M  21.7T  0  6.67K  0   776M
 test1.52G  21.7T  0  5.50K  0   689M
 test1.52G  21.7T  0  9.28K  0  1.16G
 test2.88G  21.7T  0  1.14K  0   135M
 test2.88G  21.7T  0  1.61K  0   206M
 test2.88G  21.7T  0  18.0K  0  2.24G
 test5.60G  21.7T  0 79  0   264K
 test5.60G  21.7T  0  0  0  0
 test5.60G  21.7T  0  10.9K  0  1.36G
 test9.59G  21.7T  0  7.09K  0   897M
 test9.59G  21.7T  0  0  0  0
 test9.59G  21.7T  0  6.33K  0   807M
 test9.59G  21.7T  0  17.9K  0  2.24G
 test13.6G  21.7T  0  1.96K  0   239M
 test13.6G  21.7T  0  0  0  0
 test13.6G  21.7T  0  11.9K  0  1.49G
 test17.6G  21.7T  0  9.91K  0  1.23G
 test17.6G  21.7T  0  0  0  0
 test17.6G  21.7T  0  5.48K  0   700M
 test17.6G  21.7T  0  20.0K  0  2.50G
 test21.6G  21.7T  0  2.03K  0   244M
 test21.6G  21.7T  0  0  0  0
 test21.6G  21.7T  0  0  0  0
 test21.6G  21.7T  0  4.03K  0   513M
 test21.6G  21.7T  0  23.7K  0  2.97G
 test25.6G  21.7T  0  1.83K  0   225M
 test25.6G  21.7T  0  0  0  0
 test25.6G  21.7T  0  13.9K  0  1.74G
 test29.6G  21.7T  1  1.40K   127K   167M
 test29.6G  21.7T  0  0  0  0
 test29.6G  21.7T  0  7.14K  0   912M
 test29.6G  21.7T  0  19.2K  0  2.40G
 test33.6G  21.7T  1378   127K  34.8M
 test33.6G  21.7T  0  0  0  0
 ^C


 Well, doesn't actually look good. Checking with iostat I don't see any
 problems like long service times, etc.


I suspect a single dd is CPU bound.

  Reducing zfs_txg_synctime to 1 helps a little bit, but it's still not an
  even stream of data.

 If I start 3 dd streams at the same time then it is slightly better
 (zfs_txg_synctime set back to 5) but still very jumpy.


Try setting zfs_txg_synctime to 10; that reduces the txg overhead.
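
(One way to change it on a live snv box is through mdb; treat this as a
sketch and check first that the zfs_txg_synctime symbol exists on your
build:)

  # echo "zfs_txg_synctime/W0t10" | mdb -kw    <- set the target to 10s
  # echo "zfs_txg_synctime/D" | mdb -k         <- read it back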

 Reading with one dd produces steady throughput, but I'm disappointed with
 the actual performance:


Again, probably CPU bound. What is ptime dd ... saying?

 test 161G  21.6T  9.94K  0  1.24G  0
 test 161G  21.6T  10.0K  0  1.25G  0
 test 161G  21.6T  10.3K  0  1.29G  0
 test 161G  21.6T  10.1K  0  1.27G  0
 test 161G  21.6T  10.4K  0  1.31G  0
 test 161G  21.6T  10.1K  0  1.27G  0
 test 161G  21.6T  10.4K  0  1.30G  0
 test 161G  21.6T  10.2K  0  1.27G  0
 test 161G  21.6T  10.3K  0  1.29G  0
 test 161G  21.6T  10.0K  0  1.25G  0
 test 161G  21.6T  9.96K  0  1.24G  0
 test 161G  21.6T  10.6K  0  1.33G  0
 test 161G  21.6T  10.1K  0  1.26G  0
 test 161G  21.6T  10.2K  0  1.27G  0
 test 161G  21.6T  10.4K  0  1.30G  0
 test 161G  21.6T  9.62K  0  1.20G  0
 test 161G  21.6T  8.22K  0  1.03G  0
 test 161G  21.6T  9.61K  0  

Re: [zfs-discuss] Periodic flush

2008-06-27 Thread Robert Milkowski
Hello Mark,

Tuesday, April 15, 2008, 8:32:32 PM, you wrote:

MM The new write throttle code put back into build 87 attempts to
MM smooth out the process.  We now measure the amount of time it takes
MM to sync each transaction group, and the amount of data in that group.
MM We dynamically resize our write throttle to try to keep the sync
MM time constant (at 5secs) under write load.  We also introduce
MM fairness delays on writers when we near pipeline capacity: each
MM write is delayed 1/100sec when we are about to fill up.  This
MM prevents a single heavy writer from starving out occasional
MM writers.  So instead of coming to an abrupt halt when the pipeline
MM fills, we slow down our write pace.  The result should be a constant
MM even IO load.

snv_91, 48x 500GB sata drives in one large stripe:

# zpool create -f test c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 c1t6d0 c1t7d0 
c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0 c2t6d0 c2t7d0 c3t0d0 c3t1d0 c3t2d0 
c3t3d0 c3t4d0 c3t5d0 c3t6d0 c3t7d0 c4t0d0 c4t1d0 c4t2d0 c4t3d0 c4t4d0 c4t5d0 
c4t6d0 c4t7d0 c5t0d0 c5t1d0 c5t2d0 c5t3d0 c5t4d0 c5t5d0 c5t6d0 c5t7d0 c6t0d0 
c6t1d0 c6t2d0 c6t3d0 c6t4d0 c6t5d0 c6t6d0 c6t7d0
# zfs set atime=off test


# dd if=/dev/zero of=/test/q1 bs=1024k
^C34374+0 records in
34374+0 records out


# zpool iostat 1
                 capacity     operations    bandwidth
pool used  avail   read  write   read  write
--  -  -  -  -  -  -
[...]
test58.9M  21.7T  0  1.19K  0  80.8M
test 862M  21.7T  0  6.67K  0   776M
test1.52G  21.7T  0  5.50K  0   689M
test1.52G  21.7T  0  9.28K  0  1.16G
test2.88G  21.7T  0  1.14K  0   135M
test2.88G  21.7T  0  1.61K  0   206M
test2.88G  21.7T  0  18.0K  0  2.24G
test5.60G  21.7T  0 79  0   264K
test5.60G  21.7T  0  0  0  0
test5.60G  21.7T  0  10.9K  0  1.36G
test9.59G  21.7T  0  7.09K  0   897M
test9.59G  21.7T  0  0  0  0
test9.59G  21.7T  0  6.33K  0   807M
test9.59G  21.7T  0  17.9K  0  2.24G
test13.6G  21.7T  0  1.96K  0   239M
test13.6G  21.7T  0  0  0  0
test13.6G  21.7T  0  11.9K  0  1.49G
test17.6G  21.7T  0  9.91K  0  1.23G
test17.6G  21.7T  0  0  0  0
test17.6G  21.7T  0  5.48K  0   700M
test17.6G  21.7T  0  20.0K  0  2.50G
test21.6G  21.7T  0  2.03K  0   244M
test21.6G  21.7T  0  0  0  0
test21.6G  21.7T  0  0  0  0
test21.6G  21.7T  0  4.03K  0   513M
test21.6G  21.7T  0  23.7K  0  2.97G
test25.6G  21.7T  0  1.83K  0   225M
test25.6G  21.7T  0  0  0  0
test25.6G  21.7T  0  13.9K  0  1.74G
test29.6G  21.7T  1  1.40K   127K   167M
test29.6G  21.7T  0  0  0  0
test29.6G  21.7T  0  7.14K  0   912M
test29.6G  21.7T  0  19.2K  0  2.40G
test33.6G  21.7T  1378   127K  34.8M
test33.6G  21.7T  0  0  0  0
^C


Well, doesn't actually look good. Checking with iostat I don't see any
problems like long service times, etc.

Reducing zfs_txg_synctime to 1 helps a little bit, but it's still not an
even stream of data.

If I start 3 dd streams at the same time then it is slightly better
(zfs_txg_synctime set back to 5) but still very jumpy.

Reading with one dd produces steady throughput, but I'm disappointed with
the actual performance:

test 161G  21.6T  9.94K  0  1.24G  0
test 161G  21.6T  10.0K  0  1.25G  0
test 161G  21.6T  10.3K  0  1.29G  0
test 161G  21.6T  10.1K  0  1.27G  0
test 161G  21.6T  10.4K  0  1.31G  0
test 161G  21.6T  10.1K  0  1.27G  0
test 161G  21.6T  10.4K  0  1.30G  0
test 161G  21.6T  10.2K  0  1.27G  0
test 161G  21.6T  10.3K  0  1.29G  0
test 161G  21.6T  10.0K  0  1.25G  0
test 161G  21.6T  9.96K  0  1.24G  0
test 161G  21.6T  10.6K  0  1.33G  0
test 161G  21.6T  10.1K  0  1.26G  0
test 161G  21.6T  10.2K  0  1.27G  0
test 161G  21.6T  10.4K  0  1.30G  0
test 161G  21.6T  9.62K  0  1.20G  0
test 161G  21.6T  8.22K  0  1.03G  0
test 161G  21.6T  9.61K  0  1.20G  0
test 161G  21.6T  10.2K  0  1.28G  0
test 161G  21.6T  9.12K  0  1.14G  0
test 161G  21.6T  9.96K  0  1.25G  0
test 161G  21.6T  9.72K  0  1.22G  0
test 161G  21.6T  10.6K  0  1.32G  0
test 161G  21.6T  9.93K  

Re: [zfs-discuss] Periodic flush

2008-05-15 Thread Roch - PAE
Bob Friesenhahn writes:
  On Tue, 15 Apr 2008, Mark Maybee wrote:
   going to take 12sec to get this data onto the disk.  This impedance
   mis-match is going to manifest as pauses:  the application fills
   the pipe, then waits for the pipe to empty, then starts writing again.
   Note that this won't be smooth, since we need to complete an entire
   sync phase before allowing things to progress.  So you can end up
   with IO gaps.  This is probably what the original submitter is
  
  Yes.  With an application which also needs to make best use of 
  available CPU, these I/O gaps cut into available CPU time (by 
  blocking the process) unless the application uses multithreading and 
  an intermediate write queue (more memory) to separate the CPU-centric 
  parts from the I/O-centric parts.  While the single-threaded 
  application is waiting for data to be written, it is not able to read 
  and process more data.  Since reads take time to complete, being 
  blocked on write stops new reads from being started so the data is 
  ready when it is needed.
  
   There is one down side to this new model: if a write load is very
   bursty, e.g., a large 5GB write followed by 30secs of idle, the
   new code may be less efficient than the old.  In the old code, all
  
  This is also a common scenario. :-)
  
  Presumably the special slow I/O code would not kick in unless the 
  burst was large enough to fill quite a bit of the ARC.
  

Bursts of up to 1/8th of physical memory or 5 seconds of storage
throughput, whichever is smaller.
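
(A worked example of that rule with made-up numbers: on a 32GB machine
writing to a pool that syncs at about 2GB/s, the throttle absorbs bursts of
up to min(32GB / 8, 5s x 2GB/s) = min(4GB, 10GB) = 4GB before writers are
slowed down.)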

-r



  Real time throttling is quite a challenge to do in software.
  
  Bob
  ==
  Bob Friesenhahn
  [EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
  GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
  


Re: [zfs-discuss] Periodic flush

2008-04-18 Thread Robert Milkowski
Hello Mark,

Tuesday, April 15, 2008, 8:32:32 PM, you wrote:

MM ZFS has always done a certain amount of write throttling.  In the past
MM (or the present, for those of you running S10 or pre build 87 bits) this
MM throttling was controlled by a timer and the size of the ARC: we would
MM cut a transaction group every 5 seconds based off of our timer, and
MM we would also cut a transaction group if we had more than 1/4 of the
MM ARC size worth of dirty data in the transaction group.  So, for example,
MM if you have a machine with 16GB of physical memory it wouldn't be
MM unusual to see an ARC size of around 12GB.  This means we would allow
MM up to 3GB of dirty data into a single transaction group (if the writes
MM complete in less than 5 seconds).  Now we can have up to three
MM transaction groups in progress at any time: open context, quiesce
MM context, and sync context.  As a final wrinkle, we also don't allow more
MM than 1/2 the ARC to be composed of dirty write data.  All taken
MM together, this means that there can be up to 6GB of writes in the pipe
MM (using the 12GB ARC example from above).

MM Problems with this design start to show up when the write-to-disk
MM bandwidth can't keep up with the application: if the application is
MM writing at a rate of, say, 1GB/sec, it will fill the pipe within
MM 6 seconds.  But if the IO bandwidth to disk is only 512MB/sec, its
MM going to take 12sec to get this data onto the disk.  This impedance
MM mis-match is going to manifest as pauses:  the application fills
MM the pipe, then waits for the pipe to empty, then starts writing again.
MM Note that this won't be smooth, since we need to complete an entire
MM sync phase before allowing things to progress.  So you can end up
MM with IO gaps.  This is probably what the original submitter is
MM experiencing.  Note there are a few other subtleties here that I
MM have glossed over, but the general picture is accurate.

MM The new write throttle code put back into build 87 attempts to
MM smooth out the process.  We now measure the amount of time it takes
MM to sync each transaction group, and the amount of data in that group.
MM We dynamically resize our write throttle to try to keep the sync
MM time constant (at 5secs) under write load.  We also introduce
MM fairness delays on writers when we near pipeline capacity: each
MM write is delayed 1/100sec when we are about to fill up.  This
MM prevents a single heavy writer from starving out occasional
MM writers.  So instead of coming to an abrupt halt when the pipeline
MM fills, we slow down our write pace.  The result should be a constant
MM even IO load.

MM There is one down side to this new model: if a write load is very
MM bursty, e.g., a large 5GB write followed by 30secs of idle, the
MM new code may be less efficient than the old.  In the old code, all
MM of this IO would be let in at memory speed and then more slowly make
MM its way out to disk.  In the new code, the writes may be slowed down.
MM The data makes its way to the disk in the same amount of time, but
MM the application takes longer.  Conceptually: we are sizing the write
MM buffer to the pool bandwidth, rather than to the memory size.



First - thank you for your explanation - it is very helpful.

I'm worried about the last part - but it's hard to be optimal for all
workloads. Nevertheless, sometimes the problem is that you change the
behavior from the application's perspective. With other file systems I
guess you are able to fill most of memory and still keep the disks busy
100% of the time without IO gaps.

My biggest concern was these gaps in IO, as ZFS should keep the disks
100% busy if needed.



-- 
Best regards,
 Robert Milkowski   mailto:[EMAIL PROTECTED]
   http://milek.blogspot.com



Re: [zfs-discuss] Periodic flush

2008-04-15 Thread Mark Maybee
ZFS has always done a certain amount of write throttling.  In the past
(or the present, for those of you running S10 or pre build 87 bits) this
throttling was controlled by a timer and the size of the ARC: we would
cut a transaction group every 5 seconds based off of our timer, and
we would also cut a transaction group if we had more than 1/4 of the
ARC size worth of dirty data in the transaction group.  So, for example,
if you have a machine with 16GB of physical memory it wouldn't be
unusual to see an ARC size of around 12GB.  This means we would allow
up to 3GB of dirty data into a single transaction group (if the writes
complete in less than 5 seconds).  Now we can have up to three
transaction groups in progress at any time: open context, quiesce
context, and sync context.  As a final wrinkle, we also don't allow more
than 1/2 the ARC to be composed of dirty write data.  All taken
together, this means that there can be up to 6GB of writes in the pipe
(using the 12GB ARC example from above).

Problems with this design start to show up when the write-to-disk
bandwidth can't keep up with the application: if the application is
writing at a rate of, say, 1GB/sec, it will fill the pipe within
6 seconds.  But if the IO bandwidth to disk is only 512MB/sec, it's
going to take 12sec to get this data onto the disk.  This impedance
mis-match is going to manifest as pauses:  the application fills
the pipe, then waits for the pipe to empty, then starts writing again.
Note that this won't be smooth, since we need to complete an entire
sync phase before allowing things to progress.  So you can end up
with IO gaps.  This is probably what the original submitter is
experiencing.  Note there are a few other subtleties here that I
have glossed over, but the general picture is accurate.

The new write throttle code put back into build 87 attempts to
smooth out the process.  We now measure the amount of time it takes
to sync each transaction group, and the amount of data in that group.
We dynamically resize our write throttle to try to keep the sync
time constant (at 5secs) under write load.  We also introduce
fairness delays on writers when we near pipeline capacity: each
write is delayed 1/100sec when we are about to fill up.  This
prevents a single heavy writer from starving out occasional
writers.  So instead of coming to an abrupt halt when the pipeline
fills, we slow down our write pace.  The result should be a constant
even IO load.

There is one down side to this new model: if a write load is very
bursty, e.g., a large 5GB write followed by 30secs of idle, the
new code may be less efficient than the old.  In the old code, all
of this IO would be let in at memory speed and then more slowly make
its way out to disk.  In the new code, the writes may be slowed down.
The data makes its way to the disk in the same amount of time, but
the application takes longer.  Conceptually: we are sizing the write
buffer to the pool bandwidth, rather than to the memory size.
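
For anyone who wants to watch this from the outside, one rough way to see
per-txg sync times is a DTrace one-liner on spa_sync() - a sketch only,
since fbt probe availability depends on the build:

  # dtrace -n '
    fbt::spa_sync:entry  { self->t = timestamp; }
    fbt::spa_sync:return /self->t/ {
      @["spa_sync time (ms)"] = quantize((timestamp - self->t) / 1000000);
      self->t = 0;
    }'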

Robert Milkowski wrote:
 Hello eric,
 
 Thursday, March 27, 2008, 9:36:42 PM, you wrote:
 
 ek On Mar 27, 2008, at 9:24 AM, Bob Friesenhahn wrote:
 On Thu, 27 Mar 2008, Neelakanth Nadgir wrote:
 This causes the sync to happen much faster, but as you say,  
 suboptimal.
 Haven't had the time to go through the bug report, but probably
 CR 6429205 each zpool needs to monitor its throughput
 and throttle heavy writers
 will help.
 I hope that this feature is implemented soon, and works well. :-)
 
 ek Actually, this has gone back into snv_87 (and no we don't know which  
 ek s10uX it will go into yet).
 
 
 Could you share more details how it works right now after change?
 


Re: [zfs-discuss] Periodic flush

2008-04-15 Thread Bob Friesenhahn
On Tue, 15 Apr 2008, Mark Maybee wrote:
 going to take 12sec to get this data onto the disk.  This impedance
 mis-match is going to manifest as pauses:  the application fills
 the pipe, then waits for the pipe to empty, then starts writing again.
 Note that this won't be smooth, since we need to complete an entire
 sync phase before allowing things to progress.  So you can end up
 with IO gaps.  This is probably what the original submitter is

Yes.  With an application which also needs to make best use of 
available CPU, these I/O gaps cut into available CPU time (by 
blocking the process) unless the application uses multithreading and 
an intermediate write queue (more memory) to separate the CPU-centric 
parts from the I/O-centric parts.  While the single-threaded 
application is waiting for data to be written, it is not able to read 
and process more data.  Since reads take time to complete, being 
blocked on write stops new reads from being started so the data is 
ready when it is needed.

 There is one down side to this new model: if a write load is very
 bursty, e.g., a large 5GB write followed by 30secs of idle, the
 new code may be less efficient than the old.  In the old code, all

This is also a common scenario. :-)

Presumably the special slow I/O code would not kick in unless the 
burst was large enough to fill quite a bit of the ARC.

Real time throttling is quite a challenge to do in software.

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/



Re: [zfs-discuss] Periodic flush

2008-03-28 Thread Robert Milkowski
Hello eric,

Thursday, March 27, 2008, 9:36:42 PM, you wrote:

ek On Mar 27, 2008, at 9:24 AM, Bob Friesenhahn wrote:
 On Thu, 27 Mar 2008, Neelakanth Nadgir wrote:

 This causes the sync to happen much faster, but as you say,  
 suboptimal.
 Haven't had the time to go through the bug report, but probably
 CR 6429205 each zpool needs to monitor its throughput
 and throttle heavy writers
 will help.

 I hope that this feature is implemented soon, and works well. :-)

ek Actually, this has gone back into snv_87 (and no we don't know which  
ek s10uX it will go into yet).


Could you share more details on how it works right now after the change?

-- 
Best regards,
 Robert Milkowski    mailto:[EMAIL PROTECTED]
   http://milek.blogspot.com



Re: [zfs-discuss] Periodic flush

2008-03-27 Thread Selim Daoud
The question is: does the IO pausing behaviour you noticed penalize
your application? What are the consequences at the application level?

For instance, we have seen applications doing some kind of data capture
from an external device (video, for example) that require a constant
throughput to disk (a data feed), otherwise risking loss of data. In
this case QFS might be a better option (not free, though).
If your application is not suffering, then you should be able to live
with these apparent IO hangs.

s-


On Thu, Mar 27, 2008 at 3:35 AM, Bob Friesenhahn
[EMAIL PROTECTED] wrote:
 My application processes thousands of files sequentially, reading
  input files, and outputting new files.  I am using Solaris 10U4.
  While running the application in a verbose mode, I see that it runs
  very fast but pauses about every 7 seconds for a second or two.  This
  is while reading 50MB/second and writing 73MB/second (ARC cache miss
  rate of 87%).  The pause does not occur if the application spends more
  time doing real work.  However, it would be nice if the pause went
  away.

  I have tried turning down the ARC size (from 14GB to 10GB) but the
  behavior did not noticeably improve.  The storage device is trained to
  ignore cache flush requests.  According to the Evil Tuning Guide, the
  pause I am seeing is due to a cache flush after the uberblock updates.

  It does not seem like a wise choice to disable ZFS cache flushing
  entirely.  Is there a better way other than adding a small delay into
  my application?

  Bob
  ==
  Bob Friesenhahn
  [EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
  GraphicsMagick Maintainer,http://www.GraphicsMagick.org/





-- 
--
Blog: http://fakoli.blogspot.com/


Re: [zfs-discuss] Periodic flush

2008-03-27 Thread Bob Friesenhahn
On Wed, 26 Mar 2008, Neelakanth Nadgir wrote:
 When you experience the pause at the application level,
 do you see an increase in writes to disk? This might be the
 regular syncing of the transaction group to disk.

If I use 'zpool iostat' with a one second interval what I see is two 
or three samples with no write I/O at all followed by a huge write of 
100 to 312MB/second.  Writes claimed to be at a lower rate are split
across two sample intervals.

It seems that writes are being cached and then issued all at once. 
This behavior assumes that the file may be written multiple times so a 
delayed write is more efficient.

If I run a script like

while true
do
sync
done

then the write data rate is much more consistent (at about 
66MB/second) and the program does not stall.  Of course this is not 
very efficient.
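
A slightly gentler variant of the same workaround - assuming one sync per
second is frequent enough to smooth the bursts without spinning a CPU:

while true
do
sync
sleep 1
done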

Are the 'zpool iostat' statistics accurate?

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/



Re: [zfs-discuss] Periodic flush

2008-03-27 Thread Richard Elling
Selim Daoud wrote:
 the question is: does the IO pausing behaviour you noticed penalize
 your application?
 what are the consequences at the application level?

 for instance we have seen application doing some kind of data capture
 from external device (video for example) requiring a constant
 throughput to disk (data feed), risking otherwise loss of data. in
 this case qfs might be a better option (no free though)
 if your application is not suffering, then you should be able to live
 with this apparent io hangs

   

I would look at txg_time first... for lots of streaming writes on a machine
with limited memory, you can smooth out the sawtooth.
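
(If you go that route it is normally an /etc/system entry; the exact variable
name differs between releases, and whether it lives in the zfs module on this
build is an assumption - check the Evil Tuning Guide for your release first:)

* shrink the txg interval from the default 5s to 1s
set zfs:txg_time = 1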

QFS is open sourced. http://blogs.sun.com/samqfs
 -- richard



Re: [zfs-discuss] Periodic flush

2008-03-27 Thread Neelakanth Nadgir
Bob Friesenhahn wrote:
 On Wed, 26 Mar 2008, Neelakanth Nadgir wrote:
 When you experience the pause at the application level,
 do you see an increase in writes to disk? This might be the
 regular syncing of the transaction group to disk.
 
 If I use 'zpool iostat' with a one second interval what I see is two 
 or three samples with no write I/O at all followed by a huge write of 
 100 to 312MB/second.  Writes claimed to be at a lower rate are split
 across two sample intervals.
 
 It seems that writes are being cached and then issued all at once. 
 This behavior assumes that the file may be written multiple times so a 
 delayed write is more efficient.
 

This does sound like the regular syncing.

 If I run a script like
 
 while true
 do
 sync
 done
 
 then the write data rate is much more consistent (at about 
 66MB/second) and the program does not stall.  Of course this is not 
 very efficient.
 

This causes the sync to happen much faster, but as you say, suboptimal.
Haven't had the time to go through the bug report, but probably
CR 6429205 each zpool needs to monitor its throughput
and throttle heavy writers
will help.

 Are the 'zpool iostat' statistics accurate?
 

Yes. You could also look at regular iostat
and correlate it.
-neel



Re: [zfs-discuss] Periodic flush

2008-03-27 Thread Bob Friesenhahn
On Thu, 27 Mar 2008, Neelakanth Nadgir wrote:

 This causes the sync to happen much faster, but as you say, suboptimal.
 Haven't had the time to go through the bug report, but probably
 CR 6429205 each zpool needs to monitor its throughput
 and throttle heavy writers
 will help.

I hope that this feature is implemented soon, and works well. :-)

I tested with my application outputting to a UFS filesystem on a
single 15K RPM SAS disk and saw that it writes about 50MB/second,
without the bursty behavior of ZFS.  When writing to a ZFS filesystem on
a RAID array, zpool iostat reports an average (over 10 seconds)
write rate of 54MB/second.  Given that the throughput is not much
higher on the RAID array, I assume that the bottleneck is in my
application.

 Are the 'zpool iostat' statistics accurate?

 Yes. You could also look at regular iostat
 and correlate it.

Iostat shows that my RAID array disks are loafing with only 9MB/second 
writes to each but with 82 writes/second.

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/



Re: [zfs-discuss] Periodic flush

2008-03-27 Thread eric kustarz

On Mar 27, 2008, at 9:24 AM, Bob Friesenhahn wrote:
 On Thu, 27 Mar 2008, Neelakanth Nadgir wrote:

 This causes the sync to happen much faster, but as you say,  
 suboptimal.
 Haven't had the time to go through the bug report, but probably
 CR 6429205 each zpool needs to monitor its throughput
 and throttle heavy writers
 will help.

 I hope that this feature is implemented soon, and works well. :-)

Actually, this has gone back into snv_87 (and no we don't know which  
s10uX it will go into yet).

eric



Re: [zfs-discuss] Periodic flush

2008-03-27 Thread abs
You may want to try disabling the disk write cache on the single disk.
Also, for the RAID array, disable 'host cache flush' if such an option
exists. That solved the problem for me.
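
(On Solaris the per-disk write cache can usually be toggled from the expert
menu of format(1M); it is interactive, so this is just the sequence of menu
choices, and whether the cache menu is offered depends on the drive:)

  # format -e
  ... select the disk ...
  format> cache
  cache> write_cache
  write_cache> disable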

let me know.


Bob Friesenhahn [EMAIL PROTECTED] wrote: On Thu, 27 Mar 2008, Neelakanth 
Nadgir wrote:

 This causes the sync to happen much faster, but as you say, suboptimal.
 Haven't had the time to go through the bug report, but probably
 CR 6429205 each zpool needs to monitor its throughput
 and throttle heavy writers
 will help.

I hope that this feature is implemented soon, and works well. :-)

I tested with my application outputting to a UFS filesystem on a 
single 15K RPM SAS disk and saw that it writes about 50MB/second and 
without the bursty behavior of ZFS.  When writing to ZFS filesystem on 
a RAID array, zpool I/O stat reports an average (over 10 seconds) 
write rate of 54MB/second.  Given that the throughput is not much 
higher on the RAID array, I assume that the bottleneck is in my 
application.

 Are the 'zpool iostat' statistics accurate?

 Yes. You could also look at regular iostat
 and correlate it.

Iostat shows that my RAID array disks are loafing with only 9MB/second 
writes to each but with 82 writes/second.

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/



   


[zfs-discuss] Periodic flush

2008-03-26 Thread Bob Friesenhahn
My application processes thousands of files sequentially, reading 
input files, and outputting new files.  I am using Solaris 10U4. 
While running the application in a verbose mode, I see that it runs 
very fast but pauses about every 7 seconds for a second or two.  This 
is while reading 50MB/second and writing 73MB/second (ARC cache miss 
rate of 87%).  The pause does not occur if the application spends more 
time doing real work.  However, it would be nice if the pause went 
away.

I have tried turning down the ARC size (from 14GB to 10GB) but the 
behavior did not noticeably improve.  The storage device is trained to 
ignore cache flush requests.  According to the Evil Tuning Guide, the 
pause I am seeing is due to a cache flush after the uberblock updates.
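
(The setting alluded to here is the zfs_nocacheflush switch from the Evil
Tuning Guide - shown only as a sketch, since it is safe only when every
device in the pool has protected cache, and its availability depends on the
release:)

* stop ZFS from issuing cache-flush requests to the devices
set zfs:zfs_nocacheflush = 1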

It does not seem like a wise choice to disable ZFS cache flushing 
entirely.  Is there a better way other than adding a small delay into 
my application?

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/



Re: [zfs-discuss] Periodic flush

2008-03-26 Thread Neelakanth Nadgir
Bob Friesenhahn wrote:
 My application processes thousands of files sequentially, reading 
 input files, and outputting new files.  I am using Solaris 10U4. 
 While running the application in a verbose mode, I see that it runs 
 very fast but pauses about every 7 seconds for a second or two. 

When you experience the pause at the application level,
do you see an increase in writes to disk? This might be the
regular syncing of the transaction group to disk.
This is normal behavior. The amount of pause is
determined by how much data needs to be synced. You could
of course decrease it by reducing the time between syncs
(either by reducing the ARC and/or decreasing txg_time),
however, I am not sure it will translate to better performance
for you.
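
(For the ARC half of that suggestion, the usual knob is zfs_arc_max in
/etc/system; the 4GB value below is only an example:)

* cap the ARC at 4GB (0x100000000 bytes)
set zfs:zfs_arc_max = 0x100000000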

hth,
-neel

  This
 is while reading 50MB/second and writing 73MB/second (ARC cache miss 
 rate of 87%).  The pause does not occur if the application spends more 
 time doing real work.  However, it would be nice if the pause went 
 away.
 
 I have tried turning down the ARC size (from 14GB to 10GB) but the 
 behavior did not noticeably improve.  The storage device is trained to 
 ignore cache flush requests.  According to the Evil Tuning Guide, the 
 pause I am seeing is due to a cache flush after the uberblock updates.
 
 It does not seem like a wise choice to disable ZFS cache flushing 
 entirely.  Is there a better way other than adding a small delay into 
 my application?
 
 Bob
 ==
 Bob Friesenhahn
 [EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
 GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
 
