Re: [zfs-discuss] ZFS write I/O stalls

2009-07-08 Thread John Wythe
 I was all ready to write about my frustrations with
 this problem, but I upgraded to snv_117 last night to
 fix some iSCSI bugs and now it seems that the write
 throttling is working as described in that blog.

I may have been a little premature. While everything is much improved for Samba 
and local disk operations (dd, cp) on snv_117, COMSTAR iSCSI writes still seem 
to exhibit this write-a-bit, block, write-a-bit, block pattern every 5 seconds.

On top of that, I am getting relatively poor iSCSI performance for some 
reason over a direct gigabit link with MTU=9000. I'm not sure what that is 
about yet.

-John
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS write I/O stalls

2009-07-03 Thread Victor Latushkin

On 02.07.09 22:05, Bob Friesenhahn wrote:

On Thu, 2 Jul 2009, Zhu, Lejun wrote:


Actually it seems to be 3/4:


3/4 is an awful lot.  That would be 15 GB on my system, which explains 
why the 5 seconds to write rule is dominant.


3/4 is 1/8 * 6, where 6 is the worst-case inflation factor (for raid-z2 it 
is actually 9, and considering a ganged 1k block on raid-z2 in a really bad 
case it should be even bigger than that). The DSL does inflate write sizes 
too, so inflated write sizes are compared against an inflated limit, so it 
should be fine.


victor
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS write I/O stalls

2009-07-03 Thread Tristan Ball

Is the system otherwise responsive during the zfs sync cycles?

I ask because I think I'm seeing a similar thing - except that it's not 
only other writers that block; it seems like other interrupts are 
blocked too. Pinging my ZFS server at 1s intervals results in large delays 
while the system syncs, followed by normal response times while the 
system buffers more input...
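
For anyone who wants to watch for the same symptom, the observation above needs 
nothing more than Solaris ping run from another host; "zfsbox" below is a 
placeholder hostname, not a machine from this thread:

# Prints one line per second with the round-trip time; the times jump
# noticeably if the server really is holding everything off during a sync.
ping -s zfsbox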


Thanks,
   Tristan.

Bob Friesenhahn wrote:
It has been quite some time (about a year) since I did testing of 
batch processing with my software (GraphicsMagick).  In between time, 
ZFS added write-throttling.  I am using Solaris 10 with kernel 141415-03.


Quite a while back I complained that ZFS was periodically stalling the 
writing process (which UFS did not do).  The ZFS write-throttling 
feature was supposed to avoid that.  In my testing today I am still 
seeing ZFS stall the writing process periodically.  When the process 
is stalled, there is a burst of disk activity, a burst of context 
switching, and total CPU use drops to almost zero. Zpool iostat says 
that read bandwidth is 15.8M and write bandwidth is 15.8M over a 60 
second averaging interval.  Since my drive array is good for writing 
over 250MB/second, this is a very small write load and the array is 
loafing.


My program uses the simple read-process-write approach.  Each file 
written (about 8MB/file) is written contiguously and written just 
once.  Data is read and written in 128K blocks.  For this application 
there is no value obtained by caching the file just written.  From 
what I am seeing, reading occurs as needed, but writes are being 
batched up until the next ZFS synchronization cycle.  During the ZFS 
synchronization cycle it seems that processes are blocked from 
writing. Since my system has a lot of memory and the ARC is capped at 
10GB, quite a lot of data can be queued up to be written.  The ARC is 
currently running at its limit of 10GB.


If I tell my software to invoke fsync() before closing each written 
file, then the stall goes away, but the program then needs to block so 
there is less beneficial use of the CPU.


If this application stall annoys me, I am sure that it would really 
annoy a user with mission-critical work which needs to get done on a 
uniform basis.


If I run this little script then the application runs more smoothly 
but I see evidence of many shorter stalls:


while true
do
  sleep 3
  sync
done

Is there a solution in the works for this problem?

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, 
http://www.simplesystems.org/users/bfriesen/

GraphicsMagick Maintainer, http://www.GraphicsMagick.org/


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS write I/O stalls

2009-07-02 Thread Bob Friesenhahn

On Thu, 2 Jul 2009, Zhu, Lejun wrote:


Actually it seems to be 3/4:


3/4 is an awful lot.  That would be 15 GB on my system, which explains 
why the 5 seconds to write rule is dominant.


It seems that both rules are worthy of re-consideration.

There is also still the little problem that ZFS is incapable of reading 
during all/much of the time it is syncing a TXG.  Even if the TXG is 
written more often, readers will still block, resulting in a similar 
cumulative effect on performance.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS write I/O stalls

2009-07-01 Thread Marcelo Leal
 
 Note that this issue does not apply at all to NFS service, database
 service, or any other usage which does synchronous writes.

 Bob

Hello Bob,
There is impact for all workloads. Whether the write is synchronous or not
only determines whether it goes to the slog (SSD) or not; the txg interval
and sync time are the same. Actually, the ZIL code exists precisely to
preserve that same behavior for synchronous writes.

 Leal
[ http://www.eall.com.br/blog ]
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS write I/O stalls

2009-07-01 Thread Zhu, Lejun
Actually it seems to be 3/4:

dsl_pool.c
391         zfs_write_limit_max = ptob(physmem) >> zfs_write_limit_shift;
392         zfs_write_limit_inflated = MAX(zfs_write_limit_min,
393             spa_get_asize(dp->dp_spa, zfs_write_limit_max));

While spa_get_asize is:

spa_misc.c
   1249 uint64_t
   1250 spa_get_asize(spa_t *spa, uint64_t lsize)
   1251 {
   1252         /*
   1253          * For now, the worst case is 512-byte RAID-Z blocks, in which
   1254          * case the space requirement is exactly 2x; so just assume that.
   1255          * Add to this the fact that we can have up to 3 DVAs per bp,
   1256          * and we have to multiply by a total of 6x.
   1257          */
   1258         return (lsize * 6);
   1259 }

Which will result in:
   zfs_write_limit_inflated = MAX((32 << 20), (ptob(physmem) >> 3) * 6);
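
As a quick sanity check of that expression for a machine like Bob's (the 20 GB 
figure comes from his earlier posts; the shell arithmetic below is just a 
sketch, with everything expressed in MiB):

# zfs_write_limit_max is 1/8 of physical memory; spa_get_asize() then
# inflates it by the worst-case factor of 6, giving 3/4 of physmem.
physmem_mb=20480                                # 20 GB system
write_limit_max=$(( physmem_mb / 8 ))           # 2560 MiB
inflated=$(( write_limit_max * 6 ))             # 15360 MiB, i.e. ~15 GB
echo "zfs_write_limit_max      = ${write_limit_max} MiB"
echo "zfs_write_limit_inflated = ${inflated} MiB"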

Bob Friesenhahn wrote:
 Even if I set zfs_write_limit_override to 8053063680 I am unable to
 achieve the massive writes that Solaris 10 (141415-03) sends to my
 drive array by default.
 
 When I read the blog entry at
 http://blogs.sun.com/roch/entry/the_new_zfs_write_throttle, I see this
 statement:
 
 The new code keeps track of the amount of data accepted in a TXG and
 the time it takes to sync. It dynamically adjusts that amount so that
 each TXG sync takes about 5 seconds (txg_time variable). It also
 clamps the limit to no more than 1/8th of physical memory.
 
 On my system I see that the about 5 seconds rule is being followed,
 but see no sign of clamping the limit to no more than 1/8th of
 physical memory.  There is no sign of clamping at all.  The written
 data is captured and does take about 5 seconds to write (good
 estimate).
 
 On my system with 20GB of RAM, and ARC memory limit set to 10GB
 (zfs:zfs_arc_max = 0x280000000), the maximum zfs_write_limit_override
 value I can set is on the order of 8053063680, yet this results in a
 much smaller amount of data being written per write cycle than the
 Solaris 10 default operation.  The default operation is 24 seconds of
 no write activity followed by 5 seconds of write.
 
 On my system, 1/8 of memory would be 2.5GB.  If I set the
 zfs_write_limit_override value to 2684354560 then it seems that about
 1.2 seconds of data is captured for write.  In this case I see 5
 seconds of no write followed by maybe a second of write.
 
 This causes me to believe that the algorithm is not implemented as
 described in Solaris 10.
 
 Bob
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS write I/O stalls

2009-06-30 Thread Ross
 backup windows using primarily iSCSI. When those
 writes occur to my RaidZ volume, all activity pauses until the writes
 are fully flushed.

The more I read about this, the worse it sounds.  The thing is, I can see where 
the ZFS developers are coming from - in theory this is a more efficient use of 
the disk, and with that being the slowest part of the system, there probably is 
a slight benefit in computational time.

However, it completely breaks any process like this that can't afford 3-5s 
delays in processing, it makes ZFS a nightmare for things like audio or video 
editing (where it would otherwise be a perfect fit), and it's also horrible 
from the perspective of the end user.

Does anybody know if a L2ARC would help this?  Does that work off a different 
queue, or would reads still be blocked?

I still think a simple solution to this could be to split the ZFS writes into 
smaller chunks.  That creates room for reads to be squeezed in (with the ratio 
of reads to writes something that should be automatically balanced by the 
software), but you still get the benefit of ZFS write ordering with all the 
work that's gone into perfecting that.  

Regardless of whether there are reads or not, your data is always going to be 
written to disk in an optimized fashion, and you could have a property on the 
pool that specifies how finely chopped up writes should be, allowing this to be 
easily tuned.

We're considering ZFS as storage for our virtualization solution, and this 
could be a big concern.  We really don't want the entire network pausing for 
3-5 seconds any time there is a burst of write activity.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS write I/O stalls

2009-06-30 Thread Bob Friesenhahn

On Tue, 30 Jun 2009, Ross wrote:


However, it completely breaks any process like this that can't 
afford 3-5s delays in processing, it makes ZFS a nightmare for 
things like audio or video editing (where it would otherwise be a 
perfect fit), and it's also horrible from the perspective of the end 
user.


Yes.  I updated the image at 
http://www.simplesystems.org/users/bfriesen/zfs-discuss/perfmeter-stalls.png 
so that it shows the execution impact with more processes running. 
This is taken with three processes running in parallel so that there 
can be no doubt that I/O is being globally blocked and it is not just 
misbehavior of a single process.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS write I/O stalls

2009-06-30 Thread Scott Meilicke
For what it is worth, I too have seen this behavior when load testing our zfs 
box. I used iometer and the RealLife profile (1 worker, 1 target, 65% reads, 
60% random, 8k, 32 IOs in the queue). When writes are being dumped, reads drop 
close to zero, from 600-700 read IOPS to 15-30 read IOPS.

zpool iostat data01 1

Where data01 is my pool name

               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
data01      55.5G  20.4T    691      0  4.21M      0
data01      55.5G  20.4T    632      0  3.80M      0
data01      55.5G  20.4T    657      0  3.93M      0
data01      55.5G  20.4T    669      0  4.12M      0
data01      55.5G  20.4T    689      0  4.09M      0
data01      55.5G  20.4T    488  1.77K  2.94M  9.56M
data01      55.5G  20.4T     29  4.28K   176K  23.5M
data01      55.5G  20.4T     25  4.26K   165K  23.7M
data01      55.5G  20.4T     20  3.97K   133K  22.0M
data01      55.6G  20.4T    170  2.26K  1.01M  11.8M
data01      55.6G  20.4T    678      0  4.05M      0
data01      55.6G  20.4T    625      0  3.74M      0
data01      55.6G  20.4T    685      0  4.17M      0
data01      55.6G  20.4T    690      0  4.04M      0
data01      55.6G  20.4T    679      0  4.02M      0
data01      55.6G  20.4T    664      0  4.03M      0
data01      55.6G  20.4T    699      0  4.27M      0
data01      55.6G  20.4T    423  1.73K  2.66M  9.32M
data01      55.6G  20.4T     26  3.97K   151K  21.8M
data01      55.6G  20.4T     34  4.23K   223K  23.2M
data01      55.6G  20.4T     13  4.37K  87.1K  23.9M
data01      55.6G  20.4T     21  3.33K   136K  18.6M
data01      55.6G  20.4T    468    496  2.89M  1.82M
data01      55.6G  20.4T    687      0  4.13M      0

-Scott
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS write I/O stalls

2009-06-30 Thread Bob Friesenhahn

On Mon, 29 Jun 2009, Lejun Zhu wrote:


With ZFS write throttle, the number 2.5GB is tunable. From what I've 
read in the code, it is possible to e.g. set 
zfs:zfs_write_limit_override = 0x8000000 (bytes) to make it write 
128M instead.


This works, and the difference in behavior is profound.  Now it is a 
matter of finding the best value which optimizes both usability and 
performance.  A tuning for 384 MB:


# echo zfs_write_limit_override/W0t402653184 | mdb -kw
zfs_write_limit_override:       0x30000000      =       0x18000000

CPU is smoothed out quite a lot and write latencies (as reported by a 
zio_rw.d dtrace script) are radically different than before.
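
For reference, a sketch of how such a value can be computed and applied; the 
mdb line is exactly the one above, while the /etc/system line is an assumption 
based on how other zfs:* tunables are set at boot (verify on a test box before 
relying on it):

# 384 MiB expressed as the decimal byte count mdb's W0t format expects
limit=$(( 384 * 1024 * 1024 ))                  # 402653184
echo "zfs_write_limit_override/W0t${limit}" | mdb -kw

# Hypothetical persistent form in /etc/system (0x18000000 == 402653184):
#   set zfs:zfs_write_limit_override=0x18000000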


Perfmeter display for 256 MB:
http://www.simplesystems.org/users/bfriesen/zfs-discuss/perfmeter-256mb.png

Perfmeter display for 384 MB:
http://www.simplesystems.org/users/bfriesen/zfs-discuss/perfmeter-384mb.png

Perfmeter display for 768 MB:
http://www.simplesystems.org/users/bfriesen/zfs-discuss/perfmeter-768mb.png

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS write I/O stalls

2009-06-30 Thread Brent Jones
On Tue, Jun 30, 2009 at 12:25 PM, Bob
Friesenhahn <bfrie...@simple.dallas.tx.us> wrote:
 On Mon, 29 Jun 2009, Lejun Zhu wrote:

 With ZFS write throttle, the number 2.5GB is tunable. From what I've read
 in the code, it is possible to e.g. set zfs:zfs_write_limit_override =
 0x8000000 (bytes) to make it write 128M instead.

 This works, and the difference in behavior is profound.  Now it is a matter
 of finding the best value which optimizes both usability and performance.
  A tuning for 384 MB:

 # echo zfs_write_limit_override/W0t402653184 | mdb -kw
 zfs_write_limit_override:       0x30000000      =       0x18000000

 CPU is smoothed out quite a lot and write latencies (as reported by a
 zio_rw.d dtrace script) are radically different than before.

 Perfmeter display for 256 MB:
 http://www.simplesystems.org/users/bfriesen/zfs-discuss/perfmeter-256mb.png

 Perfmeter display for 384 MB:
 http://www.simplesystems.org/users/bfriesen/zfs-discuss/perfmeter-384mb.png

 Perfmeter display for 768 MB:
 http://www.simplesystems.org/users/bfriesen/zfs-discuss/perfmeter-768mb.png

 Bob
 --
 Bob Friesenhahn
 bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
 GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/


Maybe there could be a supported ZFS tuneable (per file system even?)
that is optimized for 'background' tasks, or 'foreground'.

Beyond that, I will give this tuneable a shot and see how it impacts
my own workload.

Thanks!

-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS write I/O stalls

2009-06-30 Thread Bob Friesenhahn

On Tue, 30 Jun 2009, Brent Jones wrote:


Maybe there could be a supported ZFS tuneable (per file system even?)
that is optimized for 'background' tasks, or 'foreground'.

Beyond that, I will give this tuneable a shot and see how it impacts
my own workload.


Note that this issue does not apply at all to NFS service, database 
service, or any other usage which does synchronous writes.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS write I/O stalls

2009-06-30 Thread Rob Logan

 CPU is smoothed out quite a lot
yes, but the area under the CPU graph is less, so the
rate of real work performed is less, so the entire
job took longer (albeit smoother).

Rob
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS write I/O stalls

2009-06-30 Thread Ross
Interesting to see that it makes such a difference, but I wonder what effect it 
has on ZFS's write ordering, and its attempts to prevent fragmentation?

By reducing the write buffer, are you losing those benefits?

Although on the flip side, I guess this is no worse off than any other 
filesystem, and as SSD drives take off, fragmentation is going to be less and 
less of an issue.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS write I/O stalls

2009-06-30 Thread Bob Friesenhahn

On Tue, 30 Jun 2009, Rob Logan wrote:


CPU is smoothed out quite a lot

yes, but the area under the CPU graph is less, so the
rate of real work performed is less, so the entire
job took longer (albeit smoother).


For the purpose of illustration, the case showing the huge sawtooth 
was when running three processes at once.  The period/duration of the 
sawtooth was pretty similar, but the magnitude changes.


I agree that there is a size which provides the best balance of 
smoothness and application performance.  Probably the value should be 
dialed down to just below the point where the sawtooth occurs.


More at 11.

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS write I/O stalls

2009-06-30 Thread Scott Meilicke
 On Tue, 30 Jun 2009, Bob Friesenhahn wrote:
 
  Note that this issue does not apply at all to NFS service, database
  service, or any other usage which does synchronous writes.

I see read starvation with NFS. I was using iometer on a Windows VM, connecting 
to an NFS mount on a 2008.11 physical box. iometer params: 65% read, 60% 
random, 8k blocks, 32 outstanding IO requests, 1 worker, 1 target.

NFS Testing
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
data01      59.6G  20.4T     46     24   757K  3.09M
data01      59.6G  20.4T     39     24   593K  3.09M
data01      59.6G  20.4T     45     25   687K  3.22M
data01      59.6G  20.4T     45     23   683K  2.97M
data01      59.6G  20.4T     33     23   492K  2.97M
data01      59.6G  20.4T     16     41   214K  1.71M
data01      59.6G  20.4T      3  2.36K  53.4K  30.4M
data01      59.6G  20.4T      1  2.23K  20.3K  29.2M
data01      59.6G  20.4T      0  2.24K  30.2K  28.9M
data01      59.6G  20.4T      0  1.93K  30.2K  25.1M
data01      59.6G  20.4T      0  2.22K      0  28.4M
data01      59.7G  20.4T     21    295   317K  4.48M
data01      59.7G  20.4T     32     12   495K  1.61M
data01      59.7G  20.4T     35     25   515K  3.22M
data01      59.7G  20.4T     36     11   522K  1.49M
data01      59.7G  20.4T     33     24   508K  3.09M
data01      59.7G  20.4T     35     23   536K  2.97M
data01      59.7G  20.4T     32     23   483K  2.97M
data01      59.7G  20.4T     37     37   538K  4.70M

While writes are being committed to the ZIL all the time, periodic dumping to 
the pool still occurs, and during those times reads are starved. Maybe this 
doesn't happen in the 'real world' ?

-Scott
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS write I/O stalls

2009-06-29 Thread Bob Friesenhahn

On Wed, 24 Jun 2009, Lejun Zhu wrote:


There is a bug in the database about reads blocked by writes which may be 
related:

http://bugs.opensolaris.org/view_bug.do?bug_id=6471212

The symptom is sometimes reducing queue depth makes read perform better.


I have been banging away at this issue without resolution.  Based on 
Roch Bourbonnais's blog description of the ZFS write throttle code, it 
seems that I am facing a perfect storm.  Both the storage write 
bandwidth (800+ MB/second) and the memory size of my system (20 GB) 
result in the algorithm batching up 2.5 GB of user data to write. 
Since I am using mirrors, this results in 5 GB of data being written 
at full speed to the array on a very precise schedule since my 
application is processing fixed-sized files with a fixed algorithm. 
The huge writes lead to at least 3 seconds of read starvation, 
resulting in a stalled application and a square-wave of system CPU 
utilization.  I could attempt to modify my application to read ahead 
by 3 seconds but that would require gigabytes of memory, lots of 
complexity, and would not be efficient.


Richard Elling thinks that my array is pokey, but based on write speed 
and memory size, ZFS is always going to be batching up data to fill 
the write channel for 5 seconds so it does not really matter how fast 
that write channel is.  If I had 32GB of RAM and 2X the write speed, 
the situation would be identical.


Hopefully someone at Sun is indeed working this read starvation issue 
and it will be resolved soon.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS write I/O stalls

2009-06-29 Thread Brent Jones
On Mon, Jun 29, 2009 at 2:48 PM, Bob
Friesenhahn <bfrie...@simple.dallas.tx.us> wrote:
 On Wed, 24 Jun 2009, Lejun Zhu wrote:

 There is a bug in the database about reads blocked by writes which may be
 related:

 http://bugs.opensolaris.org/view_bug.do?bug_id=6471212

 The symptom is sometimes reducing queue depth makes read perform better.

 I have been banging away at this issue without resolution.  Based on Roch
 Bourbonnais's blog description of the ZFS write throttle code, it seems that
 I am facing a perfect storm.  Both the storage write bandwidth (800+
 MB/second) and the memory size of my system (20 GB) result in the algorithm
 batching up 2.5 GB of user data to write. Since I am using mirrors, this
 results in 5 GB of data being written at full speed to the array on a very
 precise schedule since my application is processing fixed-sized files with a
 fixed algorithm. The huge writes lead to at least 3 seconds of read
 starvation, resulting in a stalled application and a square-wave of system
 CPU utilization.  I could attempt to modify my application to read ahead by
 3 seconds but that would require gigabytes of memory, lots of complexity,
 and would not be efficient.

 Richard Elling thinks that my array is pokey, but based on write speed and
 memory size, ZFS is always going to be batching up data to fill the write
 channel for 5 seconds so it does not really matter how fast that write
 channel is.  If I had 32GB of RAM and 2X the write speed, the situation
 would be identical.

 Hopefully someone at Sun is indeed working this read starvation issue and it
 will be resolved soon.

 Bob
 --
 Bob Friesenhahn
 bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
 GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/


I see similar square-wave performance. However, my load is primarily
write-based; when those commits happen, I see all network activity
pause while the buffer is committed to disk.
I write about 750Mbit/sec over the network to the X4540's during
backup windows using primarily iSCSI. When those writes occur to my
RaidZ volume, all activity pauses until the writes are fully flushed.

One thing to note: on snv_117, the effects are seemingly reduced and
performance is a bit more even, but the issue is still there.

-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS write I/O stalls

2009-06-29 Thread Lejun Zhu
 On Wed, 24 Jun 2009, Lejun Zhu wrote:
 
  There is a bug in the database about reads blocked by writes which may
  be related:
 
  http://bugs.opensolaris.org/view_bug.do?bug_id=6471212
 
  The symptom is sometimes reducing queue depth makes read perform better.
 
 I have been banging away at this issue without resolution.  Based on
 Roch Bourbonnais's blog description of the ZFS write throttle code, it
 seems that I am facing a perfect storm.  Both the storage write
 bandwidth (800+ MB/second) and the memory size of my system (20 GB)
 result in the algorithm batching up 2.5 GB of user data to write.

With ZFS write throttle, the number 2.5GB is tunable. From what I've read in 
the code, it is possible to e.g. set zfs:zfs_write_limit_override = 0x8000000 
(bytes) to make it write 128M instead.

 Since I am using mirrors, this results in 5 GB of data being written
 at full speed to the array on a very precise schedule since my
 application is processing fixed-sized files with a fixed algorithm.
 The huge writes lead to at least 3 seconds of read starvation,
 resulting in a stalled application and a square-wave of system CPU
 utilization.  I could attempt to modify my application to read ahead
 by 3 seconds but that would require gigabytes of memory, lots of
 complexity, and would not be efficient.
 
 Richard Elling thinks that my array is pokey, but based on write speed
 and memory size, ZFS is always going to be batching up data to fill
 the write channel for 5 seconds so it does not really matter how fast
 that write channel is.  If I had 32GB of RAM and 2X the write speed,
 the situation would be identical.
 
 Hopefully someone at Sun is indeed working this read starvation issue
 and it will be resolved soon.
 
 Bob
 --
 Bob Friesenhahn
 bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
 GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
 
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS write I/O stalls

2009-06-25 Thread Ross
 I am not sure how zfs would know the rate of the
 underlying disk storage 

Easy:  Is the buffer growing?  :-)

If the amount of data in the buffer is growing, you need to throttle back a bit 
until the disks catch up.  Don't stop writes until the buffer is empty, just 
slow them down to match the rate at which you're clearing data from the buffer.

In your case I'd expect to see ZFS buffer the early part of the write (so you'd 
see a very quick initial burst), but from then on you would want a continual 
stream of data to disk, at a steady rate.

To the client it should respond just like storing to disk, the only difference 
is there's actually a small delay before the data hits the disk, which will be 
proportional to the buffer size.  ZFS won't have so much opportunity to 
optimize writes, but you wouldn't get such stuttering performance.

However, reading through the other messages, if it's a known bug with ZFS 
blocking reads while writing, there may not be any need for this idea.  But 
then, that bug has been open since 2006, is flagged as fix in progress, and was 
planned for snv_51 o_0.  So it probably is worth having this discussion.

And I may be completely wrong here, but reading that bug, it sounds like ZFS 
issues a whole bunch of writes at once as it clears the buffer, which ties in 
with the experiences of stalling actually being caused by reads being blocked.

I'm guessing given ZFS's aims it made sense to code it that way - if you're 
going to queue a bunch of transactions to make them efficient on disk, you 
don't want to interrupt that batch with a bunch of other (less efficient) 
reads. 

But the unintended side effect of this is that ZFS's attempt to optimize writes 
will cause jerky read and write behaviour any time you have a large amount of 
writes going on, and when you should be pushing the disks to 100% usage you're 
never going to reach that, as it's always going to have 5s of inactivity, 
followed by 5s of running the disks flat out.

In fact, I wonder if it's as simple as the disks ending up doing 5s of reads, a 
delay for processing, 5s of writes, 5s of reads, etc...

It's probably efficient, but it's going to *feel* horrible, a 5s delay is 
easily noticeable by the end user, and is a deal breaker for many applications.

In situations like that, 5s is a *huge* amount of time, especially so if you're 
writing to a disk or storage device which has its own caching!  Might it be 
possible to keep the 5s buffer for ordering transactions, but then commit that 
as a larger number of small transactions instead of one huge one?

The number of transactions could even be based on how busy the system is - if 
there are a lot of reads coming in, I'd be quite happy to split that into 50 
transactions.  On 10GbE, 5s is potentially 6.25GB of data.  Even split into 50 
transactions you're writing 128MB at a time, and that sounds plenty big enough 
to me!
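
Spelling out that arithmetic, assuming 10GbE moves roughly 1.25 GB/s of payload 
at best:

echo $(( 1250 * 5 ))     # MB buffered in 5 s at line rate: 6250, i.e. ~6.25 GB
echo $(( 6250 / 50 ))    # MB per transaction if split 50 ways: 125, i.e. ~128 MB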

Either way, something needs to be done.  If we move to ZFS our users are not 
going to be impressed with 5s delays on the storage system.

Finally, I do have one question for the ZFS guys:  How does the L2ARC interact 
with this?  Are reads from the L2ARC blocked, or will they happen in parallel 
with the writes to the main storage?  I suspect that a large L2ARC (potentially 
made up of SSD disks) would eliminate this problem the majority of the time.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS write I/O stalls

2009-06-25 Thread Bob Friesenhahn

On Wed, 24 Jun 2009, Lejun Zhu wrote:


There is a bug in the database about reads blocked by writes which may be 
related:

http://bugs.opensolaris.org/view_bug.do?bug_id=6471212

The symptom is sometimes reducing queue depth makes read perform better.


This one certainly sounds promising.  Since Matt Ahrens has been 
working on it for almost a year, it must be almost fixed by now. :-)


I am not sure how queue depth is managed, but it seems possible to 
detect when reads are blocked by bulk writes and make some automatic 
adjustments to improve balance.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS write I/O stalls

2009-06-25 Thread Bob Friesenhahn

On Thu, 25 Jun 2009, Ross wrote:

But the unintended side effect of this is that ZFS's attempt to 
optimize writes will cause jerky read and write behaviour any time 
you have a large amount of writes going on, and when you should be 
pushing the disks to 100% usage you're never going to reach that as 
it's always going to have 5s of inactivity, followed by 5s of 
running the disks flat out.


In fact, I wonder if it's as simple as the disks ending up doing 5s 
of reads, a delay for processing, 5s of writes, 5s of reads, etc...


It's probably efficient, but it's going to *feel* horrible, a 5s 
delay is easily noticeable by the end user, and is a deal breaker 
for many applications.


Yes, 5 seconds is a long time.  For an application which mixes 
computation with I/O it is not really acceptable for read I/O to go 
away for up to 5 seconds.  This represents time that the CPU is not 
being used, and a time that the application may be unresponsive to the 
user.  When compression is used the impact is different, but the 
compression itself consumes considerable CPU (and quite abruptly) so 
that other applications (e.g. X11) stop responding during the 
compress/write cycle.


The read problem is one of congestion.  If I/O is congested with 
massive writes, then reads don't work.  It does not really matter how 
fast your storage system is.  If the 5 seconds of buffered writes are 
larger than what the device driver and storage system buffering allows 
for, then the I/O channel will be congested.


As an example, my storage array is demonstrated to be able to write 
359MB/second but ZFS will blast data from memory as fast as it can, 
and the storage path can not effectively absorb 1.8GB (359*5) of data 
since the StorageTek 2500's internal buffers are much smaller than 
that, and fiber channel device drivers are not allowed to consume much 
memory either.  To make matters worse, I am using ZFS mirrors so the 
amount of data written to the array in those five seconds is doubled 
to 3.6GB.
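
The same numbers as quick shell arithmetic, for anyone following along:

rate_mb=359          # demonstrated array write rate, MB/s
txg_seconds=5        # target TXG sync time
echo "buffered per TXG : $(( rate_mb * txg_seconds )) MB"        # ~1.8 GB
echo "with 2-way mirror: $(( rate_mb * txg_seconds * 2 )) MB"    # ~3.6 GB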


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS write I/O stalls

2009-06-24 Thread Ethan Erchinger
  http://opensolaris.org/jive/thread.jspa?threadID=105702&tstart=0
 
 Yes, this does sound very similar.  It looks to me like data from read
 files is clogging the ARC so that there is no more room for more
 writes when ZFS periodically goes to commit unwritten data.  

I'm wondering if changing txg_time to a lower value might help.
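
If someone wants to experiment with that, something along these lines should do 
it; note the variable name differs between builds - older code calls it 
txg_time, the reworked throttle calls it zfs_txg_synctime - so treat both 
spellings below as assumptions to be checked against your kernel first:

# Ask for ~1-second TXG syncs instead of ~5 (run-time change, not persistent)
echo txg_time/W0t1 | mdb -kw            # older builds
# echo zfs_txg_synctime/W0t1 | mdb -kw  # builds with the new write throttle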
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS write I/O stalls

2009-06-24 Thread Bob Friesenhahn

On Wed, 24 Jun 2009, Ethan Erchinger wrote:


http://opensolaris.org/jive/thread.jspa?threadID=105702&tstart=0


Yes, this does sound very similar.  It looks to me like data from read
files is clogging the ARC so that there is no more room for more
writes when ZFS periodically goes to commit unwritten data.


I'm wondering if changing txg_time to a lower value might help.


There is no doubt that having ZFS sync the written data more often 
would help.  However, it should not be necessary to tune the OS for 
such a common task as batch processing a bunch of files.


A more appropriate solution is for ZFS to notice that more than XXX 
megabytes are uncommitted, so maybe it should wake up and go write 
some data.  It is useful for ZFS to defer data writes in case the same 
file is updated many times.  In the case where the same file is 
updated many times, the total uncommitted data is still limited by the 
amount of data which is re-written and so the 30 second cycle is fine. 
In my case the amount of uncommitted data is limited by available RAM 
and how fast my application is able to produce new data to write.


The problem is very much related to how fast the data is output.  If 
the new data is created at a slower rate (output files are smaller) 
then the problem just goes away.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS write I/O stalls

2009-06-24 Thread Ross
Wouldn't it make sense for the timing technique to be used if the data is 
coming in at a rate slower than the underlying disk storage?

But then if the data starts to come at a faster rate, ZFS needs to start 
streaming to disk as quickly as it can, and instead of re-ordering writes in 
blocks, it should just do the best it can with whatever is currently in memory. 
 And when that mode activates, inbound data should be throttled to match the 
current throughput to disk.

That preserves the efficient write ordering that ZFS was originally designed 
for, but means a more graceful degradation under load, with the system tending 
towards a steady state of throughput that matches what you would expect from 
other filesystems on those physical disks.

Of course, I have no idea how difficult this is technically.  But the idea 
seems reasonable to me.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS write I/O stalls

2009-06-24 Thread Ian Collins

Bob Friesenhahn wrote:

On Wed, 24 Jun 2009, Marcelo Leal wrote:


Hello Bob,
I think that is related to my post about zio_taskq_threads and TXG 
sync:

( http://www.opensolaris.org/jive/thread.jspa?threadID=105703&tstart=0 )
Roch did say that this is on top of the performance problems, and in 
the same email I talked about the change from 5s to 30s, which I 
think makes this problem worse if the txg sync interval is fixed.


The problem is that basing disk writes on a simple timeout and 
available memory does not work.  It is easy for an application to 
write considerable amounts of new data in 30 seconds, or even 5 
seconds.  If the application blocks while the data is being committed, 
then the application is not performing any useful function during that 
time.


Current ZFS write behavior makes it not very useful for the creative 
media industries even though otherwise it should be a perfect fit 
since hundreds of terabytes of working disk (or even petabytes) are 
normal for this industry.  For example, when data is captured to disk 
from film via a datacine (real time = 24 files/second and 6MB to 50MB 
per file), or captured to disk from a high-definition video camera, 
there is little margin for error and blocking on writes will result in 
missed frames or other malfunction.  Current ZFS write behavior is 
based on timing and the amount of system memory and it does not seem 
that throwing more storage hardware at the problem solves anything at 
all.


I wonder whether a filesystem property "streamed" might be appropriate?  
This could act as a hint to ZFS that the data is sequential and should be 
streamed direct to disk.


--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS write I/O stalls

2009-06-24 Thread Bob Friesenhahn

On Wed, 24 Jun 2009, Marcelo Leal wrote:

I think that is the purpose of the current implementation: 
http://blogs.sun.com/roch/entry/the_new_zfs_write_throttle
But it seems like it is not that easy... as I understood what Roch said, 
the cause is not always a hardy writer.


I see this:

The new code keeps track of the amount of data accepted in a TXG and 
the time it takes to sync. It dynamically adjusts that amount so that 
each TXG sync takes about 5 seconds (txg_time variable). It also 
clamps the limit to no more than 1/8th of physical memory.


It is interesting that it was decided that a TXG sync should take 5 
seconds by default.  That does seem to be about what I am seeing here. 
There is no mention of the devastation to the I/O channel which occurs 
if the kernel writes 5 seconds worth of data (e.g. 2GB) as fast as 
possible on a system using mirroring (2GB becomes 4GB of writes).  If 
it writes 5 seconds of data as fast as possible, then it seems that 
this blocks any opportunity to read more data so that application 
processing can continue during the TXG sync.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS write I/O stalls

2009-06-24 Thread Bob Friesenhahn

On Thu, 25 Jun 2009, Ian Collins wrote:


I wonder whether a filesystem property "streamed" might be appropriate?  This 
could act as a hint to ZFS that the data is sequential and should be streamed 
direct to disk.


ZFS does not seem to offer an ability to stream direct to disk other 
than perhaps via the special raw mode known to database developers.


It seems that current ZFS behavior works as designed.  The write 
transaction time is currently tuned for 5 seconds and so it writes 
data intensely for 5 seconds while starving the readers 
and/or blocking the writers.  Notice that by the end of the TXG write, zpool 
iostat is reporting zero reads:


% zpool iostat Sun_2540 1
              capacity     operations    bandwidth
pool        used  avail   read  write   read  write
----------  ----  -----  -----  -----  -----  -----
Sun_2540    456G  1.18T     14      0  1.86M      0
Sun_2540    456G  1.18T      0     19      0  1.47M
Sun_2540    456G  1.18T      0  3.11K      0   385M
Sun_2540    456G  1.18T      0  3.00K      0   385M
Sun_2540    456G  1.18T      0  3.34K      0   387M
Sun_2540    456G  1.18T      0  3.01K      0   386M
Sun_2540    458G  1.18T     19  1.87K  30.2K   220M
Sun_2540    458G  1.18T      0      0      0      0
Sun_2540    458G  1.18T    275      0  34.4M      0
Sun_2540    458G  1.18T    448      0  56.1M      0
Sun_2540    458G  1.18T    468      0  58.5M      0
Sun_2540    458G  1.18T    425      0  53.2M      0
Sun_2540    458G  1.18T    402      0  50.4M      0
Sun_2540    458G  1.18T    364      0  45.5M      0
Sun_2540    458G  1.18T    339      0  42.4M      0
Sun_2540    458G  1.18T    376      0  47.0M      0
Sun_2540    458G  1.18T    307      0  38.5M      0
Sun_2540    458G  1.18T    380      0  47.5M      0
Sun_2540    458G  1.18T    148  1.35K  18.3M   117M
Sun_2540    458G  1.18T     20  3.01K  2.60M   385M
Sun_2540    458G  1.18T     15  3.00K  1.98M   384M
Sun_2540    458G  1.18T      4  3.03K   634K   388M
Sun_2540    458G  1.18T      0  3.01K      0   386M
Sun_2540    460G  1.18T    142    792  15.8M  82.7M
Sun_2540    460G  1.18T    375      0  46.9M      0

Here is an interesting discussion thread on another list that I had 
not seen before:


http://opensolaris.org/jive/thread.jspa?messageID=347212

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS write I/O stalls

2009-06-24 Thread Richard Elling

Bob Friesenhahn wrote:

On Wed, 24 Jun 2009, Marcelo Leal wrote:

I think that is the purpose of the current implementation: 
http://blogs.sun.com/roch/entry/the_new_zfs_write_throttle
But it seems like it is not that easy... as I understood what Roch said, 
the cause is not always a hardy writer.


I see this:

The new code keeps track of the amount of data accepted in a TXG and 
the time it takes to sync. It dynamically adjusts that amount so that 
each TXG sync takes about 5 seconds (txg_time variable). It also 
clamps the limit to no more than 1/8th of physical memory.


hmmm... methinks there is a chance that the 1/8th rule might not work so
well for machines with lots of RAM and slow I/O.  I'm also reasonably sure
that that sort of machine is not what Sun would typically build for
performance lab testing, as a rule.  Hopefully Roch will comment when it is
morning in Europe.

-- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS write I/O stalls

2009-06-24 Thread Bob Friesenhahn

On Wed, 24 Jun 2009, Richard Elling wrote:


The new code keeps track of the amount of data accepted in a TXG and the 
time it takes to sync. It dynamically adjusts that amount so that each TXG 
sync takes about 5 seconds (txg_time variable). It also clamps the limit to 
no more than 1/8th of physical memory.


hmmm... methinks there is a chance that the 1/8th rule might not work so well
for machines with lots of RAM and slow I/O.  I'm also reasonably sure that
that sort of machine is not what Sun would typically build for performance
lab testing, as a rule.  Hopefully Roch will comment when it is morning in
Europe.


Slow I/O is relative.  If I install more memory does that make my I/O 
even slower?


I did some more testing.  I put the input data on a different drive 
and sent application output to the ZFS pool.  I no longer noticed any 
stalls in the execution even though the large ZFS flushes are taking 
place.  This proves that my application is seeing stalled reads rather 
than stalled writes.
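
A crude way to reproduce that observation on another system - sketch only, the 
/pool paths and sizes are placeholders, and the read file needs to be far 
larger than the ARC so the reads actually hit the disks:

# Shell 1: sustained streaming write into the pool
dd if=/dev/zero of=/pool/scratch/writer.dat bs=128k count=160000 &

# Shell 2: time sequential reads from the same pool and watch the
# throughput collapse each time a TXG is flushed
ptime dd if=/pool/scratch/bigreader.dat of=/dev/null bs=128k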


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS write I/O stalls

2009-06-24 Thread Lejun Zhu
 On Wed, 24 Jun 2009, Richard Elling wrote:
 
  The new code keeps track of the amount of data accepted in a TXG and
  the time it takes to sync. It dynamically adjusts that amount so that
  each TXG sync takes about 5 seconds (txg_time variable). It also
  clamps the limit to no more than 1/8th of physical memory.
 
  hmmm... methinks there is a chance that the 1/8th rule might not work
  so well for machines with lots of RAM and slow I/O.  I'm also
  reasonably sure that that sort of machine is not what Sun would
  typically build for performance lab testing, as a rule.  Hopefully
  Roch will comment when it is morning in Europe.
 
 Slow I/O is relative.  If I install more memory does that make my I/O
 even slower?
 
 I did some more testing.  I put the input data on a different drive
 and sent application output to the ZFS pool.  I no longer noticed any
 stalls in the execution even though the large ZFS flushes are taking
 place.  This proves that my application is seeing stalled reads rather
 than stalled writes.

There is a bug in the database about reads blocked by writes which may be 
related:

http://bugs.opensolaris.org/view_bug.do?bug_id=6471212

The symptom is sometimes reducing queue depth makes read perform better.
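
For anyone who wants to try the queue-depth angle of that bug: the knob people 
usually adjust for it is zfs_vdev_max_pending - treat that name as my 
assumption for this particular build and check that the symbol exists before 
poking it; the default at the time is 35:

# Cap the per-vdev I/O queue at 10 outstanding requests (run-time change)
echo zfs_vdev_max_pending/W0t10 | mdb -kw
# or persistently, in /etc/system:
#   set zfs:zfs_vdev_max_pending=10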

 
 Bob
 --
 Bob Friesenhahn
 bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
 GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
 
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS write I/O stalls

2009-06-23 Thread milosz
is this a direct write to a zfs filesystem or is it some kind of zvol export?

anyway, sounds similar to this:

http://opensolaris.org/jive/thread.jspa?threadID=105702&tstart=0

On Tue, Jun 23, 2009 at 7:14 PM, Bob
Friesenhahn <bfrie...@simple.dallas.tx.us> wrote:
 It has been quite some time (about a year) since I did testing of batch
 processing with my software (GraphicsMagick).  In between time, ZFS added
 write-throttling.  I am using Solaris 10 with kernel 141415-03.

 Quite a while back I complained that ZFS was periodically stalling the
 writing process (which UFS did not do).  The ZFS write-throttling feature
 was supposed to avoid that.  In my testing today I am still seeing ZFS stall
 the writing process periodically.  When the process is stalled, there is a
 burst of disk activity, a burst of context switching, and total CPU use
 drops to almost zero. Zpool iostat says that read bandwidth is 15.8M and
 write bandwidth is 15.8M over a 60 second averaging interval.  Since my
 drive array is good for writing over 250MB/second, this is a very small
 write load and the array is loafing.

 My program uses the simple read-process-write approach.  Each file written
 (about 8MB/file) is written contiguously and written just once.  Data is
 read and written in 128K blocks.  For this application there is no value
 obtained by caching the file just written.  From what I am seeing, reading
 occurs as needed, but writes are being batched up until the next ZFS
 synchronization cycle.  During the ZFS synchronization cycle it seems that
 processes are blocked from writing. Since my system has a lot of memory and
 the ARC is capped at 10GB, quite a lot of data can be queued up to be
 written.  The ARC is currently running at its limit of 10GB.

 If I tell my software to invoke fsync() before closing each written file,
 then the stall goes away, but the program then needs to block so there is
 less beneficial use of the CPU.

 If this application stall annoys me, I am sure that it would really annoy a
 user with mission-critical work which needs to get done on a uniform basis.

 If I run this little script then the application runs more smoothly but I
 see evidence of many shorter stalls:

 while true
 do
  sleep 3
  sync
 done

 Is there a solution in the works for this problem?

 Bob
 --
 Bob Friesenhahn
 bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
 GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss