Re: [zfs-discuss] I/O Read starvation

2010-01-11 Thread Phil Harman

Hi Banks,

Some basic stats might shed some light, e.g. vmstat 5, mpstat 5,  
iostat -xnz 5, prstat -Lmc 5 ... all running from just before you  
start the tests until things are normal again.
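For example, from an interactive shell, something along these lines will
capture all four in parallel (a rough sketch; the file names under /var/tmp
are just placeholders):

   vmstat 5      > /var/tmp/vmstat.out  &
   mpstat 5      > /var/tmp/mpstat.out  &
   iostat -xnz 5 > /var/tmp/iostat.out  &
   prstat -Lmc 5 > /var/tmp/prstat.out  &
   # ... start the dd/cp test, wait until things are normal again ...
   kill %1 %2 %3 %4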


Memory starvation is certainly a possibility. The ARC can be greedy  
and slow to release memory under pressure.


Phil

Sent from my iPhone

On 10 Jan 2010, at 13:29, bank kus kus.b...@gmail.com wrote:


Hi Phil
You make some interesting points here:

- yes, bs=1G was a lazy thing

- the GNU cp I'm using does __not__ appear to use mmap;
open64, open64, read, write, close, close is the relevant sequence

- replacing cp with dd (bs=128K, count=64K) does not help; no new apps can be
launched until the copies complete.


Regards
banks

Re: [zfs-discuss] I/O Read starvation

2010-01-11 Thread bank kus
vmstat does show something interesting. The free memory shrinks from around 10G
to roughly 1.5G while doing the first dd (generating the 8G file). The copy
operations thereafter don't consume much, and it stays at 1.2G after all
operations have completed. (Btw, at the point of system sluggishness there's 1.5G
of free RAM, so that shouldn't explain the problem.)

However, I noticed something weird: long after the file operations are done, the
free memory doesn't seem to grow back (below). Essentially, ZFS File Data claims
76% of memory long after the file has been written. How does one reclaim it? Is
ZFS File Data a pool that, once grown to a size, doesn't shrink back even though
its current contents might not be used by any process?

> ::memstat
Page Summary                  Pages        MB   %Tot
------------------------ ---------- --------- -----
Kernel                       234696       916     7%
ZFS File Data               2384657      9315    76%
Anon                         145915       569     5%
Exec and libs                  4250        16     0%
Page cache                    28582       111     1%
Free (cachelist)              53147       207     2%
Free (freelist)              290158      1133     9%

Total                       3141405     12271
Physical                    3141404     12271
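
That summary comes from the kernel debugger; for anyone wanting to reproduce
it, something like the following should print it non-interactively (assuming
you can run mdb against the live kernel as root):

   echo ::memstat | mdb -k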


Re: [zfs-discuss] I/O Read starvation

2010-01-11 Thread Bob Friesenhahn

On Mon, 11 Jan 2010, bank kus wrote:


However, I noticed something weird: long after the file operations
are done, the free memory doesn't seem to grow back (below).
Essentially, ZFS File Data claims 76% of memory long after the
file has been written. How does one reclaim it? Is ZFS File
Data a pool that, once grown to a size, doesn't shrink back even though
its current contents might not be used by any process?


It is normal for the ZFS ARC to retain data as long as there is no
other memory pressure.  This should not cause a problem other than a
small delay when starting an application that does need a lot of
memory, since the ARC will give memory back to the kernel.


For better interactive use, you can place a cap on the maximum ARC 
size via an entry in /etc/system:


  http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#ARCSIZE

For example, you could set it to half your (8GB) memory so that 4GB is 
immediately available for other uses.


* Set maximum ZFS ARC size to 4GB
* http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#ARCSIZE
set zfs:zfs_arc_max = 0x100000000
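
After a reboot (needed for /etc/system changes to take effect), the current
ARC size can be checked against the cap with something like the following;
0x100000000 is simply 4GB written in hex:

   kstat -p zfs:0:arcstats:size zfs:0:arcstats:c_max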

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/


Re: [zfs-discuss] I/O Read starvation

2010-01-11 Thread bank kus
 For example, you could set it to half your (8GB) memory so that 4GB is
 immediately available for other uses.

 * Set maximum ZFS ARC size to 4GB

capping max sounds like a good idea

thanks
banks


Re: [zfs-discuss] I/O Read starvation

2010-01-11 Thread Henrik Johansson
Hello,

On Jan 11, 2010, at 6:53 PM, bank kus wrote:

 For example, you could set it to half your (8GB) memory so that 4GB is
 immediately available for other uses.
 
 * Set maximum ZFS ARC size to 4GB
 
 capping max sounds like a good idea.


Are we still trying to solve the starvation problem?

I filed a bug on the non-ZFS-related urandom stall problem yesterday, primarily
because it can do nasty things from inside a resource-capped zone:
CR 6915579 solaris-cryp/random Large read from /dev/urandom can stall system

Regards
Henrik
http://sparcv9.blogspot.com



Re: [zfs-discuss] I/O Read starvation

2010-01-11 Thread bank kus
 Are we still trying to solve the starvation problem?

I would argue the disk I/O model is fundamentally broken on Solaris if there is
no fair I/O scheduling between multiple read sources; until that is fixed,
individual I_am_systemstalled_while_doing_xyz problems will crop up. I've
started a new thread focusing on just this problem.

http://opensolaris.org/jive/thread.jspa?threadID=121479tstart=0


Re: [zfs-discuss] I/O Read starvation

2010-01-11 Thread Bob Friesenhahn

On Mon, 11 Jan 2010, bank kus wrote:


Are we still trying to solve the starvation problem?


I would argue the disk I/O model is fundamentally broken on Solaris
if there is no fair I/O scheduling between multiple read sources;
until that is fixed, individual I_am_systemstalled_while_doing_xyz
problems will crop up. I've started a new thread focusing on just this
problem.


While I will readily agree that zfs has an I/O read starvation problem
(which has been discussed here many times before), I doubt that it is
due to the reasons you are thinking.


A true fair I/O scheduling model would severely hinder overall
throughput in the same way that true real-time task scheduling
cripples throughput.  ZFS is very much based on its ARC model.  ZFS is
designed for maximum throughput with minimum disk accesses in server
systems.  Most reads and writes are to and from its ARC.  Systems with
sufficient memory hardly ever do a read from disk and so you will only
see writes occurring in 'zpool iostat'.


The most common complaint is read stalls while zfs writes its 
transaction group, but zfs may write this data up to 30 seconds after 
the application requested the write, and the application might not 
even be running any more.
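
As an aside, if the transaction group flush interval itself is suspected, and
assuming the tunable on your build is still called zfs_txg_timeout, it can be
inspected and temporarily lowered on a live kernel with mdb (a lower value
trades some throughput for smaller, more frequent write bursts):

   echo "zfs_txg_timeout/D" | mdb -k        # print the current value, in seconds
   echo "zfs_txg_timeout/W 0t5" | mdb -kw   # temporarily lower it to 5 seconds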


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/


Re: [zfs-discuss] I/O Read starvation

2010-01-11 Thread Ross Walker
On Jan 11, 2010, at 2:23 PM, Bob Friesenhahn bfrie...@simple.dallas.tx.us 
 wrote:




The most common complaint is read stalls while zfs writes its  
transaction group, but zfs may write this data up to 30 seconds  
after the application requested the write, and the application might  
not even be running any more.


Maybe what's needed is an I/O scheduler like Linux's 'deadline' scheduler,
whose only purpose is to reduce the effect of writers starving readers while
providing some form of guaranteed latency.


-Ross



Re: [zfs-discuss] I/O Read starvation

2010-01-10 Thread Markus Kovero
Hi, it seems you might have some kind of hardware issue there; I have no way
of reproducing this.

Yours
Markus Kovero

-Original Message-
From: zfs-discuss-boun...@opensolaris.org 
[mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of bank kus
Sent: 10. tammikuuta 2010 7:21
To: zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] I/O Read starvation

Btw, FWIW, if I redo the dd + 2 cp experiment on /tmp the result is far more
disastrous. The GUI stops moving and Caps Lock stops responding for large
intervals; no clue why.


Re: [zfs-discuss] I/O Read starvation

2010-01-10 Thread Phil Harman
What version of Solaris / OpenSolaris are you using? Older versions use 
mmap(2) for reads in cp(1). Sadly, mmap(2) does not jive well with ZFS.


To be sure, you could check how your cp(1) is implemented using truss(1) 
(i.e. does it do mmap/write or read/write?)
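
For example, something along these lines (a rough sketch; the grep pattern is
only illustrative) will show whether cp does mmap/write or read/write:

   truss cp largefile.txt ./test/1.txt 2>&1 | egrep 'mmap|read|write' | head -20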


<aside>
I find it interesting that ZFS's mmap(2) deficiencies are now dictating the
implementation of utilities which may benefit from mmap(2) on other
filesystems. And whilst some might argue that mmap(2) is dead for file
I/O, I think it's interesting to note that Linux appears to have a
relatively efficient mmap(2) implementation. Sadly, this means that some
commercial apps which are mmap(2) heavy currently perform much better
on Linux than Solaris, especially ZFS. However, I doubt that Linux uses
mmap(2) for reads in cp(1).

</aside>

You could also try using dd(1) instead of cp(1).

However, it seems to me that you are using bs=1G count=8 as a lazy way 
to generate 8GB (because you don't want to do the math on smaller 
blocksizes?)


Did you know that you are asking dd(1) to do 1GB read(2) and write(2)
system calls using a 1GB buffer? This will cause further pressure on
the memory system.


In performance terms, you'll probably find that block sizes beyond 128K 
add little benefit. So I'd suggest something like:


dd if=/dev/urandom of=largefile.txt bs=128k count=65536

dd if=largefile.txt of=./test/1.txt bs=128k 
dd if=largefile.txt of=./test/2.txt bs=128k 

Phil

http://harmanholistix.com



bank kus wrote:

dd if=/dev/urandom of=largefile.txt bs=1G count=8

cp largefile.txt ./test/1.txt 
cp largefile.txt ./test/2.txt 

That's it; now the system is totally unusable after launching the two 8G copies. Until these copies finish, no other application is able to launch completely. Checking prstat shows them to be in the sleep state.


Question:
 I'm guessing this is because ZFS doesn't use CFQ, and that one process is
allowed to queue up all its I/O reads ahead of other processes?

 Is there a concept of priority among I/O reads? I only ask because if root
were to launch some GUI application, it doesn't start up until both copies are
done. So there is no concept of priority? Needless to say, this does not exist
on Linux 2.60...
  




Re: [zfs-discuss] I/O Read starvation

2010-01-10 Thread bank kus
Hi Phil 
You make some interesting points here:

- yes, bs=1G was a lazy thing

- the GNU cp I'm using does __not__ appear to use mmap;
open64, open64, read, write, close, close is the relevant sequence

- replacing cp with dd (bs=128K, count=64K) does not help; no new apps can be
launched until the copies complete.

Regards
banks


Re: [zfs-discuss] I/O Read starvation

2010-01-10 Thread Henrik Johansson
Hello again,

On Jan 10, 2010, at 5:39 AM, bank kus wrote:

 Hi Henrik
 I have 16GB of RAM on my system; on a system with less RAM, dd does cause
 problems, as I mentioned above. My __guess__ is dd is probably sitting in some
 in-memory cache, since du -sh doesn't show the full file size until I do a sync.
 
 At this point I'm less looking for QA-type repro questions and/or speculation,
 and more looking for ZFS design expectations.
 
 What is the expected behaviour: if one thread queues 100 reads and another
 thread comes later with 50 reads, are these 50 reads __guaranteed__ to fall
 behind the first 100, or is timeslicing/fair-sharing done between the two
 streams?
 
 Btw, this problem is pretty serious: with 3 users on the system, one of them
 initiating a large copy grinds the other 2 to a halt. Linux doesn't have this
 problem, and this is almost a switch-OS moment for us, unfortunately :-(

Have you reproduced the problem without using /dev/urandom? I can only get this
behavior when using dd from urandom, not when using files with cp, and not even
files with dd. This could then be related to the random driver spending kernel
time in high-priority threads.
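
One quick way to see where that kernel time goes during a stall is a short
kernel profiling run, for example (this samples kernel PCs for 30 seconds and
prints the 20 hottest entries):

   lockstat -kIW -D 20 sleep 30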

So while I agree that this is not optimal, there is a huge difference in how
bad it is: if it's urandom-generated, there is no problem with copying files.
Since you also found that it's not related to ZFS (it also happens on tmpfs,
and perhaps only with urandom?), we are on the wrong list. Please isolate the
problem: if we can put any filesystem aside, we are on the wrong list; I've
added perf-discuss as well.

Regards

Henrik
http://sparcv9.blogspot.com




Re: [zfs-discuss] I/O Read starvation

2010-01-10 Thread Bob Friesenhahn

On Sun, 10 Jan 2010, Phil Harman wrote:
In performance terms, you'll probably find that block sizes beyond 128K add 
little benefit. So I'd suggest something like:


dd if=/dev/urandom of=largefile.txt bs=128k count=65536

dd if=largefile.txt of=./test/1.txt bs=128k 
dd if=largefile.txt of=./test/2.txt bs=128k 


As an interesting aside, on my Solaris 10U8 system (plus a zfs IDR), 
dd (Solaris or GNU) does not produce the expected file size when using 
/dev/urandom as input:


% /bin/dd if=/dev/urandom of=largefile.txt bs=131072 count=65536
0+65536 records in
0+65536 records out
% ls -lh largefile.txt
-rw-r--r--   1 bfriesen home 65M Jan 10 09:32 largefile.txt
% /opt/sfw/bin/dd if=/dev/urandom of=largefile.txt bs=131072 count=65536
0+65536 records in
0+65536 records out
68157440 bytes (68 MB) copied, 1.9741 seconds, 34.5 MB/s
% ls -lh largefile.txt
-rw-r--r--   1 bfriesen home 65M Jan 10 09:33 largefile.txt
% df -h .
FilesystemSize  Used Avail Use% Mounted on
Sun_2540/zfstest/defaults
  1.2T   66M  1.2T   1% /Sun_2540/zfstest/defaults

However:
% dd if=/dev/urandom of=largefile.txt bs=1024 count=8388608
8388608+0 records in
8388608+0 records out
8589934592 bytes (8.6 GB) copied, 255.06 seconds, 33.7 MB/s
% ls -lh largefile.txt
-rw-r--r--   1 bfriesen home8.0G Jan 10 09:40 largefile.txt

% dd if=/dev/urandom of=largefile.txt bs=8192 count=1048576
0+1048576 records in
0+1048576 records out
1090519040 bytes (1.1 GB) copied, 31.8846 seconds, 34.2 MB/s

It seems that on my system dd + /dev/urandom is willing to read 1k
blocks from /dev/urandom, but even with 8K blocks the actual block size
is getting truncated down (without warning), producing much less data
than requested.


Testing with /dev/zero produces different results:
% dd if=/dev/zero of=largefile.txt bs=8192 count=1048576
1048576+0 records in
1048576+0 records out
8589934592 bytes (8.6 GB) copied, 20.7434 seconds, 414 MB/s

WTF?

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/


Re: [zfs-discuss] I/O Read starvation

2010-01-10 Thread bank kus
Place a sync call after dd?


Re: [zfs-discuss] I/O Read starvation

2010-01-10 Thread Henrik Johansson
Hello Bob,

On Jan 10, 2010, at 4:54 PM, Bob Friesenhahn wrote:

 On Sun, 10 Jan 2010, Phil Harman wrote:
 In performance terms, you'll probably find that block sizes beyond 128K add 
 little benefit. So I'd suggest something like:
 
 dd if=/dev/urandom of=largefile.txt bs=128k count=65536
 
 dd if=largefile.txt of=./test/1.txt bs=128k 
 dd if=largefile.txt of=./test/2.txt bs=128k 
 
 As an interesting aside, on my Solaris 10U8 system (plus a zfs IDR), dd 
 (Solaris or GNU) does not produce the expected file size when using 
 /dev/urandom as input:

Do you feel this is related to the filesystem? Is there any difference between
putting the data in a file on ZFS and just throwing it away?

$(dd if=/dev/urandom of=/dev/null bs=1048576k count=16) gives me a quite 
unresponsive system too.

Henrik
http://sparcv9.blogspot.com



Re: [zfs-discuss] I/O Read starvation

2010-01-10 Thread Bob Friesenhahn

On Sun, 10 Jan 2010, Henrik Johansson wrote:

  As an interesting aside, on my Solaris 10U8 system (plus a zfs IDR), dd
  (Solaris or GNU) does not produce the expected file size when using
  /dev/urandom as input:

Do you feel this is related to the filesystem? Is there any difference between
putting the data in a file on ZFS and just throwing it away?


My guess is that this is due to the implementation of /dev/urandom.  It
seems to be blocked up at 1024 bytes and 'dd' is just using that block
size.  It is interesting that OpenSolaris is different, and this seems
like a bug in Solaris 10.  It seems like a new bug to me.


The /dev/random and /dev/urandom devices are rather special since 
reading from them consumes a precious resource -- entropy.  Entropy is 
created based on other activities of the system, which are expected to 
be random.  Using up all the available entropy could dramatically 
slow-down software which uses /dev/random, such as ssh or ssl.  The 
/dev/random device will completely block when the system runs out of 
entropy.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/


Re: [zfs-discuss] I/O Read starvation

2010-01-10 Thread Daniel Carosone
On Sun, Jan 10, 2010 at 09:54:56AM -0600, Bob Friesenhahn wrote:
 WTF?

urandom is a character device and is returning short reads (note the
0+n vs n+0 record counts). dd is not padding these out to the full block
size (conv=sync) or making multiple reads to fill blocks (conv=fullblock).

Evidently the urandom device changed behaviour along the way with
regard to producing/buffering the additional requested data, possibly as
a result of a changed source implementation that stretches the entropy
better/faster.  No bug here, just bad assumptions.
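
If the goal is simply a file of the expected size, one workaround is along
these lines (a sketch: conv=sync pads each short read with NUL bytes rather
than more random data, and iflag=fullblock is GNU dd only):

   dd if=/dev/urandom of=largefile.txt bs=128k count=65536 conv=sync
   # or, with GNU dd:
   dd if=/dev/urandom of=largefile.txt bs=128k count=65536 iflag=fullblock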

--
Dan.








Re: [zfs-discuss] I/O Read starvation

2010-01-10 Thread Richard Elling
On Jan 8, 2010, at 7:49 PM, bank kus wrote:

 dd if=/dev/urandom of=largefile.txt bs=1G count=8
 
 cp largefile.txt ./test/1.txt 
 cp largefile.txt ./test/2.txt 
 
 That's it; now the system is totally unusable after launching the two 8G
 copies. Until these copies finish, no other application is able to launch
 completely. Checking prstat shows them to be in the sleep state.

What disk drivers are you using?  IDE?
 -- richard

 
 Question:
  I'm guessing this is because ZFS doesn't use CFQ, and that one process is
 allowed to queue up all its I/O reads ahead of other processes?

  Is there a concept of priority among I/O reads? I only ask because if root
 were to launch some GUI application, it doesn't start up until both copies
 are done. So there is no concept of priority? Needless to say, this does not
 exist on Linux 2.60...


Re: [zfs-discuss] I/O Read starvation

2010-01-09 Thread Henrik Johansson



Henrik
http://sparcv9.blogspot.com

On 9 jan 2010, at 04.49, bank kus kus.b...@gmail.com wrote:


dd if=/dev/urandom of=largefile.txt bs=1G count=8

cp largefile.txt ./test/1.txt 
cp largefile.txt ./test/2.txt 

That's it; now the system is totally unusable after launching the two
8G copies. Until these copies finish, no other application is able to
launch completely. Checking prstat shows them to be in the sleep
state.


Question:
 I'm guessing this is because ZFS doesn't use CFQ, and that one process
is allowed to queue up all its I/O reads ahead of other processes?




What is CFQ, a scheduler? If you are running OpenSolaris, then you do
not have CFQ.


 Is there a concept of priority among I/O reads? I only ask
because if root were to launch some GUI application, it doesn't start
up until both copies are done. So there is no concept of priority?
Needless to say, this does not exist on Linux 2.60...

--


Probably not, but ZFS only runs in userspace on Linux (via FUSE), so it
will be quite different.








Re: [zfs-discuss] I/O Read starvation

2010-01-09 Thread bank kus
 Probably not, but ZFS only runs in userspace on Linux with FUSE, so it
 will be quite different.

I wasn't clear in my description; I'm referring to ext4 on Linux. In fact, on a
system with low RAM, even the dd command makes the system horribly unresponsive.

IMHO, not having fair-share or timeslicing between different processes issuing
reads is frankly unacceptable, given that a lame user can bring the system to a
halt with 3 large file copies. Are there ZFS settings or Project Resource
Control settings one can use to limit abuse from individual processes?


Re: [zfs-discuss] I/O Read starvation

2010-01-09 Thread Bob Friesenhahn

On Sat, 9 Jan 2010, bank kus wrote:


Probably not, but ZFS only runs in userspace on Linux (via FUSE),
so it will be quite different.


I wasn't clear in my description; I'm referring to ext4 on Linux. In
fact, on a system with low RAM, even the dd command makes the system
horribly unresponsive.


IMHO, not having fair-share or timeslicing between different processes
issuing reads is frankly unacceptable, given that a lame user can bring
the system to a halt with 3 large file copies. Are there ZFS
settings or Project Resource Control settings one can use to limit
abuse from individual processes?


I am confused.  Are you talking about ZFS under OpenSolaris, or are 
you talking about ZFS under Linux via Fuse?


Do you have compression or deduplication enabled on the zfs 
filesystem?


What sort of system are you using?

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/


Re: [zfs-discuss] I/O Read starvation

2010-01-09 Thread bank kus
 I am confused.  Are you talking about ZFS under
 OpenSolaris, or are 
 you talking about ZFS under Linux via Fuse?

??? 

 Do you have compression or deduplication enabled on
 the zfs 
 filesystem?

Compression, no. I'm guessing 2009.06 doesn't have dedup.
 
 What sort of system are you using?

OSOL 2009.06 on Intel i7 920. The repro steps are at the top of this thread.


Re: [zfs-discuss] I/O Read starvation

2010-01-09 Thread Jürgen Keil
  I wasn't clear in my description; I'm referring to ext4 on Linux. In
  fact, on a system with low RAM, even the dd command makes the system
  horribly unresponsive.
 
  IMHO, not having fair-share or timeslicing between different processes
  issuing reads is frankly unacceptable, given that a lame user can bring
  the system to a halt with 3 large file copies. Are there ZFS
  settings or Project Resource Control settings one can use to limit
  abuse from individual processes?
 
 I am confused.  Are you talking about ZFS under OpenSolaris, or are 
 you talking about ZFS under Linux via Fuse?
 
 Do you have compression or deduplication enabled on
 the zfs  filesystem?
 
 What sort of system are you using?

I was able to reproduce the problem running the current (Mercurial)
OpenSolaris bits, with the dd command:

  dd if=/dev/urandom of=largefile.txt bs=1048576k count=8

Dedup is off, compression is on. The system is a 32-bit laptop
with 2GB of memory and a single-core CPU.  The system was
unusable/unresponsive for about 5 minutes before I was
able to interrupt the dd process.


Re: [zfs-discuss] I/O Read starvation

2010-01-09 Thread Henrik Johansson
On Jan 9, 2010, at 2:02 PM, bank kus wrote:

 Probably not, but ZFS only runs in userspace on Linux with FUSE, so it
 will be quite different.
 
 I wasn't clear in my description; I'm referring to ext4 on Linux. In fact, on a
 system with low RAM, even the dd command makes the system horribly
 unresponsive.
 
 IMHO, not having fair-share or timeslicing between different processes issuing
 reads is frankly unacceptable, given that a lame user can bring the system to a
 halt with 3 large file copies. Are there ZFS settings or Project Resource
 Control settings one can use to limit abuse from individual processes?
 -- 

Are you sure this problem is related to ZFS? I have no problem with multiple
threads reading and writing to my pools; it's still responsive. If, however, I
put urandom with dd into the mix, I get much more latency.

Doesn't, for example, $(dd if=/dev/urandom of=/dev/null bs=1048576k count=8)
give you the same problem? Or what if you use the file you already created from
urandom as input to dd?

Regards

Henrik
http://sparcv9.blogspot.com



Re: [zfs-discuss] I/O Read starvation

2010-01-09 Thread bank kus
Hi Henrik
I have 16GB of RAM on my system; on a system with less RAM, dd does cause
problems, as I mentioned above. My __guess__ is dd is probably sitting in some
in-memory cache, since du -sh doesn't show the full file size until I do a sync.

At this point I'm less looking for QA-type repro questions and/or speculation,
and more looking for ZFS design expectations.

What is the expected behaviour: if one thread queues 100 reads and another
thread comes later with 50 reads, are these 50 reads __guaranteed__ to fall
behind the first 100, or is timeslicing/fair-sharing done between the two
streams?

Btw, this problem is pretty serious: with 3 users on the system, one of them
initiating a large copy grinds the other 2 to a halt. Linux doesn't have this
problem, and this is almost a switch-OS moment for us, unfortunately :-(

Regards
banks


Re: [zfs-discuss] I/O Read starvation

2010-01-09 Thread bank kus
Btw, FWIW, if I redo the dd + 2 cp experiment on /tmp the result is far more
disastrous. The GUI stops moving and Caps Lock stops responding for large
intervals; no clue why.