Re: [zfs-discuss] Slow death-spiral with zfs gzip-9 compression

Ray Clark Sun, 30 Nov 2008 11:47:30 -0800

> > I think Chris has the right idea. This would give user processes more
> > small opportunities to get a word in edgewise. Since the blocks are
> > *obviously* taking a LONG time, this would not be a big efficiency hit
> > in the bogged-down condition.


> I still think you are expecting too much of a P3 system with limited
> RAM. I chose not to use gzip (default compression) on a max'd out x4540
> because it slowed down zfs receive too much.

---
This is not about getting my P3 to do gzip-9 at 100Mbit wire speeds.  I know 
that is not going to happen.  

This is about not having kernel threads completely lock out user (and other 
kernel) processes for undesirable lengths of time.  It is about improving 
Solaris.  It is about having more appropriate CPU sharing among all of the 
threads in the system, kernel and user.  The lack of such sharing is the root 
cause of the pathological behavior I stumbled on.

To clarify, this started as an experiment (1) to see what compression ratio 
would result, (2) to see what the performance hit would be, and (3) to stress 
the system severely to expose problems such as unprotected critical sections of 
code, race conditions, etc., to give myself confidence in using 2008.11.  I did 
not expect it to perform well, and I did not expect to end up using gzip-9 on 
this machine.

The experiment/exercise turned into a concern about the reliability of 
Solaris and ZFS as a platform, based on the gradual degradation to 100 KB/s and 
a completely unresponsive console (I understated it earlier; at times it took 
10-20 minutes to respond).  That triggered this thread.  

This thread is NOT about throughput of a gzip-9 zfs system.  It is about a 
Solaris ZFS system becoming completely, 99.999% unresponsive, indistinguishable 
from crashed.  No doubt I will put some effort into seeing if I can boost 
throughput a little, but right now my primary concern is that it WORKS.

This discussion has let me go away with confidence in Solaris and ZFS despite 
the pathological behavior of the gzip-9 algorithm and its interaction with ZFS 
thread scheduling.  The copy completed successfully last night.  (1) It still 
functions correctly even with the problems, and I will not lose data.  It is 
NOT a code-correctness problem that, under the right conditions and by random 
chance, could result in data loss even without gzip.  (2) I can completely 
avoid it by not using compression, especially gzip-9 compression.  It is also 
comforting to know that the pathological behavior will be eliminated by an 
improvement in ZFS thread scheduling.  That will leave only the intrinsic poor 
performance of gzip-9.

I do expect (though I gather many will disagree) that I will have a reliable, 
predictable, serviceable, if low-performance, Solaris/ZFS file server based on 
an 800MHz P3 with 768MB of memory, without compression.  I can deal with slow; 
I can't deal with a crash or data loss.  I don't think that is an unreasonable 
expectation.

I believe the discussion of how to improve ZFS kernel thread scheduling has 
value regardless of gzip-9.  It is a latent problem; the current design is a 
poor one.  Jeff has said that it will be fixed.

Even the dead-idle system running GNOME is a little jerky rather than smooth 
as silk, which I expect is due to the same root cause.  This will be good to 
fix, as it gives a pretty bad impression of Solaris when Linux can run 
silky-smooth and responsive.
-- 
This message posted from opensolaris.org
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
