Re: [zfs-discuss] compression at zfs filesystem creation

2009-06-19 Thread Bill Sommerfeld
On Wed, 2009-06-17 at 12:35 +0200, casper@sun.com wrote:
 I still use disk swap because I have some bad experiences 
 with ZFS swap.  (ZFS appears to cache and that is very wrong)

I'm experimenting with running zfs swap with the primarycache attribute
set to metadata instead of the default all.  

aka: 

zfs set primarycache=metadata rpool/swap 

seems like that would be more likely to behave appropriately.
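For anyone wanting to check what their swap zvol is currently using, a
quick sketch (dataset name assumed to be the stock rpool/swap):

    # show the current ARC/L2ARC caching policy for the swap zvol
    zfs get primarycache,secondarycache rpool/swap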

- Bill



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] compression at zfs filesystem creation

2009-06-19 Thread Darren J Moffat

Bill Sommerfeld wrote:

On Wed, 2009-06-17 at 12:35 +0200, casper@sun.com wrote:
I still use disk swap because I have some bad experiences 
with ZFS swap.  (ZFS appears to cache and that is very wrong)


I'm experimenting with running zfs swap with the primarycache attribute
set to metadata instead of the default all.  

aka: 

	zfs set primarycache=metadata rpool/swap 


seems like that would be more likely to behave appropriately.


Agreed, and for the just-in-case scenario, secondarycache=none as well - but 
then again, using an SSD as swap could be interesting.
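A minimal sketch of the combination being discussed (dataset name assumed
to be the default rpool/swap):

    # keep swap pages out of the ARC and off any L2ARC cache device
    zfs set primarycache=metadata rpool/swap
    zfs set secondarycache=none rpool/swap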


--
Darren J Moffat
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] compression at zfs filesystem creation

2009-06-18 Thread Haudy Kazemi

Bob Friesenhahn wrote:

On Wed, 17 Jun 2009, Haudy Kazemi wrote:

usable with very little CPU consumed.
If the system is dedicated to serving files rather than also being 
used interactively, it should not matter much what the CPU usage is.  
CPU cycles can't be stored for later use.  Ultimately, it (mostly*) 
does not matter if


Clearly you have not heard of the software flywheel:

  http://www.simplesystems.org/users/bfriesen/software_flywheel.html
I had not heard of such a device; however, from the description it 
appears to be made from virtual unobtanium :)


My line of reasoning is that unused CPU cycles are to some extent a 
wasted resource, paralleling the idea that having system RAM sitting 
empty/unused is also a waste and should be used for caching until the 
system needs that RAM for other purposes (how the ZFS cache is supposed 
to work).  This isn't a perfect parallel, as CPU power consumption and 
heat output vary with load much more than RAM's do.  I'm sure someone 
could come up with a formula for the optimal CPU loading to maximize 
energy efficiency.  There has been work along these lines; see the paper 
'Dynamic Data Compression in Multi-hop Wireless Networks' at 
http://enl.usc.edu/~abhishek/sigmpf03-sharma.pdf .


If I understand the blog entry correctly, for text data the task took 
up to 3.5X longer to complete, and for media data, the task took about 
2.2X longer to complete with a maximum storage compression ratio of 
2.52X.


For my backup drive using lzjb compression I see a compression ratio 
of only 1.53x.


I linked to several blog posts.  It sounds like you are referring to 
http://blogs.sun.com/dap/entry/zfs_compression#comments ?
That blog's test results show that on their quad-core platform (the Sun 
7410 uses quad-core 2.3 GHz AMD Opteron CPUs*):
* 
http://sunsolve.sun.com/handbook_pub/validateUser.do?target=Systems/7410/spec


For text data, LZJB compression had a negligible performance impact (task 
times were unchanged or marginally better) and less storage space was 
consumed (1.47:1).
For media data, LZJB compression had a negligible performance impact (task 
times were unchanged or marginally worse) and storage space consumed was 
unchanged (1:1).
Take-away message: as currently configured, their system has nothing to 
lose from enabling LZJB.
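If someone wants to try the same comparison on their own data, a minimal
sketch (pool and dataset names are hypothetical):

    # enable the default (lzjb) compression on an existing dataset;
    # only blocks written from now on are compressed
    zfs set compression=on tank/data

    # after writing some representative data, check the achieved ratio
    zfs get compressratio tank/data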


For text data, GZIP compression at any setting had a significant 
negative impact on write times (CPU bound), no performance impact on 
read times, and a significant positive improvement in compression ratio.
For media data, GZIP compression at any setting had a significant 
negative impact on write times (CPU bound), no performance impact on 
read times, and a marginal improvement in compression ratio.
Take-away message: with GZIP, as their system is currently configured, 
write performance would suffer in exchange for a higher compression 
ratio.  This may be acceptable if the system fills a role with a 
read-heavy usage profile of compressible content (an archive.org 
backend would be one example).  This is similar to the tradeoff made 
when comparing RAID1 or RAID10 with RAID5.
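A minimal sketch of dedicating a dataset to such a read-heavy archive
role (names are hypothetical; gzip-9 trades the slowest writes for the
best ratio):

    zfs create -o compression=gzip-9 tank/archive
    zfs get compression,compressratio tank/archive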


Automatic benchmarks could be used to detect and select the optimal 
compression settings for best performance, with the basic case assuming 
the system is a dedicated file server and more advanced cases accounting 
for the CPU needs of other processes run on the same platform.  Another 
way would be to ask the administrator what the usage profile for the 
machine will be and preconfigure compression settings suitable for that 
use case.
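A crude manual version of that idea, as a hedged sketch: create one
scratch dataset per candidate setting, time an identical write into each,
and compare ratios (dataset names, the sample file, and the settings list
are all made up):

    for c in off on gzip-2 gzip-9; do
        zfs create -o compression=$c tank/comptest-$c
        # time the copy; sync so buffered (compressed) writes are included
        ptime sh -c "cp /var/tmp/sample.dat /tank/comptest-$c/; sync"
        zfs get -H -o value compressratio tank/comptest-$c
        # zfs destroy tank/comptest-$c when finished
    done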


Single- and dual-core systems are more likely to become CPU bound from 
enabling compression than a quad-core system is.


All systems have bottlenecks in them somewhere by virtue of design 
decisions.  One or more of these bottlenecks will be the rate limiting 
factor for any given workload, such that even if you speed up the rest 
of the system the process will still take the same amount of time to 
complete.  The LZJB compression benchmarks on the quad core above 
demonstrate that LZJB is not the rate limiter in either writes or 
reads.  The GZIP benchmarks show that it is a rate limiter, but only 
during writes.  On a more powerful platform (6x faster CPU), GZIP writes 
may no longer be the bottleneck (assuming that the network bandwidth and 
drive I/O bandwidth remain unchanged).


System component balancing also plays a role.  If the server is 
connected via a 100 Mbps CAT5e link, and all I/O activity is from client 
computers on that link, does it make any difference if the server is 
actually capable of GZIP writes at 200 Mbps, 500 Mbps, or 1500 Mbps?  If 
the network link is later upgraded to Gigabit Ethernet, only the 
system capable of GZIPing at 1500 Mbps can keep up.  The rate limiting 
factor changes as different components are upgraded.


In many systems for many workloads, hard drive I/O bandwidth is the rate 
limiting factor that has the most significant performance impact, such 
that a 20% boost 

Re: [zfs-discuss] compression at zfs filesystem creation

2009-06-18 Thread Bob Friesenhahn

On Thu, 18 Jun 2009, Haudy Kazemi wrote:


For text data, LZJB compression had a negligible performance impact (task 
times were unchanged or marginally better) and less storage space was 
consumed (1.47:1).
For media data, LZJB compression had a negligible performance impact (task 
times were unchanged or marginally worse) and storage space consumed was 
unchanged (1:1).
Take-away message: as currently configured, their system has nothing to lose 
from enabling LZJB.


My understanding is that these tests were done with NFS and one client 
over gigabit ethernet (a file server scenario).  So in this case, the 
system is able to keep up with NFS over gigabit ethernet when LZJB is 
used.


In a stand-alone power-user desktop scenario, the situation may be 
quite different.  In this case application CPU usage may be competing 
with storage CPU usage.  Since ZFS often defers writes, it may be that 
the compression is performed at the same time as application compute 
cycles.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] compression at zfs filesystem creation

2009-06-17 Thread Monish Shah

Hello Richard,


Monish Shah wrote:
What about when the compression is performed in dedicated hardware? 
Shouldn't compression be on by default in that case?  How do I put in an 
RFE for that?


Is there a bugs.intel.com? :-)


I may have misled you.  I'm not asking for Intel to add hardware 
compression.  Actually, we already have gzip compression boards that we have 
integrated into OpenSolaris / ZFS and they are also supported under 
NexentaStor.  What I'm saying is that if such a card is installed, 
compression should be enabled by default.
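For what it's worth, turning compression on administratively is already a
one-liner; set it at the top of the pool and every new child dataset
inherits it (pool name is hypothetical):

    zfs set compression=gzip tank
    zfs get -r compression tank    # SOURCE shows 'inherited from tank'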



NB, Solaris already does this for encryption, which is often a more
computationally intensive operation.


Actually, compression is more compute intensive than symmetric encryption 
(such as AES).


Public key encryption, on the other hand, is horrendously compute intensive, 
much more than compression or symmetric encryption.  But, nobody uses 
public key encryption for bulk data encryption, so that doesn't apply.


Your mileage may vary.  You can always come up with compression algorithms 
that don't do a very good job of compressing, but which are light on CPU 
utilization.


Monish


I think the general cases are performed well by current hardware, and
it is already multithreaded. The bigger issue is, as Bob notes, resource
management. There is opportunity for people to work here, especially
since the community has access to large amounts of varied hardware.
Should we spin up a special interest group of some sort?
-- richard




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] compression at zfs filesystem creation

2009-06-17 Thread Kjetil Torgrim Homme
David Magda dma...@ee.ryerson.ca writes:

 On Tue, June 16, 2009 15:32, Kyle McDonald wrote:

 So the cache saves not only the time to access the disk but also
 the CPU time to decompress. Given this, I think it could be a big
 win.

 Unless you're in GIMP working on JPEGs, or doing some kind of MPEG
 video editing--or ripping audio (MP3 / AAC / FLAC) stuff. All of
 which are probably some of the largest files in most people's
 homedirs nowadays.

indeed.  I think only programmers will see any substantial benefit
from compression, since both the code itself and the object files
generated are easily compressible.

 1 GB of e-mail is a lot (probably my entire personal mail collection
 for a decade) and will compress well; 1 GB of audio files is
 nothing, and won't compress at all.

 Perhaps compressing /usr could be handy, but why bother enabling
 compression if the majority (by volume) of user data won't do
 anything but burn CPU?

 So the correct answer on whether compression should be enabled by
 default is it depends. (IMHO :) )

I'd be interested to see benchmarks on MySQL/PostgreSQL performance
with compression enabled.  my *guess* would be it isn't beneficial
since they usually do small reads and writes, and there is little gain
in reading 4 KiB instead of 8 KiB.
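If anyone does benchmark this, a plausible starting point is to match the
dataset recordsize to the database page size so compression works on the
same units the database reads and writes (the names and the 8 KiB page
size are assumptions):

    zfs create -o recordsize=8k -o compression=lzjb tank/db
    zfs get recordsize,compression,compressratio tank/db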

what other use cases can benefit from compression?
-- 
Kjetil T. Homme
Redpill Linpro AS - Changing the game

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] compression at zfs filesystem creation

2009-06-17 Thread Fajar A. Nugraha
On Wed, Jun 17, 2009 at 5:03 PM, Kjetil Torgrim Homme kjeti...@linpro.no wrote:
 indeed.  I think only programmers will see any substantial benefit
 from compression, since both the code itself and the object files
 generated are easily compressible.

 Perhaps compressing /usr could be handy, but why bother enabling
 compression if the majority (by volume) of user data won't do
 anything but burn CPU?

How do you define "substantial"? My OpenSolaris snv_111b installation
has a 1.47x compression ratio for /, with the default compression.
It's well worth it for me.

-- 
Fajar
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] compression at zfs filesystem creation

2009-06-17 Thread Casper . Dik

On Wed, Jun 17, 2009 at 5:03 PM, Kjetil Torgrim Homme kjeti...@linpro.no wrote:
 indeed.  I think only programmers will see any substantial benefit
 from compression, since both the code itself and the object files
 generated are easily compressible.

 Perhaps compressing /usr could be handy, but why bother enabling
 compression if the majority (by volume) of user data won't do
 anything but burn CPU?

How do you define "substantial"? My OpenSolaris snv_111b installation
has a 1.47x compression ratio for /, with the default compression.
It's well worth it for me.


Indeed; I've had a few systems with:

UFS (boot env 1)  UFS (boot env 2) swap

lucreate couldn't fit everything in one (old UFS) partition because of
dump and swap; with compression I can fit multiple boot environments (more
than two).  I still use disk swap because I have had some bad experiences 
with ZFS swap.  (ZFS appears to cache, and that is very wrong.)

Now I use:

rpool  (using both the UFS partitions, now concatenated into one
slice) and real swap.


My ZFS/Solaris wish list is this:

- when you convert from UFS to ZFS, zpool create fails and requires
  zpool create -f; I'd like zpool create to report *all* errors, not just 
  one (so you know exactly what collateral damage you would do):
  has a UFS filesystem
  s2 overlaps s0
  etc  (see the sketch after this list)

- zpool upgrade should fail if one of the available boot 
  environments doesn't support the new version (or upgrade
  to the lowest supported zfs version)
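For context, a minimal sketch of the conversion step being described
(device and pool names are made up; the exact warnings vary):

    # zpool create refuses a slice that appears to be in use,
    # e.g. because it still contains a UFS filesystem
    zpool create rpool c0t0d0s0

    # -f forces the creation, accepting whatever is on the slice
    zpool create -f rpool c0t0d0s0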

Casper

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] compression at zfs filesystem creation

2009-06-17 Thread Kjetil Torgrim Homme
Fajar A. Nugraha fa...@fajar.net writes:

 Kjetil Torgrim Homme wrote:
 indeed.  I think only programmers will see any substantial benefit
 from compression, since both the code itself and the object files
 generated are easily compressible.

 Perhaps compressing /usr could be handy, but why bother enabling
 compression if the majority (by volume) of user data won't do
 anything but burn CPU?

 How do you define "substantial"? My OpenSolaris snv_111b installation
 has a 1.47x compression ratio for /, with the default compression.
 It's well worth it for me.

I don't really care if my / is 5 GB or 3 GB.  how much faster is
your system operating?  what's the compression ratio on your data
areas?

-- 
Kjetil T. Homme
Redpill Linpro AS - Changing the game

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] compression at zfs filesystem creation

2009-06-17 Thread Monish Shah

Unless you're in GIMP working on JPEGs, or doing some kind of MPEG
video editing--or ripping audio (MP3 / AAC / FLAC) stuff. All of
which are probably some of the largest files in most people's
homedirs nowadays.


indeed.  I think only programmers will see any substantial benefit
from compression, since both the code itself and the object files
generated are easily compressible.


If we are talking about data on people's desktops and laptops, yes, it is 
not very common to see a lot of compressible data.  There are exceptions, 
such as desktops used for engineering drawings: CAD files tend to be 
compressible, and they tend to be big.


In any case, the really interesting case for compression is for business 
data (databases, e-mail servers, etc.) which tends to be quite compressible.


...


I'd be interested to see benchmarks on MySQL/PostgreSQL performance
with compression enabled.  my *guess* would be it isn't beneficial
since they usually do small reads and writes, and there is little gain
in reading 4 KiB instead of 8 KiB.


OK, now you have switched from compressibility of data to performance 
advantage.  As I said above, this kind of data usually compresses pretty 
well.


I agree that for random reads, there wouldn't be any gain from compression. 
For random writes, in a copy-on-write file system, there might be gains, 
because the blocks may be arranged in sequential fashion anyway.  We are in 
the process of doing some performance tests to prove or disprove this.


Now, if you are using SSDs for this type of workload, I'm pretty sure that 
compression will help writes.  The reason is that the flash translation 
layer in the SSD has to re-arrange the data and write it page by page.  If 
there is less data to write, there will be fewer program operations.


Given that the write IOPS rating of an SSD is often much lower than its 
read IOPS rating, using compression to improve writes will surely be of 
great value.


At this point, this is educated guesswork.  I'm going to see if I can get my 
hands on an SSD to prove this.


Monish


what other use cases can benefit from compression?
--
Kjetil T. Homme
Redpill Linpro AS - Changing the game

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] compression at zfs filesystem creation

2009-06-17 Thread Kjetil Torgrim Homme
Monish Shah mon...@indranetworks.com writes:

 I'd be interested to see benchmarks on MySQL/PostgreSQL performance
 with compression enabled.  my *guess* would be it isn't beneficial
 since they usually do small reads and writes, and there is little
 gain in reading 4 KiB instead of 8 KiB.

 OK, now you have switched from compressibility of data to
 performance advantage.  As I said above, this kind of data usually
 compresses pretty well.

the thread has been about I/O performance since the first response, as
far as I can tell.

 I agree that for random reads, there wouldn't be any gain from
 compression. For random writes, in a copy-on-write file system,
 there might be gains, because the blocks may be arranged in
 sequential fashion anyway.  We are in the process of doing some
 performance tests to prove or disprove this.

 Now, if you are using SSDs for this type of workload, I'm pretty
 sure that compression will help writes.  The reason is that the
 flash translation layer in the SSD has to re-arrange the data and
 write it page by page.  If there is less data to write, there will
 be fewer program operations.

 Given that write IOPS rating in an SSD is often much less than read
 IOPS, using compression to improve that will surely be of great
 value.

not necessarily, since a partial SSD write is much more expensive than
a full block write (128 KiB?).  in a write intensive application, that
won't be an issue since the data is flowing steadily, but for the
right mix of random reads and writes, this may exacerbate the
bottleneck.

 At this point, this is educated guesswork.  I'm going to see if I
 can get my hands on an SSD to prove this.

that'd be great!

-- 
Kjetil T. Homme
Redpill Linpro AS - Changing the game

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] compression at zfs filesystem creation

2009-06-17 Thread David Magda
On Wed, June 17, 2009 06:15, Fajar A. Nugraha wrote:

 Perhaps compressing /usr could be handy, but why bother enabling
 compression if the majority (by volume) of user data won't do
 anything but burn CPU?

 How do you define "substantial"? My OpenSolaris snv_111b installation
 has a 1.47x compression ratio for /, with the default compression.
 It's well worth it for me.

And how many GB is that? ~1.5x is quite good, but if you're talking about
a 7.5 GB install using only 3 GB of space while your homedir is 50 GB,
it's not a lot in relative terms.
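One way to see where the ratio actually matters is to look at it per
dataset rather than for / alone (pool name assumed to be rpool):

    zfs get -r used,compressratio rpool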


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] compression at zfs filesystem creation

2009-06-17 Thread Haudy Kazemi




Bob Friesenhahn wrote:
On Mon, 15 Jun 2009, Bob Friesenhahn wrote:

On Mon, 15 Jun 2009, Rich Teer wrote:

You actually have that backwards. :-)  In most cases, compression is very
desirable.  Performance studies have shown that today's CPUs can compress
data faster than it takes for the uncompressed data to be read or written.

Do you have a reference for such an analysis based on ZFS?  I would be
interested in linear read/write performance rather than random access
synchronous access.

Perhaps you are going to make me test this for myself.

Ok, I tested this for myself on a Solaris 10 system with 4 3GHz AMD64
cores and see that we were both right.  I did an iozone run with
compression and do see a performance improvement.  I don't know what
the data iozone produces looks like, but it clearly must be quite
compressible.  Testing was done with a 64GB file:

                      KB  reclen   write  rewrite     read    reread
uncompressed:   67108864     128  359965   354854   550869    554271
lzjb:           67108864     128  851336   924881  1289059   1362625

Unfortunately, during the benchmark run with lzjb the system desktop
was essentially unusable, with a misbehaving mouse and keyboard as well
as a reported 55% CPU consumption.  Without the compression the system
is fully usable with very little CPU consumed.

If the system is dedicated to serving files rather than also being used
interactively, it should not matter much what the CPU usage is. CPU
cycles can't be stored for later use. Ultimately, it (mostly*) does
not matter if one option consumes more CPU resources than another if
those CPU resources were otherwise going to go unused. Changes
(increases) in latencies are a consideration but probably depend more
on process scheduler choice and policies.
*Higher CPU usage will increase energy consumption, heat output, and
cooling costs...these may be important considerations in some
specialized dedicated file server applications, depending on
operational considerations.

The interactivity hit may pose a greater challenge for any other
processes/databases/virtual machines run on hardware that also serves
files. The interactivity hit may also be evidence that the process
scheduler is not fairly or effectively sharing CPU resources amongst
the running processes. If scheduler tweaks aren't effective, perhaps
dedicating a processor core(s) to interactive GUI stuff and the other
cores to filesystem duties would help smooth things out. Maybe zones
could be used for that?

With a slower disk subsystem the CPU overhead would surely be less
since writing is still throttled by the disk.

It would be better to test with real data rather than iozone.

There are 4 sets of articles with links and snippets from their test
data below. Follow the links for the full discussion:

First article:
http://blogs.sun.com/dap/entry/zfs_compression#comments
Hardware:
Sun Storage 7000
# The server is a quad-core 7410 with 1 JBOD (configured with mirrored
storage) and 16GB of RAM. No SSD.
# The client machine is a quad-core 7410 with 128GB of DRAM.
Summary: text data set

  Compression   Ratio   Total   Write   Read
  off           1.00x    3:30    2:08   1:22
  lzjb          1.47x    3:26    2:04   1:22
  gzip-2        2.35x    6:12    4:50   1:22
  gzip          2.52x   11:18    9:56   1:22
  gzip-9        2.52x   12:16   10:54   1:22

Summary: media data set

  Compression   Ratio   Total   Write   Read
  off           1.00x    3:29    2:07   1:22
  lzjb          1.00x    3:31    2:09   1:22
  gzip-2        1.01x    6:59    5:37   1:22
  gzip          1.01x    7:18    5:57   1:21
  gzip-9        1.01x    7:37    6:15   1:22



Second article/discussion:
http://ekschi.com/technology/2009/04/28/zfs-compression-a-win-win/
http://blogs.sun.com/observatory/entry/zfs_compression_a_win_win

Third article summary:
ZFS with MySQL/InnoDB shows that gzip is often CPU-bound on current
processors; lzjb improves performance.
http://blogs.smugmug.com/don/2008/10/13/zfs-mysqlinnodb-compression-update/
Hardware:
SunFire
X2200 M2 w/64GB of RAM and 2 x dual-core 2.6GHz Opterons
Dell MD3000 w/15 x 15K SCSI disks and mirrored 512MB battery-backed
write caches
"Also note that this is writing to two DAS enclosures with 15 x 15K
SCSI disks apiece (28 spindles in a striped+mirrored configuration)
with 512MB of write cache apiece."


  

TABLE1

  compression    size   ratio   time
  uncompressed   172M   1       0.207s
  lzjb            79M   2.18X   0.234s
  gzip-1          50M   3.44X   0.24s
  gzip-9          46M   3.73X   0.217s

Re: [zfs-discuss] compression at zfs filesystem creation

2009-06-17 Thread Haudy Kazemi

David Magda wrote:

On Tue, June 16, 2009 15:32, Kyle McDonald wrote:

So the cache saves not only the time to access the disk but also the CPU
time to decompress. Given this, I think it could be a big win.

Unless you're in GIMP working on JPEGs, or doing some kind of MPEG video
editing--or ripping audio (MP3 / AAC / FLAC) stuff. All of which are
probably some of the largest files in most people's homedirs nowadays.

1 GB of e-mail is a lot (probably my entire personal mail collection for a
decade) and will compress well; 1 GB of audio files is nothing, and won't
compress at all.

Perhaps compressing /usr could be handy, but why bother enabling
compression if the majority (by volume) of user data won't do anything but
burn CPU?

So the correct answer on whether compression should be enabled by default
is it depends. (IMHO :)  )
The performance tests I've found almost universally show LZJB as not 
being CPU-bound on recent equipment.  A few years from now GZIP may also 
stop being CPU-bound.  Since performance tests on current hardware 
show that enabling LZJB improves overall performance, it would make sense 
to enable it by default.  In the future, when GZIP is no longer 
CPU-bound, it might become the default (or there could be another 
algorithm).  There is a long history of previously formidable tasks 
starting out CPU-bound and quickly progressing to 'easily handled 
in the background': decoding MP3, MPEG1, and MPEG2 (at DVD 
resolutions), softmodems (and other host signal processor devices), and 
RAID are all tasks that recent equipment handles easily.


Another option/idea to consider is using LZJB as the default compression 
method, and then performing a background scrub-recompress during 
otherwise idle times. Technique ideas:
1.) A performance-neutral/performance-enhancing technique: use any 
algorithm that is not CPU bound on your hardware and rarely, if ever, has 
worse performance than the uncompressed state.
2.) Adaptive technique 1: rarely used blocks could be given the 
strongest compression (using an algorithm tuned for the data type 
detected), while frequently used blocks would be compressed at 
performance-neutral or performance-improving levels.
3.) Adaptive technique 2: the same as adaptive technique 1, except that 
as the storage device gets closer to its native capacity, compression is 
applied both proactively (to new data) and retroactively (to old data), 
progressively using more powerful compression techniques as the maximum 
native capacity is approached.  Compression could delay users from 
reaching the 80-95% capacity point where system performance curves often 
have their knees (a massive performance degradation with each additional 
unit).
4.) Maximize-space technique: detect the data type and use the best 
available algorithm for the block.
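None of that exists today, but a rough manual approximation, as a hedged
sketch: give busy and archival datasets different compression properties,
and remember that ZFS only applies a new setting to blocks written after
the change, so "recompressing" old data means rewriting it (all names are
hypothetical):

    zfs set compression=lzjb tank/active      # hot data: cheap compression
    zfs set compression=gzip-9 tank/archive   # cold data: best ratio

    # rewriting a file is what actually recompresses its existing blocks
    # under the new setting
    cp /tank/archive/old.dat /tank/archive/old.dat.tmp &&
        mv /tank/archive/old.dat.tmp /tank/archive/old.dat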


As a counterpoint, if drive capacities keep growing at their current 
pace it seems they ultimately risk obviating the need to give much 
thought to the compression algorithm, except to choose one that boosts 
system performance.  (I.e. in hard drives, compression may primarily be 
used to improve performance rather than gain extra storage space, as 
drive capacity has grown many times faster than drive performance.)


JPEGs often CAN be /losslessly/ compressed further by useful amounts 
(e.g. 25% space savings).  There is more on this here:

Tests:
http://www.maximumcompression.com/data/jpg.php
http://compression.ca/act/act-jpeg.html
http://www.downloadsquad.com/2008/09/11/winzip-12-supports-lossless-jpg-compression/
http://download.cnet.com/8301-2007_4-10038172-12.html
http://www.online-tech-tips.com/software-reviews/winzip-vs-7-zip-best-compression-method/

These have source code available:
http://sylvana.net/jpeg-ari/
PAQ8R http://www.cs.fit.edu/~mmahoney/compression/   (general info 
http://en.wikipedia.org/wiki/PAQ )


This one says source code is not yet available (implying it may become 
available):

http://www.elektronik.htw-aalen.de/packjpg/packjpg_m.htm


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] compression at zfs filesystem creation

2009-06-17 Thread Bob Friesenhahn

On Wed, 17 Jun 2009, Haudy Kazemi wrote:

usable with very little CPU consumed.
If the system is dedicated to serving files rather than also being used 
interactively, it should not matter much what the CPU usage is.  CPU cycles 
can't be stored for later use.  Ultimately, it (mostly*) does not matter if


Clearly you have not heard of the software flywheel:

  http://www.simplesystems.org/users/bfriesen/software_flywheel.html

If I understand the blog entry correctly, for text data the task took 
up to 3.5X longer to complete, and for media data, the task took about 
2.2X longer to complete with a maximum storage compression ratio of 
2.52X.


For my backup drive using lzjb compression I see a compression ratio 
of only 1.53x.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] compression at zfs filesystem creation

2009-06-16 Thread Monish Shah

Hello,

I would like to add one more point to this.

Everyone seems to agree that compression is useful for reducing load on the 
disks and the disagreement is about the impact on CPU utilization, right?


What about when the compression is performed in dedicated hardware? 
Shouldn't compression be on by default in that case?  How do I put in an RFE 
for that?


Monish




On Mon, 15 Jun 2009, dick hoogendijk wrote:


IF at all, it certainly should not be the DEFAULT.
Compression is a choice, nothing more.


I respectfully disagree somewhat.  Yes, compression should be a
choice, but I think the default should be for it to be enabled.


I agree that Compression is a choice and would add :

  Compression is a choice and it is the default.

Just my feelings on the issue.

Dennis Clarke

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] compression at zfs filesystem creation

2009-06-16 Thread Robert Milkowski



On Mon, 15 Jun 2009, Bob Friesenhahn wrote:


On Mon, 15 Jun 2009, Thommy M. wrote:


In most cases compression is not desirable.  It consumes CPU and
results in uneven system performance.


IIRC there was a blog about I/O performance with ZFS stating that it was
faster with compression ON as it didn't have to wait for so much data
from the disks and that the CPU was fast at unpacking data. But sure, it
uses more CPU (and probably memory).


I'll believe this when I see it. :-)

With really slow disks and a fast CPU it is possible that reading data the 
first time is faster.  However, Solaris is really good at caching data so any 
often-accessed data is highly likely to be cached and therefore read just one 
time.  The main point of using compression for the root pool would be so that 
the OS can fit on an abnormally small device such as a FLASH disk.  I would 
use it for a read-mostly device or an archive (backup) device.




Well, it depends on your working set and how much memory you have.
I came across systems with lots of CPU left to spare but a working set 
much bigger than the amount of memory, where enabling lzjb gave over a 
2x compression ratio and made the application run faster.


Seen it with ldap, mysql, and a couple of other apps.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] compression at zfs filesystem creation

2009-06-16 Thread Bob Friesenhahn

On Mon, 15 Jun 2009, Bob Friesenhahn wrote:


On Mon, 15 Jun 2009, Rich Teer wrote:


You actually have that backwards.  :-)  In most cases, compression is very
desirable.  Performance studies have shown that today's CPUs can compress
data faster than it takes for the uncompressed data to be read or written.


Do you have a reference for such an analysis based on ZFS?  I would be 
interested in linear read/write performance rather than random access 
synchronous access.


Perhaps you are going to make me test this for myself.


Ok, I tested this for myself on a Solaris 10 system with 4 3GHz AMD64 
cores and see that we were both right.  I did an iozone run with 
compression and do see a performance improvement.  I don't know what 
the data iozone produces looks like, but it clearly must be quite 
compressible.  Testing was done with a 64GB file:


 KB  reclen   write rewritereadreread
uncompressed:  67108864 128  359965  354854   550869   554271
lzjb:  67108864 128  851336  924881  1289059  1362625

Unfortunately, during the benchmark run with lzjb the system desktop 
was essentially unusable, with a misbehaving mouse and keyboard as well 
as a reported 55% CPU consumption.  Without the compression the system 
is fully usable with very little CPU consumed.


With a slower disk subsystem the CPU overhead would surely be less 
since writing is still throttled by the disk.


It would be better to test with real data rather than iozone.
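For anyone who wants to repeat the comparison, a minimal sketch of the
kind of run described (dataset names are assumptions; -i 0 and -i 1
select iozone's write/rewrite and read/reread tests):

    zfs create -o compression=off  tank/iozone-off
    zfs create -o compression=lzjb tank/iozone-lzjb
    iozone -i 0 -i 1 -r 128k -s 64g -f /tank/iozone-off/testfile
    iozone -i 0 -i 1 -r 128k -s 64g -f /tank/iozone-lzjb/testfile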

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] compression at zfs filesystem creation

2009-06-16 Thread Kyle McDonald

Bob Friesenhahn wrote:

On Mon, 15 Jun 2009, Thommy M. wrote:


In most cases compression is not desirable.  It consumes CPU and
results in uneven system performance.


IIRC there was a blog about I/O performance with ZFS stating that it was
faster with compression ON as it didn't have to wait for so much data
from the disks and that the CPU was fast at unpacking data. But sure, it
uses more CPU (and probably memory).


I'll believe this when I see it. :-)

With really slow disks and a fast CPU it is possible that reading data 
the first time is faster.  However, Solaris is really good at caching 
data so any often-accessed data is highly likely to be cached and 
therefore read just one time.

One thing I'm curious about...

When reading compressed data, is it cached before or after it is 
uncompressed?


If before, then while you've save re-reading it from the disk, there is 
still (redundant) overhead for uncompressing it over and over.


If the uncompressed data is cached, then I agree it sounds like a total 
win for read-mostly filesystems.


  -Kyle

  The main point of using compression for the root pool would be so 
that the OS can fit on an abnormally small device such as a FLASH 
disk.  I would use it for a read-mostly device or an archive (backup) 
device.


On desktop systems the influence of compression on desktop response is 
quite noticeable when writing, even with very fast CPUs and multiple 
cores.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, 
http://www.simplesystems.org/users/bfriesen/

GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] compression at zfs filesystem creation

2009-06-16 Thread Darren J Moffat

Kyle McDonald wrote:

Bob Friesenhahn wrote:

On Mon, 15 Jun 2009, Thommy M. wrote:


In most cases compression is not desirable.  It consumes CPU and
results in uneven system performance.


IIRC there was a blog about I/O performance with ZFS stating that it was
faster with compression ON as it didn't have to wait for so much data
from the disks and that the CPU was fast at unpacking data. But sure, it
uses more CPU (and probably memory).


I'll believe this when I see it. :-)

With really slow disks and a fast CPU it is possible that reading data 
the first time is faster.  However, Solaris is really good at caching 
data so any often-accessed data is highly likely to be cached and 
therefore read just one time.

One thing I'm curious about...

When reading compressed data, is it cached before or after it is 
uncompressed?


The decompressed (and decrypted) data is what is cached in memory.

Currently the L2ARC stores decompressed (but encrypted) data on the 
cache devices.


--
Darren J Moffat
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] compression at zfs filesystem creation

2009-06-16 Thread Richard Elling

Monish Shah wrote:

Hello,

I would like to add one more point to this.

Everyone seems to agree that compression is useful for reducing load 
on the disks and the disagreement is about the impact on CPU 
utilization, right?


What about when the compression is performed in dedicated hardware? 
Shouldn't compression be on by default in that case?  How do I put in 
an RFE for that?


Is there a bugs.intel.com? :-)

NB, Solaris already does this for encryption, which is often a more
computationally intensive operation.

I think the general cases are performed well by current hardware, and
it is already multithreaded. The bigger issue is, as Bob notes, resource
management. There is opportunity for people to work here, especially
since the community has access to large amounts of varied hardware.
Should we spin up a special interest group of some sort?
-- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] compression at zfs filesystem creation

2009-06-16 Thread Kyle McDonald

Darren J Moffat wrote:

Kyle McDonald wrote:

Bob Friesenhahn wrote:

On Mon, 15 Jun 2009, Thommy M. wrote:


In most cases compression is not desirable.  It consumes CPU and
results in uneven system performance.


IIRC there was a blog about I/O performance with ZFS stating that 
it was

faster with compression ON as it didn't have to wait for so much data
from the disks and that the CPU was fast at unpacking data. But 
sure, it

uses more CPU (and probably memory).


I'll believe this when I see it. :-)

With really slow disks and a fast CPU it is possible that reading 
data the first time is faster.  However, Solaris is really good at 
caching data so any often-accessed data is highly likely to be 
cached and therefore read just one time.

One thing I'm curious about...

When reading compressed data, is it cached before or after it is 
uncompressed?


The decompressed (and decrypted) data is what is cached in memory.

Currently the L2ARC stores decompressed (but encrypted) data on the 
cache devices.


So the cache saves not only the time to access the disk but also the CPU 
time to decompress. Given this, I think it could be a big win.


 -Kyle



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] compression at zfs filesystem creation

2009-06-16 Thread David Magda
On Tue, June 16, 2009 15:32, Kyle McDonald wrote:

 So the cache saves not only the time to access the disk but also the CPU
 time to decompress. Given this, I think it could be a big win.

Unless you're in GIMP working on JPEGs, or doing some kind of MPEG video
editing--or ripping audio (MP3 / AAC / FLAC) stuff. All of which are
probably some of the largest files in most people's homedirs nowadays.

1 GB of e-mail is a lot (probably my entire personal mail collection for a
decade) and will compress well; 1 GB of audio files is nothing, and won't
compress at all.

Perhaps compressing /usr could be handy, but why bother enabling
compression if the majority (by volume) of user data won't do anything but
burn CPU?

So the correct answer on whether compression should be enabled by default
is it depends. (IMHO :)  )

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] compression at zfs filesystem creation

2009-06-15 Thread Thommy M.
Bob Friesenhahn wrote:
 On Mon, 15 Jun 2009, Shannon Fiume wrote:
 
 I just installed 2009.06 and found that compression isn't enabled by
 default when filesystems are created. Does it make sense to have an
 RFE open for this? (I'll open one tonight if need be.) We keep telling
 people to turn on compression. Are there any situations where turning
 on compression doesn't make sense, like rpool/swap? what about
 rpool/dump?
 
 In most cases compression is not desirable.  It consumes CPU and
 results in uneven system performance.

IIRC there was a blog about I/O performance with ZFS stating that it was
faster with compression ON as it didn't have to wait for so much data
from the disks and that the CPU was fast at unpacking data. But sure, it
uses more CPU (and probably memory).

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] compression at zfs filesystem creation

2009-06-15 Thread Glenn Lagasse
* Shannon Fiume (shannon.fi...@sun.com) wrote:
 Hi,
 
 I just installed 2009.06 and found that compression isn't enabled by
 default when filesystems are created. Does it make sense to have an
 RFE open for this? (I'll open one tonight if need be.) We keep telling
 people to turn on compression. Are there any situations where turning
 on compression doesn't make sense, like rpool/swap? what about
 rpool/dump?

That would be enhancement request #86.

http://defect.opensolaris.org/bz/show_bug.cgi?id=86

Cheers,

-- 
Glenn
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] compression at zfs filesystem creation

2009-06-15 Thread dick hoogendijk
On Mon, 15 Jun 2009 22:51:12 +0200
Thommy M. thommy.m.malmst...@gmail.com wrote:

 IIRC there was a blog about I/O performance with ZFS stating that it
 was faster with compression ON as it didn't have to wait for so much
 data from the disks and that the CPU was fast at unpacking data. But
 sure, it uses more CPU (and probably memory).

IF at all, it certainly should not be the DEFAULT.
Compression is a choice, nothing more.

-- 
Dick Hoogendijk -- PGP/GnuPG key: 01D2433D
+ http://nagual.nl/ | nevada / OpenSolaris 2009.06 release
+ All that's really worth doing is what we do for others (Lewis Carrol)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] compression at zfs filesystem creation

2009-06-15 Thread Rich Teer
On Mon, 15 Jun 2009, dick hoogendijk wrote:

 IF at all, it certainly should not be the DEFAULT.
 Compression is a choice, nothing more.

I respectfully disagree somewhat.  Yes, compression should be a
choice, but I think the default should be for it to be enabled.

-- 
Rich Teer, SCSA, SCNA, SCSECA

URLs: http://www.rite-group.com/rich
  http://www.linkedin.com/in/richteer
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] compression at zfs filesystem creation

2009-06-15 Thread Bob Friesenhahn

On Mon, 15 Jun 2009, Thommy M. wrote:


In most cases compression is not desirable.  It consumes CPU and
results in uneven system performance.


IIRC there was a blog about I/O performance with ZFS stating that it was
faster with compression ON as it didn't have to wait for so much data
from the disks and that the CPU was fast at unpacking data. But sure, it
uses more CPU (and probably memory).


I'll believe this when I see it. :-)

With really slow disks and a fast CPU it is possible that reading data 
the first time is faster.  However, Solaris is really good at caching 
data so any often-accessed data is highly likely to be cached and 
therefore read just one time.  The main point of using compression for 
the root pool would be so that the OS can fit on an abnormally small 
device such as a FLASH disk.  I would use it for a read-mostly device 
or an archive (backup) device.


On desktop systems the influence of compression on desktop response is 
quite noticeable when writing, even with very fast CPUs and multiple 
cores.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] compression at zfs filesystem creation

2009-06-15 Thread Rich Teer
On Mon, 15 Jun 2009, Bob Friesenhahn wrote:

 In most cases compression is not desirable.  It consumes CPU and results in
 uneven system performance.

You actually have that backwards.  :-)  In most cases, compression is very
desirable.  Performance studies have shown that today's CPUs can compress
data faster than it takes for the uncompressed data to be read or written.
That is, the time to read or write compressed data + the time to compress
or decompress it is less than the time to read or write the uncompressed data.

Such is the difference between CPUs and I/O!

You are correct that the compression/decompression uses CPU, but most systems
have an abundance of CPU, especially when performing I/O.

-- 
Rich Teer, SCSA, SCNA, SCSECA

URLs: http://www.rite-group.com/rich
  http://www.linkedin.com/in/richteer
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] compression at zfs filesystem creation

2009-06-15 Thread Dennis Clarke

 On Mon, 15 Jun 2009, dick hoogendijk wrote:

 IF at all, it certainly should not be the DEFAULT.
 Compression is a choice, nothing more.

 I respectfully disagree somewhat.  Yes, compression should be a
 choice, but I think the default should be for it to be enabled.

I agree that Compression is a choice and would add :

   Compression is a choice and it is the default.

Just my feelings on the issue.

Dennis Clarke

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] compression at zfs filesystem creation

2009-06-15 Thread Bob Friesenhahn

On Mon, 15 Jun 2009, Rich Teer wrote:


You actually have that backwards.  :-)  In most cases, compression is very
desirable.  Performance studies have shown that today's CPUs can compress
data faster than it takes for the uncompressed data to be read or written.


Do you have a reference for such an analysis based on ZFS?  I would be 
interested in linear read/write performance rather than random access 
synchronous access.


Perhaps you are going to make me test this for myself.


You are correct that the compression/decompression uses CPU, but most systems
have an abundance of CPU, especially when performing I/O.


I assume that you are talking about single-user systems with little 
else to do?


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss