On Tue, Feb 07, 2006 at 08:51:43AM -0500, Teg wrote:

> My application uses compressed data (gzip), but the tradeoff for
> small data files is exceptionally heavy CPU usage when the data is
> decompressed/compressed.

Incidentally, the MonetDB folks have done research on that sort of
thing.  In their most recent project, "X100", they keep the data
compressed both on disk AND in main memory, and decompress it only in
the CPU cache when actually manipulating values.

They do that not primarily to save disk space, but to reduce the
amount of memory bandwidth needed.  Apparently in some cases it's a
big speed-up, shifting the query from being memory-bandwidth bound to
CPU bound.  Of course, for that to work they have to use very
lightweight compression/decompression algorithms.  Gzip gives much
better compression, but in comparison it's extremely slow.
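
To make that concrete, here is a rough sketch (mine, not theirs) of
one such lightweight scheme, frame-of-reference coding: each block
stores a base value plus narrow per-value deltas, and a block is
decoded into a small scratch buffer that fits in cache right before
the values are consumed, so the uncompressed data never has to
travel over the memory bus.  The block size, the fixed 8-bit delta
width, and the data are all made up for illustration; X100's real
algorithms also handle values that don't fit the chosen delta width:

  #include <stdint.h>
  #include <stdio.h>

  #define BLOCK 1024  /* values per block; small enough to live in L1/L2 cache */

  /* Frame-of-reference encode: store one base value per block plus an
     8-bit delta per value.  (Assumes all deltas fit in 8 bits.) */
  static void for_encode(const uint32_t *in, int n, uint32_t *base,
                         uint8_t *out)
  {
      uint32_t min = in[0];
      for (int i = 1; i < n; i++)
          if (in[i] < min) min = in[i];
      *base = min;
      for (int i = 0; i < n; i++)
          out[i] = (uint8_t)(in[i] - min);
  }

  /* Decode one block into a small stack buffer and consume it
     immediately, so the decompressed values stay in cache instead of
     round-tripping through main memory. */
  static uint64_t sum_block(uint32_t base, const uint8_t *in, int n)
  {
      uint32_t buf[BLOCK];               /* cache-resident scratch */
      uint64_t total = 0;
      for (int i = 0; i < n; i++)
          buf[i] = base + in[i];         /* decode: one add per value */
      for (int i = 0; i < n; i++)
          total += buf[i];
      return total;
  }

  int main(void)
  {
      uint32_t vals[BLOCK], base;
      uint8_t packed[BLOCK];
      for (int i = 0; i < BLOCK; i++)
          vals[i] = 100000 + (i % 200);  /* clustered values compress well */
      for_encode(vals, BLOCK, &base, packed);
      printf("sum = %llu\n",
             (unsigned long long)sum_block(base, packed, BLOCK));
      return 0;
  }

The packed form is a quarter the size of the raw array, so a scan
that keeps data compressed like this moves a quarter of the bytes
over the memory bus; gzip would shrink it further but costs far more
CPU per value to decode.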

Probably not immediately useful, but it seems like interesting stuff:

  http://monetdb.cwi.nl/
  http://homepages.cwi.nl/~mk/MonetDB/
  http://sourceforge.net/projects/monetdb/
  http://homepages.cwi.nl/~boncz/

  "MonetDB/X100 - A DBMS In The CPU Cache"
  by Marcin Zukowski, Peter Boncz, Niels Nes, Sándor Héman
  ftp://ftp.research.microsoft.com/pub/debull/A05june/issue1.htm

Btw, apparently the current stable version of MonetDB is open
source, but they haven't yet decided whether the X100 work will be.

Googling just now, it looks like there's been a fair amount of
research and commercialization of this sort of thing lately, e.g.:

  http://db.csail.mit.edu/projects/cstore/

-- 
Andrew Piskorski <[EMAIL PROTECTED]>
http://www.piskorski.com/
