Hello Andrew,

My purpose is primarily disk storage savings, the data's mainly text
so it's highly compressible. 500K on disk chunks of data decompress
out to about 8 megabytes of text. What compression scheme do they use?
I might consider trading some disk space for faster
compression/decompression.

C

Tuesday, February 7, 2006, 10:26:02 AM, you wrote:

AP> On Tue, Feb 07, 2006 at 08:51:43AM -0500, Teg wrote:

>> My application uses compressed data (gzip) but, the tradeoff to small
>> data files is exceptionally heavy CPU usage when the data is
>> decompressed/compressed.

AP> Incidentally, the MonetDB folks have done research on that sort of
AP> thing.  In their most recent project, "X100", they keep the data
AP> compressed both on disk AND in main memory, and decompress it only in
AP> the CPU cache when actually manipulating values.

AP> They do that not primarily to save disk space, but to reduce the
AP> amount of memory bandwith needed.  Apparently in some cases it's a big
AP> speed-up, and shifts the query from being memory I/O bound to CPU
AP> bound.  Of course, in order for that to work they have to use very
AP> lightweight compression/decompression algorithms.  Gzip gives much
AP> better compression, but in comparison it's extremely slow.

AP> Probably not immediately useful, but it seems like interesting stuff:

AP>   http://monetdb.cwi.nl/
AP>   http://homepages.cwi.nl/~mk/MonetDB/
AP>   http://sourceforge.net/projects/monetdb/
AP>   http://homepages.cwi.nl/~boncz/

AP>   "MonetDB/X100 - A DBMS In The CPU Cache"
AP>   by Marcin Zukowski, Peter Boncz, Niels Nes, Sandor Himan
AP>   ftp://ftp.research.microsoft.com/pub/debull/A05june/issue1.htm

AP> Btw, apparently the current stable version of MonetDB is open source
AP> but they haven't decided whether the X100 work will be or not.

AP> Googling just now, there seems to have been a fair amount of research
AP> and commercialization of this sort of stuff lately, e.g.:

AP>   http://db.csail.mit.edu/projects/cstore/




-- 
Best regards,
 Teg                            mailto:[EMAIL PROTECTED]

Reply via email to