Hello Andrew, my purpose is primarily disk-storage savings; the data is mainly text, so it's highly compressible. 500K on-disk chunks of data decompress to about 8 megabytes of text. What compression scheme do they use? I might consider trading some disk space for faster compression/decompression.
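On that last point: gzip/zlib already exposes exactly that tradeoff through its compression level (1 = fastest, 9 = smallest output). A minimal sketch, using Python's zlib purely as an illustration (not your actual toolchain), with placeholder data standing in for the real text chunks:

```python
import zlib

# Stand-in for the highly compressible text chunks described above.
data = b"some mostly repetitive text record\n" * 20000

for level in (1, 6, 9):  # 1 = fastest, 6 = zlib default, 9 = best ratio
    compressed = zlib.compress(data, level)
    # Round-trip to confirm the data survives.
    assert zlib.decompress(compressed) == data
    print("level", level, "->", len(compressed), "bytes")
```

Timing each level against your real chunks would show whether dropping from 9 to 1 buys enough CPU time to matter before switching algorithms entirely.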
Tuesday, February 7, 2006, 10:26:02 AM, you wrote:

AP> On Tue, Feb 07, 2006 at 08:51:43AM -0500, Teg wrote:
>> My application uses compressed data (gzip) but the tradeoff for small
>> data files is exceptionally heavy CPU usage when the data is
>> decompressed/compressed.

AP> Incidentally, the MonetDB folks have done research on that sort of
AP> thing. In their most recent project, "X100", they keep the data
AP> compressed both on disk AND in main memory, and decompress it only in
AP> the CPU cache when actually manipulating values.

AP> They do that not primarily to save disk space, but to reduce the
AP> amount of memory bandwidth needed. Apparently in some cases it's a big
AP> speed-up, and shifts the query from being memory-I/O bound to CPU
AP> bound. Of course, in order for that to work they have to use very
AP> lightweight compression/decompression algorithms. Gzip gives much
AP> better compression, but in comparison it's extremely slow.

AP> Probably not immediately useful, but it seems like interesting stuff:

AP> http://monetdb.cwi.nl/
AP> http://homepages.cwi.nl/~mk/MonetDB/
AP> http://sourceforge.net/projects/monetdb/
AP> http://homepages.cwi.nl/~boncz/

AP> "MonetDB/X100 - A DBMS In The CPU Cache"
AP> by Marcin Zukowski, Peter Boncz, Niels Nes, Sandor Himan
AP> ftp://ftp.research.microsoft.com/pub/debull/A05june/issue1.htm

AP> Btw, apparently the current stable version of MonetDB is open source
AP> but they haven't decided whether the X100 work will be or not.

AP> Googling just now, there seems to have been a fair amount of research
AP> and commercialization of this sort of stuff lately, e.g.:

AP> http://db.csail.mit.edu/projects/cstore/

--
Best regards,
 Teg                            mailto:[EMAIL PROTECTED]
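To make "very lightweight compression" concrete: schemes like run-length encoding decode in a handful of instructions per value, which is what lets systems like X100 keep data compressed in memory. The sketch below is a generic RLE illustration in Python, not MonetDB's actual code or its actual encoding:

```python
def rle_encode(values):
    """Collapse a sequence into (value, run_length) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1  # extend the current run
        else:
            runs.append([v, 1])  # start a new run
    return runs

def rle_decode(runs):
    """Expand (value, run_length) pairs back into the full sequence."""
    out = []
    for v, n in runs:
        out.extend([v] * n)
    return out

# A column with long runs (typical of sorted columnar data) shrinks
# from 1750 entries to 3 pairs, and decoding is a trivial loop.
col = ["NY"] * 1000 + ["LA"] * 500 + ["SF"] * 250
runs = rle_encode(col)
assert rle_decode(runs) == col
```

Compared with gzip, there is no Huffman table or window matching at all, which is the kind of decode cost that fits inside a CPU-cache-resident pipeline.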