[sqlite] RE: [RBL] Re[2]: [sqlite] R: [sqlite] Snapshot database creation performance

2006-02-07 Thread Steve O'Hara
Hi Teg,

Presumably you have a transaction in place around the whole of your
inserts, and have PRAGMA synchronous = OFF; set.
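A minimal sketch of those two settings, using Python's sqlite3 module (an assumption for illustration; the original poster's language and toolchain aren't stated). The point is that one transaction around the whole batch avoids a journal sync per row:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA synchronous = OFF")   # skip fsync on commit
conn.execute("CREATE TABLE snapshot (id INTEGER PRIMARY KEY, payload TEXT)")

rows = ((i, "row %d" % i) for i in range(10000))
with conn:  # one transaction around all of the inserts
    conn.executemany("INSERT INTO snapshot VALUES (?, ?)", rows)

count = conn.execute("SELECT COUNT(*) FROM snapshot").fetchone()[0]
```

Without the enclosing transaction, each INSERT would commit (and sync) individually, which is where bulk-load time usually goes.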

Have you looked at perhaps not creating the database on the server, but
merely creating the INSERT statements in one big file that you compress
and send down to the client, who then decompresses and runs the inserts?
You could even abbreviate the INSERT statements, but I've always found
(possibly because the indices don't compress well) that compressing the
source of a database gets you a much smaller payload than compressing
the finished database.
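A hedged sketch of that ship-the-SQL-not-the-file idea, again using Python's sqlite3 (whose iterdump() emits the database as INSERT statements, much like the sqlite3 shell's .dump command) plus gzip for the transfer:

```python
import gzip
import sqlite3

# Server side: build the data, then serialize it as SQL text and compress that.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, body TEXT)")
src.executemany("INSERT INTO t VALUES (?, ?)",
                [(i, "some mainly-textual payload") for i in range(1000)])
src.commit()

sql_text = "\n".join(src.iterdump())        # the "source" of the database
payload = gzip.compress(sql_text.encode("utf-8"))  # what gets sent down

# Client side: decompress and replay the statements into a fresh database.
dst = sqlite3.connect(":memory:")
dst.executescript(gzip.decompress(payload).decode("utf-8"))
n = dst.execute("SELECT COUNT(*) FROM t").fetchone()[0]
```

The dump contains no index pages or b-tree layout, only repetitive SQL text, which is exactly the kind of input gzip handles well.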

Steve


-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]org] On Behalf Of Teg
Sent: 07 February 2006 15:40
To: Andrew Piskorski
Subject: [RBL] Re[2]: [sqlite] R: [sqlite] Snapshot database creation
performance

Hello Andrew,

My purpose is primarily disk storage savings, the data's mainly text
so it's highly compressible. 500K on disk chunks of data decompress
out to about 8 megabytes of text. What compression scheme do they use?
I might consider trading some disk space for faster
compression/decompression.
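That disk-space-for-speed trade is directly tunable with gzip's underlying algorithm: zlib exposes levels 1..9, where level 1 is markedly faster to compress but produces larger output than level 9. A small sketch (illustrative data, not the poster's):

```python
import zlib

text = ("mostly repetitive log-like text\n" * 5000).encode("utf-8")

fast = zlib.compress(text, 1)   # favour CPU time
small = zlib.compress(text, 9)  # favour disk space

# Both round-trip to identical original data; only size and CPU cost differ.
assert zlib.decompress(fast) == text
assert zlib.decompress(small) == text
assert len(small) <= len(fast)
```

Decompression speed varies much less across levels than compression speed does, so the level mainly matters on the writing side.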

C

Tuesday, February 7, 2006, 10:26:02 AM, you wrote:

AP> On Tue, Feb 07, 2006 at 08:51:43AM -0500, Teg wrote:

>> My application uses compressed data (gzip), but the tradeoff for small
>> data files is exceptionally heavy CPU usage when the data is
>> decompressed/compressed.

AP> Incidentally, the MonetDB folks have done research on that sort of
AP> thing.  In their most recent project, "X100", they keep the data
AP> compressed both on disk AND in main memory, and decompress it only in
AP> the CPU cache when actually manipulating values.

AP> They do that not primarily to save disk space, but to reduce the
AP> amount of memory bandwidth needed.  Apparently in some cases it's a big
AP> speed-up, and shifts the query from being memory I/O bound to CPU
AP> bound.  Of course, in order for that to work they have to use very
AP> lightweight compression/decompression algorithms.  Gzip gives much
AP> better compression, but in comparison it's extremely slow.
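To make "lightweight" concrete: a toy run-length encoder, standing in for the kind of cheap per-value scheme such column stores use (the X100 project's actual codecs are more sophisticated, e.g. dictionary and delta coding; this is only a sketch of the contrast with gzip):

```python
import zlib

def rle_encode(data: bytes) -> list:
    """Collapse each run of identical bytes into a (value, count) pair."""
    runs, i = [], 0
    while i < len(data):
        j = i
        while j < len(data) and data[j] == data[i]:
            j += 1
        runs.append((data[i], j - i))
        i = j
    return runs

def rle_decode(runs: list) -> bytes:
    return b"".join(bytes([value]) * count for value, count in runs)

column = b"AAAAABBBCCCCCCCCDD" * 1000
assert rle_decode(rle_encode(column)) == column

# zlib/gzip compresses harder, but decoding costs far more CPU per byte;
# an RLE-style scheme decodes with a trivial loop, cheap enough to run
# on data sitting in the CPU cache.
packed = zlib.compress(column)
assert zlib.decompress(packed) == column
```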

AP> Probably not immediately useful, but it seems like interesting stuff:

AP>   http://monetdb.cwi.nl/
AP>   http://homepages.cwi.nl/~mk/MonetDB/
AP>   http://sourceforge.net/projects/monetdb/
AP>   http://homepages.cwi.nl/~boncz/

AP>   "MonetDB/X100 - A DBMS In The CPU Cache"
AP>   by Marcin Zukowski, Peter Boncz, Niels Nes, Sándor Héman
AP>   ftp://ftp.research.microsoft.com/pub/debull/A05june/issue1.htm

AP> Btw, apparently the current stable version of MonetDB is open source
AP> but they haven't decided whether the X100 work will be or not.

AP> Googling just now, there seems to have been a fair amount of research
AP> and commercialization of this sort of stuff lately, e.g.:

AP>   http://db.csail.mit.edu/projects/cstore/




-- 
Best regards,
 Teg    mailto:[EMAIL PROTECTED]




