[sqlite] RE: [RBL] Re[2]: [sqlite] R: [sqlite] Snapshot database creation performance

2006-02-07 Thread Steve O'Hara
Hi Teg,

Presumably you have a transaction in place around the whole of your
inserts, and you have PRAGMA synchronous = OFF; set.
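In terms of the C API, that setup looks roughly like the sketch below
(the table and column names are made up):

    /* Minimal sketch, not a drop-in: disable synchronous writes and
     * wrap all inserts in a single transaction. */
    #include <sqlite3.h>

    int bulk_load(sqlite3 *db)
    {
        sqlite3_stmt *stmt;
        int i;

        /* No fsync on commit: fast, but unsafe on power loss. */
        sqlite3_exec(db, "PRAGMA synchronous = OFF;", 0, 0, 0);

        /* One transaction, so journal and b-tree pages are written
         * once instead of once per row. */
        sqlite3_exec(db, "BEGIN;", 0, 0, 0);

        if (sqlite3_prepare_v2(db, "INSERT INTO t(x) VALUES(?)",
                               -1, &stmt, 0) != SQLITE_OK)
            return SQLITE_ERROR;
        for (i = 0; i < 100000; i++) {
            sqlite3_bind_int(stmt, 1, i);
            sqlite3_step(stmt);
            sqlite3_reset(stmt);
        }
        sqlite3_finalize(stmt);
        return sqlite3_exec(db, "COMMIT;", 0, 0, 0);
    }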

Have you looked at perhaps not creating the database on the server, but
merely writing the INSERT statements to one big file that you compress
and send down to the client, who then decompresses it and runs the
inserts? You could even abbreviate the INSERT statements, but I've
always found that compressing the SQL source of a database gives a much
smaller payload than compressing the finished database file (possibly
because the indices don't compress well).
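A sketch of that idea using zlib's gzFile interface; the row source
and the schema here are hypothetical:

    /* Sketch only: writes a gzip-compressed SQL script that the
     * client can decompress and feed straight to sqlite3_exec(). */
    #include <zlib.h>

    /* Hypothetical row source supplied by the application. */
    extern int get_next_row(int *id, const char **text);

    int dump_inserts(const char *path)
    {
        int id;
        const char *text;
        gzFile out = gzopen(path, "wb9");  /* level 9 = smallest */
        if (!out) return -1;

        gzputs(out, "PRAGMA synchronous = OFF;\nBEGIN;\n");
        while (get_next_row(&id, &text)) {
            /* Real code must escape any quotes inside 'text'. */
            gzprintf(out, "INSERT INTO t VALUES(%d,'%s');\n",
                     id, text);
        }
        gzputs(out, "COMMIT;\n");
        return gzclose(out);
    }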

Steve



Re[2]: [sqlite] R: [sqlite] Snapshot database creation performance

2006-02-07 Thread Teg
Hello Andrew,

My purpose is primarily disk storage savings; the data is mainly text,
so it's highly compressible. 500K on-disk chunks of data decompress out
to about 8 megabytes of text. What compression scheme do they use?
I might consider trading some disk space for faster
compression/decompression.
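zlib itself already exposes that trade-off as a level parameter; a
minimal sketch, with caller-supplied buffers:

    /* zlib's level knob: 1 = Z_BEST_SPEED, 9 = Z_BEST_COMPRESSION.
     * Level 1 is typically several times faster, at some size cost. */
    #include <zlib.h>

    uLong shrink(const Bytef *src, uLong srcLen,
                 Bytef *dst, uLong dstCap, int level)
    {
        uLongf dstLen = dstCap;  /* in: capacity, out: actual size */
        if (compress2(dst, &dstLen, src, srcLen, level) != Z_OK)
            return 0;
        return dstLen;
    }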

C

-- 
Best regards,
 Teg  mailto:[EMAIL PROTECTED]



Re: [sqlite] R: [sqlite] Snapshot database creation performance

2006-02-07 Thread Andrew Piskorski
On Tue, Feb 07, 2006 at 08:51:43AM -0500, Teg wrote:

> My application uses compressed data (gzip), but the tradeoff for
> small data files is exceptionally heavy CPU usage when the data is
> decompressed/compressed.

Incidentally, the MonetDB folks have done research on that sort of
thing.  In their most recent project, "X100", they keep the data
compressed both on disk AND in main memory, and decompress it only in
the CPU cache when actually manipulating values.

They do that not primarily to save disk space, but to reduce the
amount of memory bandwidth needed.  Apparently in some cases it's a big
speed-up, and shifts the query from being memory I/O bound to CPU
bound.  Of course, for that to work they have to use very lightweight
compression/decompression algorithms.  Gzip gives much better
compression, but in comparison it's extremely slow.
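To be clear, the sketch below is not MonetDB's code, just an
illustration of what "lightweight" means here: values stay
dictionary-encoded in RAM, and a small block is decoded into a
cache-sized buffer only at the moment it is processed.

    #include <stdint.h>

    #define BLOCK 1024            /* small enough to stay in cache */

    long sum_block(const uint8_t *codes, const int32_t *dict, int n)
    {
        int32_t buf[BLOCK];       /* decode target, cache resident */
        long total = 0;
        int i;
        for (i = 0; i < n && i < BLOCK; i++)
            buf[i] = dict[codes[i]];   /* one lookup per value */
        for (i = 0; i < n && i < BLOCK; i++)
            total += buf[i];      /* operate on decoded values */
        return total;
    }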

Probably not immediately useful, but it seems like interesting stuff:

  http://monetdb.cwi.nl/
  http://homepages.cwi.nl/~mk/MonetDB/
  http://sourceforge.net/projects/monetdb/
  http://homepages.cwi.nl/~boncz/

  "MonetDB/X100 - A DBMS In The CPU Cache"
  by Marcin Zukowski, Peter Boncz, Niels Nes, Sándor Héman
  ftp://ftp.research.microsoft.com/pub/debull/A05june/issue1.htm

Btw, apparently the current stable version of MonetDB is open source
but they haven't decided whether the X100 work will be or not.

Googling just now, there seems to have been a fair amount of research
and commercialization of this sort of stuff lately, e.g.:

  http://db.csail.mit.edu/projects/cstore/

-- 
Andrew Piskorski <[EMAIL PROTECTED]>
http://www.piskorski.com/


Re: [sqlite] R: [sqlite] Snapshot database creation performance

2006-02-07 Thread Teg
Hello Michele,

Perhaps replace "os_win.c" (or the equivalent for your OS) with
functions that simulate disk I/O through your compressed-stream
functions. The biggest problem I see is random access: typically there
is no quick seeking within compressed data, so you have to read up to
the point you want, and repeat that every time you seek. Why can't you
use a plain "memory" database?

My application uses compressed data (gzip), but the tradeoff for small
data files is exceptionally heavy CPU usage when the data is
decompressed/compressed. The data files are sized so that seeking isn't
too painful.
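To illustrate the seek problem with zlib: gzseek() on a read stream
has to decompress its way forward to the target offset (and restart
from the beginning for backward seeks), so a "random" read costs time
proportional to the offset. A hypothetical helper:

    #include <stdio.h>   /* SEEK_SET */
    #include <zlib.h>

    /* O(offset) work in the worst case, not O(1) as with a real
     * file on disk. */
    int read_at(gzFile f, z_off_t offset, void *buf, unsigned len)
    {
        if (gzseek(f, offset, SEEK_SET) == -1)
            return -1;
        return gzread(f, buf, len);
    }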

C





-- 
Best regards,
 Teg  mailto:[EMAIL PROTECTED]



[sqlite] R: [sqlite] Snapshot database creation performance

2006-02-07 Thread Clinco, Michele
You understood perfectly.
The ramdisk is not the goal, just a means to the goal of the best
possible performance.

I don't need the ramdisk at all, just all-in-memory processing.
A solution to this problem should help many users, because I think many
of us use a SQLite database to store data in offline mode.

Bye, Michele
-Original Message-
From: Andrew Piskorski [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, 7 February 2006 13:54
To: sqlite-users@sqlite.org
Subject: Re: [sqlite] Snapshot database creation performance

On Tue, Feb 07, 2006 at 07:43:44AM -0500, Andrew Piskorski wrote:
> On Tue, Feb 07, 2006 at 12:52:08PM +0100, Clinco, Michele wrote:

> > My program is written in .Net and the compression routines I'm using are
> > stream based, so I need to create a memory stream from the internal
> > buffers that can be used by the compression routine...
> 
> Do you mean you wish to directly access SQLite's in-memory data
> structures, rather than using a SQL query to get the data?  Why?

Oh, never mind, I wrote too soon.  I realize now that you want an
in-memory representation of a SQLite database file, which you then
gzip or whatever, all still in memory, and then transfer over the
network to whoever wants that file.  And you want to know if there's
some way to do that which offers more programmatic control than the
ramdisk filesystem you're using now.

Essentially, you want a way to dynamically control the size of the RAM
disk.  But it sounds like you don't necessarily need ALL the
facilities of a normal file-system, so you're wondering if perhaps you
could implement something more purpose-specific yourself rather than
using that ramdisk driver.

Interesting, but I don't know the answer.
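(Years later, SQLite grew an API that answers this directly:
sqlite3_serialize(), added in SQLite 3.23.0, returns the whole
database as a single malloc'd buffer, which can then be compressed in
memory with zlib. A sketch:)

    #include <sqlite3.h>
    #include <zlib.h>
    #include <stdlib.h>

    /* Serialize the open database and deflate it, all in memory.
     * Caller frees the returned buffer; *outLen gets its size. */
    unsigned char *snapshot_compressed(sqlite3 *db, uLongf *outLen)
    {
        sqlite3_int64 rawLen = 0;
        unsigned char *raw = sqlite3_serialize(db, "main", &rawLen, 0);
        unsigned char *out;
        if (!raw) return 0;

        *outLen = compressBound((uLong)rawLen);
        out = malloc(*outLen);
        if (out && compress(out, outLen, raw, (uLong)rawLen) != Z_OK) {
            free(out);
            out = 0;
        }
        sqlite3_free(raw);
        return out;   /* send *outLen bytes over the network */
    }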

-- 
Andrew Piskorski <[EMAIL PROTECTED]>
http://www.piskorski.com/