[sqlite] RE: [RBL] Re[2]: [sqlite] R: [sqlite] Snapshot database creation performance
Hi Teg,

Presumably you have a transaction in place around the whole of your inserts, and have PRAGMA synchronous = OFF; set.

Have you looked at perhaps not creating the database on the server, but merely creating the INSERT statements in one big file that you compress and send down to the client, who then decompresses and runs the inserts? You could even abbreviate the insert statements, but I've always found (possibly because the indices don't compress well) that compressing the source of a database gets you a much smaller payload than compressing the finished database.

Steve

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Teg
Sent: 07 February 2006 15:40
To: Andrew Piskorski
Subject: [RBL] Re[2]: [sqlite] R: [sqlite] Snapshot database creation performance
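To make Steve's recipe concrete: a minimal C sketch of the client side, assuming zlib for the decompression and one complete SQL statement per line of the dump (the function and file names here are illustrative, not from the thread):

    #include <stdio.h>
    #include <sqlite3.h>
    #include <zlib.h>

    /* Rebuild a database from a gzip-compressed file of SQL INSERT
     * statements, wrapping everything in one transaction with
     * synchronous = OFF so SQLite does no per-row fsync. */
    int rebuild_from_dump(const char *db_path, const char *gz_sql_path)
    {
        sqlite3 *db;
        if (sqlite3_open(db_path, &db) != SQLITE_OK) return 1;

        sqlite3_exec(db, "PRAGMA synchronous = OFF;", 0, 0, 0);
        sqlite3_exec(db, "BEGIN;", 0, 0, 0);

        gzFile in = gzopen(gz_sql_path, "rb");
        if (!in) { sqlite3_close(db); return 1; }

        char line[8192];  /* assumes one full statement per line */
        while (gzgets(in, line, sizeof line)) {
            char *err = 0;
            if (sqlite3_exec(db, line, 0, 0, &err) != SQLITE_OK) {
                fprintf(stderr, "SQL error: %s\n", err ? err : "?");
                sqlite3_free(err);
            }
        }
        gzclose(in);

        sqlite3_exec(db, "COMMIT;", 0, 0, 0);
        sqlite3_close(db);
        return 0;
    }

The single enclosing transaction is what matters most for speed; each statement outside a transaction would otherwise pay a full journal-and-sync cycle.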
Re[2]: [sqlite] R: [sqlite] Snapshot database creation performance
Hello Andrew,

My purpose is primarily disk storage savings; the data's mainly text, so it's highly compressible. 500 KB on-disk chunks of data decompress out to about 8 megabytes of text. What compression scheme do they use? I might consider trading some disk space for faster compression/decompression.

C

Tuesday, February 7, 2006, 10:26:02 AM, you wrote:

--
Best regards,
Teg    mailto:[EMAIL PROTECTED]
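On the trade Teg mentions: with zlib, the space-for-speed knob is simply the compression level. A throwaway sketch comparing the two extremes (the buffer contents are fake, for illustration only):

    #include <stdio.h>
    #include <stdlib.h>
    #include <zlib.h>

    static unsigned char src[1 << 20];  /* 1 MB of fake "text" */

    int main(void)
    {
        /* Fill with compressible-but-not-trivial pseudo-text. */
        for (size_t i = 0; i < sizeof src; i++)
            src[i] = "abcdefgh \n"[rand() % 10];

        uLongf cap = compressBound(sizeof src);
        unsigned char *out = malloc(cap);

        uLongf len = cap;   /* level 1: fastest, larger output */
        compress2(out, &len, src, sizeof src, Z_BEST_SPEED);
        printf("level 1: %lu bytes\n", (unsigned long)len);

        len = cap;          /* level 9: slowest, smallest output */
        compress2(out, &len, src, sizeof src, Z_BEST_COMPRESSION);
        printf("level 9: %lu bytes\n", (unsigned long)len);

        free(out);
        return 0;
    }

Level 1 is typically several times faster than level 9 at a modest cost in output size, which may be exactly the trade Teg is after.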
Re: [sqlite] R: [sqlite] Snapshot database creation performance
On Tue, Feb 07, 2006 at 08:51:43AM -0500, Teg wrote:
> My application uses compressed data (gzip), but the tradeoff to small
> data files is exceptionally heavy CPU usage when the data is
> decompressed/compressed.

Incidentally, the MonetDB folks have done research on that sort of thing. In their most recent project, "X100", they keep the data compressed both on disk AND in main memory, and decompress it only in the CPU cache when actually manipulating values.

They do that not primarily to save disk space, but to reduce the amount of memory bandwidth needed. Apparently in some cases it's a big speed-up, and shifts the query from being memory I/O bound to CPU bound. Of course, in order for that to work they have to use very lightweight compression/decompression algorithms. Gzip gives much better compression, but in comparison it's extremely slow.

Probably not immediately useful, but it seems like interesting stuff:

http://monetdb.cwi.nl/
http://homepages.cwi.nl/~mk/MonetDB/
http://sourceforge.net/projects/monetdb/
http://homepages.cwi.nl/~boncz/

"MonetDB/X100 - A DBMS In The CPU Cache"
by Marcin Zukowski, Peter Boncz, Niels Nes, Sándor Héman
ftp://ftp.research.microsoft.com/pub/debull/A05june/issue1.htm

Btw, apparently the current stable version of MonetDB is open source, but they haven't decided whether the X100 work will be or not. Googling just now, there seems to have been a fair amount of research and commercialization of this sort of stuff lately, e.g.:

http://db.csail.mit.edu/projects/cstore/

--
Andrew Piskorski <[EMAIL PROTECTED]>
http://www.piskorski.com/
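For flavor, the "very lightweight" schemes in question are things like frame-of-reference coding, where decoding is one addition per value, so it can run at near memory speed, unlike gzip's bit-level Huffman work. A toy C sketch (not MonetDB's actual code, and it assumes each block's value range fits in 8 bits):

    #include <stdint.h>
    #include <stdio.h>

    /* Toy frame-of-reference (FOR) encoding: one 32-bit base plus
     * an 8-bit offset per value, shrinking the column 4x. */
    static void for_encode(const int32_t *in, int n,
                           int32_t *base, uint8_t *out)
    {
        int32_t lo = in[0];
        for (int i = 1; i < n; i++) if (in[i] < lo) lo = in[i];
        *base = lo;
        for (int i = 0; i < n; i++)
            out[i] = (uint8_t)(in[i] - lo);  /* assumes range <= 255 */
    }

    static void for_decode(int32_t base, const uint8_t *in, int n,
                           int32_t *out)
    {
        for (int i = 0; i < n; i++)
            out[i] = base + in[i];           /* one add per value */
    }

    int main(void)
    {
        int32_t vals[8] = {1000, 1003, 1001, 1007, 1002, 1000, 1005, 1004};
        int32_t base, back[8];
        uint8_t packed[8];
        for_encode(vals, 8, &base, packed);
        for_decode(base, packed, 8, back);
        printf("base=%d first=%d last=%d\n", base, back[0], back[7]);
        return 0;
    }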
Re: [sqlite] R: [sqlite] Snapshot database creation performance
Hello Michele,

Perhaps try replacing "os_win.c" (or whichever file matches your OS) with functions that simulate disk I/O through your compressed stream functions. The biggest problem I see is random access. Typically there is no quick seeking within compressed data; you just have to "Read" up to the point you want, and then do this repeatedly every time you seek.

Why can't you use a plain "memory" database?

My application uses compressed data (gzip), but the tradeoff to small data files is exceptionally heavy CPU usage when the data is decompressed/compressed. The data files are sized so that seeking isn't too painful.

C

Tuesday, February 7, 2006, 8:09:47 AM, you wrote:

--
Best regards,
Teg    mailto:[EMAIL PROTECTED]
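zlib itself makes Teg's seeking point nicely: gzseek() on a file opened for reading can only move forward by decompressing and discarding everything before the target offset, so every random read pays for a sequential scan. A small illustrative sketch (the file name is hypothetical):

    #include <stdio.h>
    #include <zlib.h>

    /* Read `len` bytes at uncompressed offset `off` from a gzip file.
     * zlib keeps no index into the compressed stream, so gzseek()
     * decompresses and discards every byte before `off` -- which is
     * why simulating random-access disk I/O over gzip data is so
     * costly for something like a SQLite pager. */
    static int read_at(const char *path, long off, void *buf, unsigned len)
    {
        gzFile f = gzopen(path, "rb");
        if (!f) return -1;
        if (gzseek(f, off, SEEK_SET) < 0) { gzclose(f); return -1; }
        int n = gzread(f, buf, len);
        gzclose(f);
        return n;
    }

    int main(void)
    {
        char page[1024];
        int n = read_at("snapshot.db.gz", 4096, page, sizeof page);
        printf("read %d bytes\n", n);
        return 0;
    }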
[sqlite] R: [sqlite] Snapshot database creation performance
You understood perfectly. The ramdisk is not the goal, just a means of obtaining the best possible performance. I don't need the ramdisk at all, just all-in-memory processing.

The solution to this problem should help many users, because I think many of us use a SQLite database to store data in offline mode.

Bye, Michele

-----Original Message-----
From: Andrew Piskorski [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, 7 February 2006 13:54
To: sqlite-users@sqlite.org
Subject: Re: [sqlite] Snapshot database creation performance

On Tue, Feb 07, 2006 at 07:43:44AM -0500, Andrew Piskorski wrote:
> On Tue, Feb 07, 2006 at 12:52:08PM +0100, Clinco, Michele wrote:
> > My program is written in .Net and the compression routines I'm using are
> > stream based, so I need to create a memory stream from the internal
> > buffers that can be used by the compression routine...
>
> Do you mean you wish to directly access SQLite's in-memory data
> structures, rather than using a SQL query to get the data? Why?

Oh, never mind, I wrote too soon. I realize now that you want an in-memory representation of a SQLite database file, which you then gzip or whatever, all still in memory, and then transfer over the network to whomever wants that file. And you want to know if there's some way to do that which offers more programmatic control than the ramdisk filesystem you're using now.

Essentially, you want a way to dynamically control the size of the RAM disk. But it sounds like you don't necessarily need ALL the facilities of a normal file system, so you're wondering if perhaps you could implement something more purpose-specific yourself rather than using that ramdisk driver.

Interesting, but I don't know the answer.

--
Andrew Piskorski <[EMAIL PROTECTED]>
http://www.piskorski.com/
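As a footnote for later readers: SQLite eventually grew an API that does precisely what Michele is asking for. From version 3.23 on, sqlite3_serialize() hands back the database as one contiguous in-memory image, which can be compressed and shipped with no ramdisk in sight. A minimal sketch, assuming zlib (the function name is ours, not SQLite's):

    #include <stdlib.h>
    #include <sqlite3.h>
    #include <zlib.h>

    /* Serialize an open (possibly in-memory) database to a byte image,
     * then compress that image with zlib, entirely in RAM -- no
     * ramdisk, no temporary file. Requires SQLite >= 3.23. */
    int snapshot_compressed(sqlite3 *db, unsigned char **out, uLongf *out_len)
    {
        sqlite3_int64 raw_len = 0;
        unsigned char *raw = sqlite3_serialize(db, "main", &raw_len, 0);
        if (!raw) return 1;

        *out_len = compressBound((uLong)raw_len);
        *out = malloc(*out_len);
        /* Z_BEST_SPEED, per the thread's concern about CPU cost. */
        int rc = compress2(*out, out_len, raw, (uLong)raw_len, Z_BEST_SPEED);

        sqlite3_free(raw);  /* serialize() output is sqlite3_malloc'ed */
        return rc == Z_OK ? 0 : 1;
    }

At the time of this thread no such call existed, which is exactly why the discussion turns to ramdisks and replacing the OS layer.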