[sqlite] Best way to temporarily store data before processing
Thanks for all the responses. In the end I went with the CREATE TEMP TABLE ... option, which has worked very well and seems to have negligible overhead (on the order of 0.006 s to create ~40 temporary tables, and similarly quick to drop them).

-----Original Message-----
From: sqlite-users-bounces at mailinglists.sqlite.org [mailto:sqlite-users-boun...@mailinglists.sqlite.org] On Behalf Of Petite Abeille
Sent: Wednesday, April 15, 2015 6:24 PM
To: General Discussion of SQLite Database
Subject: Re: [sqlite] Best way to temporarily store data before processing

> On Apr 14, 2015, at 11:40 AM, Jonathan Moules wrote:
>
> Options that have come to mind (probably missed a lot):

I personally use temp tables, e.g. 'create temporary table if not exists foo', coupled with 'pragma temp_store = memory', and drop/create them as necessary, e.g. 'drop table if exists'. There is a set of ETL (Extract, Transform and Load) operations to massage the data just so to its final resting place. I will not qualify this as "best", but it's very workable and without much of a headache. As always, your mileage may vary.

_______________________________________________
sqlite-users mailing list
sqlite-users at mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

HR Wallingford and its subsidiaries use faxes and emails for confidential and legally privileged business communications. They do not of themselves create legal commitments. Disclosure to parties other than addressees requires our specific consent. We are not liable for unauthorised disclosures nor reliance upon them. If you have received this message in error please advise us immediately and destroy all copies of it.
HR Wallingford Limited Howbery Park, Wallingford, Oxfordshire, OX10 8BA, United Kingdom Registered in England No. 02562099
[sqlite] Best way to temporarily store data before processing
> On Apr 14, 2015, at 11:40 AM, Jonathan Moules wrote:
>
> Options that have come to mind (probably missed a lot):

I personally use temp tables, e.g. 'create temporary table if not exists foo', coupled with 'pragma temp_store = memory', and drop/create them as necessary, e.g. 'drop table if exists'. There is a set of ETL (Extract, Transform and Load) operations to massage the data just so to its final resting place. I will not qualify this as "best", but it's very workable and without much of a headache. As always, your mileage may vary.
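The pattern described above can be sketched with Python's sqlite3 module; the table name `foo` and its single column are illustrative, not something from the thread:

```python
import sqlite3

# Any connection will do; TEMP tables live in the connection's temp schema.
conn = sqlite3.connect(":memory:")

# Ask SQLite to keep temporary tables in RAM where possible.
conn.execute("PRAGMA temp_store = memory")

# Create the scratch table only if it is not already there.
conn.execute("CREATE TEMPORARY TABLE IF NOT EXISTS foo (line TEXT)")
conn.execute("INSERT INTO foo VALUES ('raw log line')")

# ... ETL work against foo would go here ...

# Throw the scratch data away once this batch has been processed.
conn.execute("DROP TABLE IF EXISTS foo")
```

Because both statements are guarded with IF NOT EXISTS / IF EXISTS, the same script can be rerun per log file without caring what state the previous run left behind.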
[sqlite] Best way to temporarily store data before processing
On Tue, Apr 14, 2015 at 1:10 PM, David Cotter wrote:
> does this awesome sounding temp table then merge instantly when you commit it
> even if it's huge?

Since I have no idea what "merge" means in this context, it's hard to answer your question.

-scott
[sqlite] Best way to temporarily store data before processing
does this awesome sounding temp table then merge instantly when you commit it even if it's huge?

> On Apr 14, 2015, at 1:01 PM, Scott Hess wrote:
>
> On Tue, Apr 14, 2015 at 9:37 AM, Jim Callahan wrote:
>> My recollection is that SQLite has a "temp" or "tmp" namespace available
>> for intermediate tables -- it was on my todo list, but I never got around
>> to exploring that option.
>
> CREATE TEMP TABLE ... works very well for things like this. If you
> use a distinct table name you can just refer to it directly, or you
> can say temp.tablename (vs main.tablename).
>
> You can also create temp databases, where SQLite arranges for the
> underlying file to go away on close (POSIX deletes after opening,
> Windows uses DELETE-ON-CLOSE flags). I believe you pass NULL as the
> filename to sqlite3_open(). I haven't tried out whether you can also
> do "ATTACH NULL AS mydb", but I'd bet you can.
>
> The main advantage of temporary tables over a :memory: database is
> that you don't have to worry about blowing out memory, because they can
> spill to disk easily. There's no journal (because temporary), so it
> should be faster than a staging table in your database. SQLite also
> optimizes things: if your temp tables can be handled entirely in the
> cache, they'll never hit disk (and even if they hit an OS write, I
> don't think they're ever synced, so the OS might never write things to
> disk).
>
> The main disadvantage is that you can't checkpoint things. If you
> crash or shut down, when you start back up the data is gone and you
> have to start over.
>
> -scott
[sqlite] Best way to temporarily store data before processing
Yes!!!

________________________________________
From: sqlite-users-bounces at mailinglists.sqlite.org [sqlite-users-bounces at mailinglists.sqlite.org] on behalf of Jonathan Moules [j.mou...@hrwallingford.com]
Sent: 14 April 2015 03:42 PM
To: 'Joseph T.'; 'General Discussion of SQLite Database'
Subject: Re: [sqlite] Best way to temporarily store data before processing

Thanks for the thoughts, useful to know.

In relation to your question Joseph, historically the scripts were written to only put the raw data into the database and then the Views read it, but that was too slow. So now I've got an extra step to turn the raw data into easily Viewable data, with the bonus that it uses considerably less disk space. The alternative of hard-coding the analysis and getting Python to write only the Processed Tables seemed like an ugly design, so I went with this.

Cheers

-----Original Message-----
From: sqlite-users-bounces at mailinglists.sqlite.org [mailto:sqlite-users-boun...@mailinglists.sqlite.org] On Behalf Of Joseph T.
Sent: Tuesday, April 14, 2015 12:17 PM
To: General Discussion of SQLite Database
Subject: Re: [sqlite] Best way to temporarily store data before processing

Mr. Moules, why not skip the raw tables entirely? Or failing that, a separate in-memory db would probably be the best and quickest option.

Sent from my Samsung Epic 4G Touch

Jonathan Moules wrote:

Hi List,
I'm wondering if anyone can offer me a "best practice" way of doing this.

I'm doing some log analysis using Python/SQLite. Python parses a log file and splits the raw data from each line in the log into one of about 40 tables in an SQLite database (I'll call them Raw Tables).

Once a log file has been processed like this, I run some SQL which takes the data from the Raw Tables and aggregates/processes it into about 10 different "Prepared Tables" which are read with a bunch of Views. The aggregation/processing doesn't take long, and the SQL for it is simple.

I'd like to update the Prepared Tables after each log file is read because there are thousands of files and I don't want to have to rely on having GB of disk space sitting around for temporary Raw Tables.

Once the Prepared Tables have been created, there's no real need to keep the data in the Raw Tables.

The Prepared Tables don't have to be in the same database as the Raw Tables. I'm happy to use ATTACH.

So my question:
What's the best way to do this with the minimum overhead?

Options that have come to mind (probably missed a lot):
- Some sort of empty template database for the Raw Tables which is copied/cloned/overwritten for each file processed.
- And/Or use "DELETE FROM Raw_Tables" to truncate it after each file (there are no indexes).
- And/Or place it into :memory:.
- And/Or just CREATE the Raw Tables for each file?
- And/Or do it within the Prepared Tables database and use "DELETE FROM Raw_Tables". (That file you wouldn't want in :memory: of course.)

Thoughts welcome, thanks for your time,
Jonathan
[sqlite] Best way to temporarily store data before processing
Thanks for the thoughts, useful to know.

In relation to your question Joseph, historically the scripts were written to only put the raw data into the database and then the Views read it, but that was too slow. So now I've got an extra step to turn the raw data into easily Viewable data, with the bonus that it uses considerably less disk space. The alternative of hard-coding the analysis and getting Python to write only the Processed Tables seemed like an ugly design, so I went with this.

Cheers

-----Original Message-----
From: sqlite-users-bounces at mailinglists.sqlite.org [mailto:sqlite-users-boun...@mailinglists.sqlite.org] On Behalf Of Joseph T.
Sent: Tuesday, April 14, 2015 12:17 PM
To: General Discussion of SQLite Database
Subject: Re: [sqlite] Best way to temporarily store data before processing

Mr. Moules, why not skip the raw tables entirely? Or failing that, a separate in-memory db would probably be the best and quickest option.

Sent from my Samsung Epic 4G Touch

Jonathan Moules wrote:

Hi List,
I'm wondering if anyone can offer me a "best practice" way of doing this.

I'm doing some log analysis using Python/SQLite. Python parses a log file and splits the raw data from each line in the log into one of about 40 tables in an SQLite database (I'll call them Raw Tables).

Once a log file has been processed like this, I run some SQL which takes the data from the Raw Tables and aggregates/processes it into about 10 different "Prepared Tables" which are read with a bunch of Views. The aggregation/processing doesn't take long, and the SQL for it is simple.

I'd like to update the Prepared Tables after each log file is read because there are thousands of files and I don't want to have to rely on having GB of disk space sitting around for temporary Raw Tables.

Once the Prepared Tables have been created, there's no real need to keep the data in the Raw Tables.

The Prepared Tables don't have to be in the same database as the Raw Tables. I'm happy to use ATTACH.

So my question:
What's the best way to do this with the minimum overhead?

Options that have come to mind (probably missed a lot):
- Some sort of empty template database for the Raw Tables which is copied/cloned/overwritten for each file processed.
- And/Or use "DELETE FROM Raw_Tables" to truncate it after each file (there are no indexes).
- And/Or place it into :memory:.
- And/Or just CREATE the Raw Tables for each file?
- And/Or do it within the Prepared Tables database and use "DELETE FROM Raw_Tables". (That file you wouldn't want in :memory: of course.)

Thoughts welcome, thanks for your time,
Jonathan
[sqlite] Best way to temporarily store data before processing
On Tue, Apr 14, 2015 at 9:37 AM, Jim Callahan wrote:
> My recollection is that SQLite has a "temp" or "tmp" namespace available
> for intermediate tables -- it was on my todo list, but I never got around
> to exploring that option.

CREATE TEMP TABLE ... works very well for things like this. If you use a distinct table name you can just refer to it directly, or you can say temp.tablename (vs main.tablename).

You can also create temp databases, where SQLite arranges for the underlying file to go away on close (POSIX deletes after opening, Windows uses DELETE-ON-CLOSE flags). I believe you pass NULL as the filename to sqlite3_open(). I haven't tried out whether you can also do "ATTACH NULL AS mydb", but I'd bet you can.

The main advantage of temporary tables over a :memory: database is that you don't have to worry about blowing out memory, because they can spill to disk easily. There's no journal (because temporary), so it should be faster than a staging table in your database. SQLite also optimizes things: if your temp tables can be handled entirely in the cache, they'll never hit disk (and even if they hit an OS write, I don't think they're ever synced, so the OS might never write things to disk).

The main disadvantage is that you can't checkpoint things. If you crash or shut down, when you start back up the data is gone and you have to start over.

-scott
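Scott's points can be tried out from Python's sqlite3 module. Passing an empty string as the filename (which behaves like the NULL filename he mentions at the C level) opens a private temporary on-disk database that SQLite deletes when the connection closes; the table name below is made up:

```python
import sqlite3

# An empty filename opens a private temporary on-disk database;
# SQLite removes the backing file automatically when the connection closes.
conn = sqlite3.connect("")

# A TEMP table lives in the "temp" schema rather than "main",
# so it can be referred to by its bare name or as temp.<name>.
conn.execute("CREATE TEMP TABLE staging (v INTEGER)")
conn.execute("INSERT INTO staging VALUES (1)")

# Bare and schema-qualified names reach the same table.
assert conn.execute("SELECT v FROM staging").fetchone() == \
       conn.execute("SELECT v FROM temp.staging").fetchone()
```

Since the temp database is file-backed rather than purely in-memory, large staging tables can spill to disk instead of exhausting RAM, which is the advantage Scott describes.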
[sqlite] Best way to temporarily store data before processing
There are too many unknowns to give a definitive answer to your question. Any solution you choose is going to create some stress somewhere (in your "Prepared Tables" database, RAM, the file system and so forth). Just from my own limited experience:

It is very easy while hacking to CREATE TABLE and DELETE FROM in your main database; the price one pays is that running VACUUM to clean up the mess becomes time consuming.

RAM would be fastest, but also the riskiest; any memory leak at all in your programs handling "gigabytes" of data and you run the risk of having to reboot the system. RAM files would also be at risk from power interruptions. If you were sure your data would "only" be 100 megabytes then you would have a substantial safety margin in RAM. Using RAM from the outset might fall in the category of "premature optimization". Usually one wants something that is robust and has the option of leaving an audit trail -- if that solution is too slow, then look at putting parts of it in RAM. One typically needs the option of audit trails because there is always screwy data out there -- data that sooner or later shows up and breaks one's system.

A separate database using ATTACH makes sense, but the question would be whether to have a template database or to create the files. This is somewhat a matter of personal preference. I found it was not that hard to .dump the template databases and create an "all code" solution. From my perspective, I found an "all code" solution to be easier to maintain because I did not have to rely on what may or may not be in (a version of) a template database (and if necessary, I could change the template databases in the same project code I was working on), but your circumstances and preferences may be different. This also depends on the expressiveness of your scripting language. I was using Windows batch scripts, which did not loop very well, so I had to use some ingenuity to do repetitive batch operations. Python is a more expressive language.

My recollection is that SQLite has a "temp" or "tmp" namespace available for intermediate tables -- it was on my todo list, but I never got around to exploring that option.

Jim Callahan
Orlando, FL

On Tue, Apr 14, 2015 at 5:40 AM, Jonathan Moules wrote:
> Hi List,
> I'm wondering if anyone can offer me a "best practice" way of doing this.
>
> I'm doing some log analysis using Python/SQLite. Python parses a log file
> and splits the raw data from each line in the log into one of about 40
> tables in an SQLite database (I'll call them Raw Tables).
>
> Once a log file has been processed like this, I run some SQL which takes
> the data from the Raw Tables and aggregates/processes it into about 10
> different "Prepared Tables" which are read with a bunch of Views. The
> aggregation/processing doesn't take long, and the SQL for it is simple.
>
> I'd like to update the Prepared Tables after each log file is read because
> there are thousands of files and I don't want to have to rely on having GB
> of disk space sitting around for temporary Raw Tables.
>
> Once the Prepared Tables have been created, there's no real need to keep
> the data in the Raw Tables.
>
> The Prepared Tables don't have to be in the same database as the Raw
> Tables. I'm happy to use ATTACH.
>
> So my question:
> What's the best way to do this with the minimum overhead?
>
> Options that have come to mind (probably missed a lot):
> - Some sort of empty template database for the Raw Tables
> which is copied/cloned/overwritten for each file processed.
> - And/Or use "DELETE FROM Raw_Tables" to truncate it after
> each file (there are no indexes).
> - And/Or place it into :memory:.
> - And/Or just CREATE the Raw Tables for each file?
> - And/Or do it within the Prepared Tables database and use
> "DELETE FROM Raw_Tables". (That file you wouldn't want in :memory: of
> course.)
>
> Thoughts welcome, thanks for your time,
> Jonathan
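The separate ATTACHed database Jim mentions might look roughly like this for Jonathan's Raw/Prepared split; the file, table, and column names are invented for the sketch:

```python
import os
import sqlite3
import tempfile

# Hypothetical on-disk home for the Prepared Tables.
prepared_path = os.path.join(tempfile.mkdtemp(), "prepared.db")

# Raw tables go into an in-memory database; aggregates land in prepared.db.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ? AS prepared", (prepared_path,))

conn.execute("CREATE TABLE raw_requests (status INTEGER)")
conn.execute("CREATE TABLE IF NOT EXISTS prepared.status_counts (status INTEGER, n INTEGER)")

# One log file's worth of raw rows.
conn.executemany("INSERT INTO raw_requests VALUES (?)", [(200,), (200,), (404,)])

# Aggregate the raw rows into the attached database, then clear the raw table.
conn.execute("""INSERT INTO prepared.status_counts
                SELECT status, COUNT(*) FROM raw_requests GROUP BY status""")
conn.execute("DELETE FROM raw_requests")
conn.commit()
```

The ATTACH statement accepts a bound parameter for the filename, so the same code can be pointed at a template copy or a fresh file per run.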
[sqlite] Best way to temporarily store data before processing
For native SQLite tables, DROP TABLE is much faster than DELETE FROM.

-----Original Message-----
From: Jonathan Moules [mailto:J.Moules at hrwallingford.com]
Sent: Tuesday, 14 April 2015 11:40
To: 'sqlite-users at mailinglists.sqlite.org'
Subject: [sqlite] Best way to temporarily store data before processing

Hi List,
I'm wondering if anyone can offer me a "best practice" way of doing this.

I'm doing some log analysis using Python/SQLite. Python parses a log file and splits the raw data from each line in the log into one of about 40 tables in an SQLite database (I'll call them Raw Tables).

Once a log file has been processed like this, I run some SQL which takes the data from the Raw Tables and aggregates/processes it into about 10 different "Prepared Tables" which are read with a bunch of Views. The aggregation/processing doesn't take long, and the SQL for it is simple.

I'd like to update the Prepared Tables after each log file is read because there are thousands of files and I don't want to have to rely on having GB of disk space sitting around for temporary Raw Tables.

Once the Prepared Tables have been created, there's no real need to keep the data in the Raw Tables.

The Prepared Tables don't have to be in the same database as the Raw Tables. I'm happy to use ATTACH.

So my question:
What's the best way to do this with the minimum overhead?

Options that have come to mind (probably missed a lot):
- Some sort of empty template database for the Raw Tables which is copied/cloned/overwritten for each file processed.
- And/Or use "DELETE FROM Raw_Tables" to truncate it after each file (there are no indexes).
- And/Or place it into :memory:.
- And/Or just CREATE the Raw Tables for each file?
- And/Or do it within the Prepared Tables database and use "DELETE FROM Raw_Tables". (That file you wouldn't want in :memory: of course.)

Thoughts welcome, thanks for your time,
Jonathan

___________________________________________
Gunter Hick
Software Engineer
Scientific Games International GmbH
FN 157284 a, HG Wien
Klitschgasse 2-4, A-1130 Vienna, Austria
Tel: +43 1 80100 0
E-Mail: hick at scigames.at
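The drop-and-recreate pattern Gunter suggests is easy to script when the Raw Table schemas are kept as strings (the schema below is a stand-in). Note that SQLite also has a truncate optimization that can make an unqualified DELETE FROM fast, so measuring both on real data is worthwhile:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Keeping the schema in one place makes drop/recreate trivial.
RAW_SCHEMA = "CREATE TABLE raw_lines (line TEXT)"

conn.execute(RAW_SCHEMA)
conn.executemany("INSERT INTO raw_lines VALUES (?)", [("a",), ("b",)])

# Clear the table between log files by dropping and recreating it
# instead of issuing DELETE FROM raw_lines.
conn.execute("DROP TABLE raw_lines")
conn.execute(RAW_SCHEMA)
```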
[sqlite] Best way to temporarily store data before processing
Hi List,
I'm wondering if anyone can offer me a "best practice" way of doing this.

I'm doing some log analysis using Python/SQLite. Python parses a log file and splits the raw data from each line in the log into one of about 40 tables in an SQLite database (I'll call them Raw Tables).

Once a log file has been processed like this, I run some SQL which takes the data from the Raw Tables and aggregates/processes it into about 10 different "Prepared Tables" which are read with a bunch of Views. The aggregation/processing doesn't take long, and the SQL for it is simple.

I'd like to update the Prepared Tables after each log file is read because there are thousands of files and I don't want to have to rely on having GB of disk space sitting around for temporary Raw Tables.

Once the Prepared Tables have been created, there's no real need to keep the data in the Raw Tables.

The Prepared Tables don't have to be in the same database as the Raw Tables. I'm happy to use ATTACH.

So my question:
What's the best way to do this with the minimum overhead?

Options that have come to mind (probably missed a lot):
- Some sort of empty template database for the Raw Tables which is copied/cloned/overwritten for each file processed.
- And/Or use "DELETE FROM Raw_Tables" to truncate it after each file (there are no indexes).
- And/Or place it into :memory:.
- And/Or just CREATE the Raw Tables for each file?
- And/Or do it within the Prepared Tables database and use "DELETE FROM Raw_Tables". (That file you wouldn't want in :memory: of course.)

Thoughts welcome, thanks for your time,
Jonathan
[sqlite] Best way to temporarily store data before processing
Mr. Moules, why not skip the raw tables entirely? Or failing that, a separate in-memory db would probably be the best and quickest option.

Sent from my Samsung Epic 4G Touch

Jonathan Moules wrote:

Hi List,
I'm wondering if anyone can offer me a "best practice" way of doing this.

I'm doing some log analysis using Python/SQLite. Python parses a log file and splits the raw data from each line in the log into one of about 40 tables in an SQLite database (I'll call them Raw Tables).

Once a log file has been processed like this, I run some SQL which takes the data from the Raw Tables and aggregates/processes it into about 10 different "Prepared Tables" which are read with a bunch of Views. The aggregation/processing doesn't take long, and the SQL for it is simple.

I'd like to update the Prepared Tables after each log file is read because there are thousands of files and I don't want to have to rely on having GB of disk space sitting around for temporary Raw Tables.

Once the Prepared Tables have been created, there's no real need to keep the data in the Raw Tables.

The Prepared Tables don't have to be in the same database as the Raw Tables. I'm happy to use ATTACH.

So my question:
What's the best way to do this with the minimum overhead?

Options that have come to mind (probably missed a lot):
- Some sort of empty template database for the Raw Tables which is copied/cloned/overwritten for each file processed.
- And/Or use "DELETE FROM Raw_Tables" to truncate it after each file (there are no indexes).
- And/Or place it into :memory:.
- And/Or just CREATE the Raw Tables for each file?
- And/Or do it within the Prepared Tables database and use "DELETE FROM Raw_Tables". (That file you wouldn't want in :memory: of course.)

Thoughts welcome, thanks for your time,
Jonathan