Re: [sqlite] VACUUM and large indices: best practice?
On Fri, Aug 30, 2013 at 3:52 PM, Simon Slavin wrote:

> Great question.  If all you are doing is writing (never any UPDATE or
> DELETE FROM or DROP) then VACUUM won't save any space in your database
> file.  It would defragment your database and might thereby increase speed
> a little, but this is mostly for old-fashioned systems that get most of
> their speed from 'read-ahead buffering'.  I would not expect much of a
> time saving (less than 1%) from any modern setup (test it!).  If it's
> difficult, annoying or time-consuming to do the VACUUM I wouldn't bother.

This has not been my experience. I have a legacy system that exports data by writing it to an SQLite database. This file is only ever written to once (not updated or deleted from while being written, either), vacuumed, zipped and sent to a receiving system. The file size varies greatly, based on the source data.

The database schema has ~20 tables in it, some with many rows, some with very few. There are a few indices (created with the schema, before the data is populated). Vacuuming the database file shrinks it by between 10% and 20%. This surprised me; I was expecting near-zero savings. However, I did not research why; I just accepted it and moved on.

I suggest the OP perform some experiments. That is what I did.

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
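(A minimal sketch of such an experiment, using Python's stdlib sqlite3 module; the table, index and row counts are illustrative stand-ins, not the poster's real schema. It mimics the workflow described above: index created with the schema, data bulk-inserted afterwards, then VACUUM, comparing file sizes before and after.)

```python
import os
import sqlite3
import tempfile

# Throwaway database mimicking the write-once export: schema and index
# exist before the data is bulk-inserted (all names are illustrative).
path = os.path.join(tempfile.mkdtemp(), "export.db")
con = sqlite3.connect(path)
con.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, payload TEXT)")
con.execute("CREATE INDEX t_payload ON t (payload)")

# One big transaction of inserts, then measure the raw file size.
with con:
    con.executemany(
        "INSERT INTO t (payload) VALUES (?)",
        (("row-%06d" % i,) for i in range(50_000)),
    )
before = os.path.getsize(path)

con.execute("VACUUM")  # rebuilds the whole file, repacking b-tree pages
con.close()
after = os.path.getsize(path)

print(f"{before} bytes before VACUUM, {after} bytes after "
      f"({100 * (before - after) / before:.1f}% saved)")
```

Populating an index incrementally leaves its b-tree pages partially filled after page splits, which may explain why VACUUM recovers space even in a write-only database.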
Re: [sqlite] VACUUM and large indices: best practice?
Hi,

On behalf of uncle.f:

> Hello all,
>
> I have a process that runs 24/7 and permanently inserts data into an
> SQLite database:
>
> 1) Create a new database at midnight, keep it open for writing for the
>    next 24 hours
> 2) There are no deletions and not even reads during the database creation
> 3) I use only bulk inserts wrapped in a transaction (for each minute of
>    data) that may contain a few thousand rows in every transaction.
> 4) My journal mode is MEMORY.
> 5) Once the insertion process is done I build several indices and close
>    the database
> 6) After that the database file is moved over the network to a storage
>    device
>
> The database will only be used again for reading and will remain
> unmodified forever.
>
> Each database is fairly large (3-5 GB) and considering it will never be
> modified again I would like to take all possible measures to ensure that
> the file size / fragmentation / data access times are all as low as
> possible.
>
> So my question is about how to ensure the most efficient data allocation
> for such a scenario. I thought of several options:
>
> 1) VACUUM before creating indices, then create indices, then move
>    database off to storage
> 2) Create indices, then VACUUM, then move off to storage
> 3) Create indices, move to storage, VACUUM when already on storage
>    (using an SQLite process running locally on the storage device)
>
> ... or any other sequence of those 3 steps (vacuum, indexing, moving to
> storage)

Did you consider running ANALYZE, too? That will populate the statistics which the query planner can use to make optimal use of the indices.

Best regards

Markus Schaber

CODESYS(r) a trademark of 3S-Smart Software Solutions GmbH
Inspiring Automation Solutions

3S-Smart Software Solutions GmbH
Dipl.-Inf. Markus Schaber | Product Development Core Technology
Memminger Str. 151 | 87439 Kempten | Germany
Tel. +49-831-54031-979 | Fax +49-831-54031-50
E-Mail: m.scha...@codesys.com | Web: http://www.codesys.com
CODESYS store: http://store.codesys.com | CODESYS forum: http://forum.codesys.com
Managing Directors: Dipl.Inf. Dieter Hess, Dipl.Inf. Manfred Werner | Trade register: Kempten HRB 6186 | Tax ID No.: DE 167014915
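(For the OP: a small sketch of where ANALYZE would fit, after the indices are built. The schema here is hypothetical; the point is that ANALYZE writes per-index statistics into the sqlite_stat1 table, which ships inside the database file and so travels with it to the storage device.)

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE readings (minute INTEGER, value REAL)")
with con:
    con.executemany(
        "INSERT INTO readings VALUES (?, ?)",
        [(m % 1440, m * 0.5) for m in range(10_000)],
    )
# Index built after the bulk load, as in the OP's step 5.
con.execute("CREATE INDEX readings_minute ON readings (minute)")

# ANALYZE stores per-index statistics in sqlite_stat1; the query planner
# consults them when choosing between a table scan and an index lookup.
con.execute("ANALYZE")
stats = con.execute("SELECT tbl, idx, stat FROM sqlite_stat1").fetchall()
print(stats)
```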
Re: [sqlite] VACUUM and large indices: best practice?
On 30 Aug 2013, at 9:32pm, uncle.f wrote:

> I thought of several options:
>
> 1) VACUUM before creating indices, then create indices, then move
>    database off to storage
> 2) Create indices, then VACUUM, then move off to storage
> 3) Create indices, move to storage, VACUUM when already on storage
>    (using SQLite process running locally on storage device)
>
> ... or any other sequence of those 3 steps (vacuum, indexing, moving to
> storage)

Thank you very much for your detailed explanation of your procedure, which saves lots of questions.

> Another question would be, do I even need to VACUUM considering the way
> my database is being produced?

Great question. If all you are doing is writing (never any UPDATE or DELETE FROM or DROP) then VACUUM won't save any space in your database file. It would defragment your database and might thereby increase speed a little, but this is mostly for old-fashioned systems that get most of their speed from 'read-ahead buffering'. I would not expect much of a time saving (less than 1%) from any modern setup (test it!). If it's difficult, annoying or time-consuming to do the VACUUM, I wouldn't bother.

If you do decide to VACUUM, then of your three options listed above I would do (2). Theoretically the indices would benefit from being vacuumed, and theoretically moving the VACUUMed file to the new storage medium should ensure it's defragmented.

> I would appreciate a reply from somebody who is aware of SQLite internals
> rather than an "educated guess" :-)

Then you asked in the wrong place. You take what you can get by posting here. The idea is that if someone posts a wrong answer it'll be seen and corrected by another reader. It's open-source information. If you want only replies from experts, I assume you'll be happy to pay Hwaci's consultancy fee.

Simon.
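(Option (2) as a runnable sketch, assuming a hypothetical `data(ts)` schema and stdlib Python; the function name and paths are placeholders, not anything from the original post. The order is: build indices, VACUUM so the finished file, indices included, is rewritten contiguously, then move the result to storage.)

```python
import os
import shutil
import sqlite3
import tempfile

def finalize_and_ship(db_path, dest_dir):
    """Option (2): build indices, VACUUM in place, then move to storage."""
    con = sqlite3.connect(db_path)
    # 'data(ts)' stands in for whatever the real schema actually indexes.
    con.execute("CREATE INDEX IF NOT EXISTS data_ts ON data (ts)")
    con.execute("VACUUM")  # rewrite the whole file compactly, indices included
    con.close()
    dest = os.path.join(dest_dir, os.path.basename(db_path))
    shutil.move(db_path, dest)
    return dest

# Demonstration with a throwaway database; a temp dir stands in for storage.
src = os.path.join(tempfile.mkdtemp(), "export.db")
con = sqlite3.connect(src)
con.execute("CREATE TABLE data (ts INTEGER, v REAL)")
with con:
    con.executemany("INSERT INTO data VALUES (?, ?)",
                    [(i, i / 2) for i in range(1000)])
con.close()

dest = finalize_and_ship(src, tempfile.mkdtemp())
```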
[sqlite] VACUUM and large indices: best practice?
Hello all,

I have a process that runs 24/7 and permanently inserts data into an SQLite database:

1) Create a new database at midnight, keep it open for writing for the next 24 hours
2) There are no deletions and not even reads during the database creation
3) I use only bulk inserts wrapped in a transaction (for each minute of data) that may contain a few thousand rows in every transaction.
4) My journal mode is MEMORY.
5) Once the insertion process is done I build several indices and close the database
6) After that the database file is moved over the network to a storage device

The database will only be used again for reading and will remain unmodified forever.

Each database is fairly large (3-5 GB) and considering it will never be modified again I would like to take all possible measures to ensure that the file size / fragmentation / data access times are all as low as possible.

So my question is about how to ensure the most efficient data allocation for such a scenario. I thought of several options:

1) VACUUM before creating indices, then create indices, then move the database off to storage
2) Create indices, then VACUUM, then move off to storage
3) Create indices, move to storage, VACUUM when already on storage (using an SQLite process running locally on the storage device)

... or any other sequence of those 3 steps (vacuum, indexing, moving to storage)

Another question would be: do I even need to VACUUM, considering the way my database is being produced?

I would appreciate a reply from somebody who is aware of SQLite internals rather than an "educated guess" :-)

Thank you for your time,
Andrei
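(For readers following along, steps 3-4 of the above can be sketched in Python's stdlib sqlite3 module; the file name and `ticks` schema are made up for illustration. The connection context manager gives the one-transaction-per-minute batching, and the PRAGMA sets the in-memory journal mode.)

```python
import os
import sqlite3
import tempfile

# 'day.db' and the 'ticks' schema are stand-ins for the real per-day file.
db_path = os.path.join(tempfile.mkdtemp(), "day.db")
con = sqlite3.connect(db_path)
con.execute("PRAGMA journal_mode=MEMORY")  # step 4: rollback journal kept in RAM
con.execute("CREATE TABLE ticks (minute INTEGER, value REAL)")

def flush_minute(rows):
    # Step 3: one transaction per minute of data.  The connection context
    # manager commits on success and rolls back if an insert fails.
    with con:
        con.executemany("INSERT INTO ticks (minute, value) VALUES (?, ?)", rows)
```

Each call to `flush_minute` with that minute's few thousand rows then costs a single commit rather than one per row.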