Re: [sqlite] VACUUM and large indices: best practice?

2013-09-02 Thread Dennis Jenkins
On Fri, Aug 30, 2013 at 3:52 PM, Simon Slavin  wrote:

> Great question. If all you are doing is writing (never any UPDATE,
> DELETE FROM or DROP) then VACUUM won't save any space in your database
> file. It would defragment your database and might thereby increase speed a
> little, but this mostly helps old-fashioned systems that get most of their
> speed from 'read-ahead buffering'. I would not expect much of a time
> saving (less than 1%) on any modern setup (test it!). If it's
> difficult, annoying or time-consuming to do the VACUUM, I wouldn't bother.
>

This has not been my experience.

I have a legacy system that exports data by writing it to a SQLite
database. Each file is written exactly once (never updated or deleted
while being written), then vacuumed, zipped and sent to a receiving
system. The file size varies greatly with the source data. The schema
has ~20 tables, some with many rows, some with very few, and a few
indices (created along with the schema, before the data is populated).

Vacuuming the database file shrinks it by between 10% and 20%. This
surprised me; I was expecting near-zero savings. I never researched why;
I just accepted it and moved on.

I suggest the OP perform some experiments. That is what I did.
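
If you want to repeat the measurement, here is a minimal sketch of the
kind of experiment I mean, using Python's sqlite3 module (the file name
is a placeholder; adjust to taste):

    import os
    import sqlite3
    import time

    DB = "export.db"  # placeholder path to the finished export file

    before = os.path.getsize(DB)
    con = sqlite3.connect(DB)
    con.isolation_level = None  # VACUUM cannot run inside a transaction
    t0 = time.time()
    con.execute("VACUUM")
    con.close()
    after = os.path.getsize(DB)
    print("%d -> %d bytes, %.1f%% saved, %.1fs" %
          (before, after, 100.0 * (before - after) / before,
           time.time() - t0))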


Re: [sqlite] VACUUM and large indices: best practice?

2013-09-02 Thread Markus Schaber
Hi,

On behalf of uncle.f:

> Hello all,
> 
> I have a process that runs 24/7 and continuously inserts data into an SQLite
> database:
> 
> 1) Create new database at midnight, keep it open for writing for the next
> 24 hours
> 2) There are no deletions and not even reads during the database creation
> 3) I use only bulk inserts wrapped in a transaction (one per minute of
> data), each containing a few thousand rows.
> 4) My journal mode is MEMORY.
> 5) Once the insertion process is done, I build several indices and close
> the database
> 6) After that the database file is moved over the network to a storage device
> 
> The database will only be used again for reading and will remain unmodified
> forever.
> 
> Each database is fairly large (3-5 GB) and considering it will never be
> modified again I would like to take all possible measures to ensure that the
> file size / fragmentation / data access times are all as low as possible.
> 
> So my question is about how to ensure the most efficient data allocation for
> such a scenario. I thought of several options:
> 
> 1) VACUUM before creating indices, then create indices, then move database
> off to storage
> 2) Create indices, then VACUUM, then move off to storage
> 3) Create indices, move to storage, VACUUM when already on storage (using
> SQLite process running locally on storage device)
> 
> ... or any other sequence of those 3 steps (vacuum, indexing, moving to
> storage)

Did you consider running ANALYZE, too? That populates the statistics the
query planner uses to make optimal index choices.
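
A minimal sketch of that step, assuming Python's sqlite3 module and a
placeholder file name; ANALYZE fills the sqlite_stat1 table that the
planner reads:

    import sqlite3

    con = sqlite3.connect("export.db")  # placeholder file name
    con.execute("ANALYZE")  # gathers statistics into sqlite_stat1
    # One row per (table, index) pair; 'stat' holds the row-count
    # estimates the query planner consults when choosing indices.
    for tbl, idx, stat in con.execute(
            "SELECT tbl, idx, stat FROM sqlite_stat1"):
        print(tbl, idx, stat)
    con.close()

Run it after the indices are built, so the statistics describe the
indices the readers will actually use.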



Best regards

Markus Schaber




Re: [sqlite] VACUUM and large indices: best practice?

2013-08-30 Thread Simon Slavin

On 30 Aug 2013, at 9:32pm, uncle.f  wrote:

> I thought of several options:
> 
> 1) VACUUM before creating indices, then create indices, then move database
> off to storage
> 2) Create indices, then VACUUM, then move off to storage
> 3) Create indices, move to storage, VACUUM when already on storage (using
> SQLite process running locally on storage device)
> 
> ... or any other sequence of those 3 steps (vacuum, indexing, moving to
> storage)

Thank you very much for the detailed explanation of your procedure, which
saves a lot of questions.

> Another question would be, do I even need to VACUUM considering the way my
> database is being produced?

Great question. If all you are doing is writing (never any UPDATE,
DELETE FROM or DROP) then VACUUM won't save any space in your database
file. It would defragment your database and might thereby increase speed a
little, but this mostly helps old-fashioned systems that get most of their
speed from 'read-ahead buffering'. I would not expect much of a time
saving (less than 1%) on any modern setup (test it!). If it's
difficult, annoying or time-consuming to do the VACUUM, I wouldn't bother.
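
If you want to check in advance whether a VACUUM has anything to reclaim,
you can look at the free-page list. A sketch, assuming Python's sqlite3
module and a placeholder file name:

    import sqlite3

    con = sqlite3.connect("export.db")  # placeholder file name
    free = con.execute("PRAGMA freelist_count").fetchone()[0]
    total = con.execute("PRAGMA page_count").fetchone()[0]
    size = con.execute("PRAGMA page_size").fetchone()[0]
    con.close()
    # Free pages are what VACUUM reclaims outright; a write-only database
    # should show few of them, though the file can still shrink a little
    # when VACUUM repacks partially filled b-tree pages.
    print("%d of %d pages free (~%d bytes)" % (free, total, free * size))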

If you do decide to VACUUM then, of your three options listed above, I would
do (2). Theoretically the indices would benefit from being vacuumed, and
moving the VACUUMed file to the new storage medium should then leave it
defragmented at the filesystem level as well.
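
As a sketch, option (2) would look something like this (index definition
and paths are placeholders, not your real schema):

    import shutil
    import sqlite3

    DB = "export.db"                 # placeholder local path
    DEST = "/mnt/storage/export.db"  # placeholder storage path

    con = sqlite3.connect(DB)
    con.isolation_level = None  # VACUUM must run outside a transaction
    con.execute("CREATE INDEX IF NOT EXISTS idx_data_ts "
                "ON data(ts)")  # placeholder index
    con.execute("VACUUM")  # rewrites the whole file, indices included
    con.close()
    shutil.copy(DB, DEST)  # move to storage only after the rewrite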

> I would appreciate a reply from somebody who is aware of SQLite internals
> rather than an "educated guess" :-)

Then you asked in the wrong place.  You take what you can get by posting here.  
The idea is that if someone posts a wrong answer it'll be seen and corrected by 
another reader.  It's open source information.

If you want only replies from experts I assume you'll be happy to pay Hwaci's 
consultancy fee.

Simon.


[sqlite] VACUUM and large indices: best practice?

2013-08-30 Thread uncle.f
Hello all,

I have a process that runs 24/7 and continuously inserts data into an SQLite
database:

1) Create new database at midnight, keep it open for writing for the next
24 hours
2) There are no deletions and not even reads during the database creation
3) I use only bulk inserts wrapped in a transaction (one per minute of
data), each containing a few thousand rows (see the sketch after this list).
4) My journal mode is MEMORY.
5) Once the insertion process is done, I build several indices and
close the database
6) After that the database file is moved over the network to a storage
device
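
For concreteness, a rough sketch of steps 3 to 5 in Python (schema, file
name and index are placeholders, not my real ones):

    import sqlite3

    con = sqlite3.connect("2013-08-30.db")     # placeholder per-day file
    con.execute("PRAGMA journal_mode=MEMORY")  # step 4
    con.execute("CREATE TABLE data (ts INTEGER, "
                "value REAL)")  # placeholder schema

    def write_minute(rows):
        # Step 3: one transaction per minute of data,
        # a few thousand rows each.
        with con:  # commits on success, rolls back on error
            con.executemany("INSERT INTO data VALUES (?, ?)", rows)

    # ... call write_minute() for 24 hours, then:

    con.execute("CREATE INDEX idx_data_ts ON data(ts)")  # step 5
    con.close()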

The database will only be used again for reading and will remain unmodified
forever.

Each database is fairly large (3-5 GB) and considering it will never be
modified again I would like to take all possible measures to ensure that
the file size / fragmentation / data access times are all as low as
possible.

So my question is about how to ensure the most efficient data allocation
for such a scenario. I thought of several options:

1) VACUUM before creating indices, then create indices, then move database
off to storage
2) Create indices, then VACUUM, then move off to storage
3) Create indices, move to storage, VACUUM when already on storage (using
SQLite process running locally on storage device)

... or any other sequence of those 3 steps (vacuum, indexing, moving to
storage)

Another question would be, do I even need to VACUUM considering the way my
database is being produced?

I would appreciate a reply from somebody who is aware of SQLite internals
rather than an "educated guess" :-)

Thank you for your time,

Andrei