Re: [sqlite] create index before or after many inserts?
On 16 Jun 2010, at 5:33am, Jay A. Kreibich wrote: > On Tue, Jun 15, 2010 at 08:58:22PM -0700, Roger Binns scratched on the wall: >> On 06/15/2010 07:59 PM, Simon Slavin wrote: >>> The standard assumption about SQLite is that it's faster to do your INSERTs >>> first, then create the indices. >> >> If the index is created first then the data in the table and the pages >> making up the index will be interleaved. That is likely to lead to more and >> further seeks on accessing the index. Creating the index afterwards will >> result in a contiguous sequence of pages (assuming no existing free pages). > > Contiguous, yes, but the pages may not be in any logical order. > The internal node pages will get shuffled as the tree is built, > meaning you might still have a significant number of seeks. Actually, I always figured that the CREATE INDEX routine was better tuned for creating many entries in an index at once, than writing them one by one. Each index has its own set of pages, right ? Imagine a table with three indices. Inserting a thousand rows would normally do something like mess with index 1; mess with index 2; mess with index 3; mess with index 1; mess with index 2 ... 1000 times There would be far less shuffling involved if all the index 1 work was done in one chunk, then all the index 2 work, etc.. Of course it's possible that SQLite is clever about transactions and if all the INSERT commands are in one transaction it does all the index-updating together. That would be neat. I haven't read the source code. Simon. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] create index before or after many inserts?
On 16 Jun 2010, at 5:04am, Eric Smith wrote: > Simon Slavin wrote: > >> The standard assumption about SQLite is that it's faster to do your >> INSERTs first, then create the indices. How much of a difference this >> makes depends on a lot of things. > > On what things does it depend? Nothing magic, just all the normal things: How many indices do you have ? How many columns in each ? How many rows in each ? Are the indexed fields 'clumpy' or not ? How fast are your CPU and storage system ? How much of three different types of cache do you have ? Is your database storage system optimized for read-ahead or write-ahead storage ? The only way to find out is to try it. Simon. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] create index before or after many inserts?
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 06/15/2010 09:33 PM, Jay A. Kreibich wrote: > Contiguous, yes, but the pages may not be in any logical order. > The internal node pages will get shuffled as the tree is built, > meaning you might still have a significant number of seeks. Seeks nearer to other pages will be quicker (for spinning media) and most operating systems do a fair amount of read ahead. For example if SQLite asks for a particular page the OS may read the next 256kb too. If that additional data was only index pages then chances are far more likely for cache hits. About the only scenario where having the index pages and data pages interleaved is beneficial is a query that needs some columns from the index and some (unindexed) from the data and most of the rows are relevant to the query. The data pages and index pages will be closer then, but even this scenario would require very carefully contriving your data and indices. The advice still stands - create the indices after, not before. Roger -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAkwYaFUACgkQmOOfHg372QQ3lQCfdHcDOrgftEepPICEhzXuSTqk qa0AoJBiPbcitaqwZYQulr/1bAvls5+B =qs6R -END PGP SIGNATURE- ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] create index before or after many inserts?
On Tue, Jun 15, 2010 at 08:58:22PM -0700, Roger Binns scratched on the wall: > -BEGIN PGP SIGNED MESSAGE- > Hash: SHA1 > > On 06/15/2010 07:59 PM, Simon Slavin wrote: > > The standard assumption about SQLite is that it's faster to do your INSERTs > > first, then create the indices. > > If the index is created first then the data in the table and the pages > making up the index will be interleaved. That is likely to lead to more and > further seeks on accessing the index. Creating the index afterwards will > result in a contiguous sequence of pages (assuming no existing free pages). Contiguous, yes, but the pages may not be in any logical order. The internal node pages will get shuffled as the tree is built, meaning you might still have a significant number of seeks. -j -- Jay A. Kreibich < J A Y @ K R E I B I.C H > "Intelligence is like underwear: it is important that you have it, but showing it to the wrong people has the tendency to make them feel uncomfortable." -- Angela Johnson ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] create index before or after many inserts?
Simon Slavin wrote: > The standard assumption about SQLite is that it's faster to do your > INSERTs first, then create the indices. How much of a difference this > makes depends on a lot of things. On what things does it depend? -- Eric A. Smith Sendmail may be safely run set-user-id to root. -- Eric Allman, "Sendmail Installation Guide" ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] create index before or after many inserts?
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 06/15/2010 07:59 PM, Simon Slavin wrote: > The standard assumption about SQLite is that it's faster to do your INSERTs > first, then create the indices. If the index is created first then the data in the table and the pages making up the index will be interleaved. That is likely to lead to more and further seeks on accessing the index. Creating the index afterwards will result in a contiguous sequence of pages (assuming no existing free pages). Roger -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAkwYS94ACgkQmOOfHg372QQqngCeJ2itOBTZmY2gGVzk4CKCQWPc ZMwAn0fZ1fO+ID4K8Ak65RZBrtV03Te3 =a+mh -END PGP SIGNATURE- ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] create index before or after many inserts?
On 16 Jun 2010, at 3:55am, Eric Smith wrote: > Let's say my app has (only) inserts followed by (only) reads. > > The reads are best served by some indices. So I can create the indices > before the INSERTs, or after them. > > In general, should I expect a run time perf difference between these two > options? The standard assumption about SQLite is that it's faster to do your INSERTs first, then create the indices. How much of a difference this makes depends on a lot of things. Simon. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] create index before or after many inserts?
Let's say my app has (only) inserts followed by (only) reads. The reads are best served by some indices. So I can create the indices before the INSERTs, or after them. In general, should I expect a run time perf difference between these two options? Eric -- Eric A. Smith Louis Pasteur's theory of germs is ridiculous fiction. -- Pierre Pachet, Professor of Physiology at Toulouse, 1872 ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users