Re: [sqlite] create index before or after many inserts?

2010-06-16 Thread Simon Slavin

On 16 Jun 2010, at 5:33am, Jay A. Kreibich wrote:

> On Tue, Jun 15, 2010 at 08:58:22PM -0700, Roger Binns scratched on the wall:
>> On 06/15/2010 07:59 PM, Simon Slavin wrote:
>>> The standard assumption about SQLite is that it's faster to do your INSERTs 
>>> first, then create the indices.
>> 
>> If the index is created first then the data in the table and the pages
>> making up the index will be interleaved.  That is likely to lead to more and
>> further seeks on accessing the index.  Creating the index afterwards will
>> result in a contiguous sequence of pages (assuming no existing free pages).
> 
>  Contiguous, yes, but the pages may not be in any logical order.
>  The internal node pages will get shuffled as the tree is built,
>  meaning you might still have a significant number of seeks.

Actually, I always figured that the CREATE INDEX routine was better tuned for 
creating many entries in an index at once, than writing them one by one.  Each 
index has its own set of pages, right ?  Imagine a table with three indices.  
Inserting a thousand rows would normally do something like

mess with index 1; mess with index 2; mess with index 3;
mess with index 1; mess with index 2 ...
1000 times

There would be far less shuffling involved if all the index 1 work was done in 
one chunk, then all the index 2 work, etc..

Of course it's possible that SQLite is clever about transactions and if all the 
INSERT commands are in one transaction it does all the index-updating together. 
 That would be neat.  I haven't read the source code.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] create index before or after many inserts?

2010-06-16 Thread Simon Slavin

On 16 Jun 2010, at 5:04am, Eric Smith wrote:

> Simon Slavin wrote: 
> 
>> The standard assumption about SQLite is that it's faster to do your 
>> INSERTs first, then create the indices.  How much of a difference this 
>> makes depends on a lot of things.  
> 
> On what things does it depend?

Nothing magic, just all the normal things:

How many indices do you have ?  How many columns in each ?  How many rows in 
each ?  Are the indexed fields 'clumpy' or not ?  How fast are your CPU and 
storage system ?  How much of three different types of cache do you have ?  Is 
your database storage system optimized for read-ahead or write-ahead storage ?

The only way to find out is to try it.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] create index before or after many inserts?

2010-06-16 Thread Roger Binns
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 06/15/2010 09:33 PM, Jay A. Kreibich wrote:
>   Contiguous, yes, but the pages may not be in any logical order.
>   The internal node pages will get shuffled as the tree is built,
>   meaning you might still have a significant number of seeks.

Seeks nearer to other pages will be quicker (for spinning media) and
most operating systems do a fair amount of read ahead.  For example if
SQLite asks for a particular page the OS may read the next 256kb too.
If that additional data was only index pages then chances are far more
likely for cache hits.

About the only scenario where having the index pages and data pages
interleaved is beneficial is a query that needs some columns from the
index and some (unindexed) from the data and most of the rows are
relevant to the query.  The data pages and index pages will be closer
then, but even this scenario would require very carefully contriving
your data and indices.

The advice still stands - create the indices after, not before.

Roger
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkwYaFUACgkQmOOfHg372QQ3lQCfdHcDOrgftEepPICEhzXuSTqk
qa0AoJBiPbcitaqwZYQulr/1bAvls5+B
=qs6R
-END PGP SIGNATURE-
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] create index before or after many inserts?

2010-06-15 Thread Jay A. Kreibich
On Tue, Jun 15, 2010 at 08:58:22PM -0700, Roger Binns scratched on the wall:
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA1
> 
> On 06/15/2010 07:59 PM, Simon Slavin wrote:
> > The standard assumption about SQLite is that it's faster to do your INSERTs 
> > first, then create the indices.
> 
> If the index is created first then the data in the table and the pages
> making up the index will be interleaved.  That is likely to lead to more and
> further seeks on accessing the index.  Creating the index afterwards will
> result in a contiguous sequence of pages (assuming no existing free pages).

  Contiguous, yes, but the pages may not be in any logical order.
  The internal node pages will get shuffled as the tree is built,
  meaning you might still have a significant number of seeks.

   -j

-- 
Jay A. Kreibich < J A Y  @  K R E I B I.C H >

"Intelligence is like underwear: it is important that you have it,
 but showing it to the wrong people has the tendency to make them
 feel uncomfortable." -- Angela Johnson
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] create index before or after many inserts?

2010-06-15 Thread Eric Smith
Simon Slavin wrote: 

> The standard assumption about SQLite is that it's faster to do your 
> INSERTs first, then create the indices.  How much of a difference this 
> makes depends on a lot of things.  

On what things does it depend?

-- 
Eric A. Smith

Sendmail may be safely run set-user-id to root.
-- Eric Allman, "Sendmail Installation Guide"
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] create index before or after many inserts?

2010-06-15 Thread Roger Binns
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 06/15/2010 07:59 PM, Simon Slavin wrote:
> The standard assumption about SQLite is that it's faster to do your INSERTs 
> first, then create the indices.

If the index is created first then the data in the table and the pages
making up the index will be interleaved.  That is likely to lead to more and
further seeks on accessing the index.  Creating the index afterwards will
result in a contiguous sequence of pages (assuming no existing free pages).

Roger
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkwYS94ACgkQmOOfHg372QQqngCeJ2itOBTZmY2gGVzk4CKCQWPc
ZMwAn0fZ1fO+ID4K8Ak65RZBrtV03Te3
=a+mh
-END PGP SIGNATURE-
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] create index before or after many inserts?

2010-06-15 Thread Simon Slavin

On 16 Jun 2010, at 3:55am, Eric Smith wrote:

> Let's say my app has (only) inserts followed by (only) reads.  
> 
> The reads are best served by some indices.  So I can create the indices 
> before the INSERTs, or after them.  
> 
> In general, should I expect a run time perf difference between these two 
> options?  

The standard assumption about SQLite is that it's faster to do your INSERTs 
first, then create the indices.  How much of a difference this makes depends on 
a lot of things.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] create index before or after many inserts?

2010-06-15 Thread Eric Smith
Let's say my app has (only) inserts followed by (only) reads.  

The reads are best served by some indices.  So I can create the indices 
before the INSERTs, or after them.  

In general, should I expect a run time perf difference between these two 
options?  

Eric 

-- 
Eric A. Smith

Louis Pasteur's theory of germs is ridiculous fiction.
-- Pierre Pachet, Professor of Physiology at Toulouse, 1872
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users