Hash: SHA1

Jay A. Kreibich wrote:
>   SQLite is not "lying."  After poking around a bit to refresh my
>   understanding of SQLite's file structure, I think it is safe to say
>   that SQLite will almost never do a sequential file read, even if
>   you're doing a sequential table scan.
>     sequential table scan != sequential file access

That statement is true in general but in this specific case we are
talking about a multi-gigabyte database.  It would be fair to say that
large chunks of the database would have been created in a transaction
and that there wouldn't have been a lot of row updates/deletes/inserts
throughout it causing something similar to fragmentation.

I made the vfs track sequential vs random read accesses and then created
a 3 column table with each row being a random int, a string between 0
and 1024 chars long and another random int.  A 2,000,000 row database
occupied 1.4GB.  Page/cache size etc were all left at their defaults.
Testing was done on 64 bit Linux.

   create table foo(x,y,z)

Full scan query:  select max(z) from foo

  seq 1,298,231 random 93,599  - 93% sequential

Create index:      create index foo1 on foo(z)

  seq 244,920 random 3,261,252  - 7% sequential

Indexed query:     select max(z) from foo where z>99999900

  seq 0 random 7                - 0% sequential

In this particular case SQLite is very much lying when a full table scan
was done as the vast majority of access is indeed sequential.  Of course
several factors will affect this such as typical size of each row,
schema, how much prior "thrashing" about of contents the database has
had etc.  I also didn't count nearly sequential access, for example if
the operating system does reads in 4kb chunks and one read was for the
first kb and the next read was for the third kb.

An example of how the numbers can be skewed for the full scan:

%Sequential   StringSize     PageSize

  8             1-2KB          1KB
 99             0-1KB          4KB
 99.9           0-1KB         32KB
 97             0-.5KB         1KB

It is fair to say that SQLite does not know in advance what the access
pattern is going to be, especially without analysing the database
contents, rows per page and queries that will be executed.  Queries that
predominantly use indices (a desirable trait) are mostly random access.
 Full table scans where the row fits within the page are mostly sequential.

Version: GnuPG v1.4.6 (GNU/Linux)

sqlite-users mailing list

Reply via email to