-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Jay A. Kreibich wrote: > SQLite is not "lying." After poking around a bit to refresh my > understanding of SQLite's file structure, I think it is safe to say > that SQLite will almost never do a sequential file read, even if > you're doing a sequential table scan. > > sequential table scan != sequential file access
That statement is true in general but in this specific case we are talking about a multi-gigabyte database. It would be fair to say that large chunks of the database would have been created in a transaction and that there wouldn't have been a lot of row updates/deletes/inserts throughout it causing something similar to fragmentation. I made the vfs track sequential vs random read accesses and then created a 3 column table with each row being a random int, a string between 0 and 1024 chars long and another random int. A 2,000,000 row database occupied 1.4GB. Page/cache size etc were all left at their defaults. Testing was done on 64 bit Linux. create table foo(x,y,z) Full scan query: select max(z) from foo seq 1,298,231 random 93,599 - 93% sequential Create index: create index foo1 on foo(z) seq 244,920 random 3,261,252 - 7% sequential Indexed query: select max(z) from foo where z>99999900 seq 0 random 7 - 0% sequential In this particular case SQLite is very much lying when a full table scan was done as the vast majority of access is indeed sequential. Of course several factors will affect this such as typical size of each row, schema, how much prior "thrashing" about of contents the database has had etc. I also didn't count nearly sequential access, for example if the operating system does reads in 4kb chunks and one read was for the first kb and the next read was for the third kb. An example of how the numbers can be skewed for the full scan: %Sequential StringSize PageSize 8 1-2KB 1KB 99 0-1KB 4KB 99.9 0-1KB 32KB 97 0-.5KB 1KB It is fair to say that SQLite does not know in advance what the access pattern is going to be, especially without analysing the database contents, rows per page and queries that will be executed. Queries that predominantly use indices (a desirable trait) are mostly random access. Full table scans where the row fits within the page are mostly sequential. Roger -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) iD8DBQFI0xBomOOfHg372QQRAnXXAJ0VGwzzrOwkWF0q4Za2QpuTKnQD+QCfblin DeHukDoCs809uawnlt5GYLc= =l25n -----END PGP SIGNATURE----- _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users