Re: [sqlite] Reads and Writes

2005-06-15 Thread Sean Heber

After SQLite obtains a read lock it will have to perform at least the
following:
- check the file header to check for corruption
- check for a hot transaction log, and roll back changes if necessary
- read the database schema from the sqlite_master table

This is the most likely cause of the repetitive behavior you're  
seeing.


Ok, that makes sense.



That said: the OS will of course cache the database file as well, and
many of the 'read' calls will be very fast.


I think you're right about that.  After researching it a bit more,  
I've noticed that disk read usage is actually almost zero during all  
of this which certainly supports your claim.  It is the disk write  
activity that might be slowing things down for me.  I'll have to see  
if I can figure out anything to speed up inserts.  I've tried setting  
synchronize to off but after a time the inserts seem to build up and  
performance degrades to about the same point it is with synchronize  
set to full.


Thanks for the info.

l8r
Sean



Re: [sqlite] Reads and Writes

2005-06-14 Thread Dan Kennedy

> I can imagine a protocol that will improve performance by keeping track
> of whether the database was updated, which would allow a quick
> determination whether a program's cache was still valid. The idea is to
> increment a 'version number' in the database file as soon as a program
> obtains a write lock. A reader could then determine whether the file was
> written to and flush the cache only when necessary, speeding things up a
> little. A 128 bit counter should never wrap around.

Such a counter (32-bits only) was included in the file-format for 
version 3. Current versions of the library update the counter every commit, 
but it's not being used to allow persistent caches yet. I think such a change 
would require a lot of testing before it would be safe enough to use.

And as you say, sophisticated operating systems do a pretty good job of 
caching the file anyway.





__ 
Discover Yahoo! 
Find restaurants, movies, travel and more fun for the weekend. Check it out! 
http://discover.yahoo.com/weekend.html 



Re: [sqlite] Reads and Writes

2005-06-14 Thread Gé Weijers
Sean Heber wrote:

>
>
> My database file is only around 4MB and I have set the 
> default_cache_size to 5.  From what I've read, that should 
> translate to almost 50MB of cache size which would be more than 
> enough to keep the entire database in memory, I'd think.  Yet it 
> doesn't seem to actually do that since it is reading from the file so 
> often.

SQLite cannot know whether the database file has been modified by
another process once it releases its file locks, so it needs to read
everything again. If you wrap multiple queries in a transaction the file
stays locked and cache remains valid, and the number of seeks/reads
should go down.

After SQLite obtains a read lock it will have to perform at least the
following:
- check the file header to check for corruption
- check for a hot transaction log, and roll back changes if necessary
- read the database schema from the sqlite_master table

This is the most likely cause of the repetitive behavior you're seeing.

That said: the OS will of course cache the database file as well, and
many of the 'read' calls will be very fast.

I can imagine a protocol that will improve performance by keeping track
of whether the database was updated, which would allow a quick
determination whether a program's cache was still valid. The idea is to
increment a 'version number' in the database file as soon as a program
obtains a write lock. A reader could then determine whether the file was
written to and flush the cache only when necessary, speeding things up a
little. A 128 bit counter should never wrap around.


Gé

-- 
Ge' Weijers
e-mail: [EMAIL PROTECTED]
tel:  (520)623-8542



[sqlite] Reads and Writes

2005-06-14 Thread Sean Heber
My program has a lot of simple select queries.  Most of them are of  
the "select count() from... " variety.  These are very simple queries  
where there is a single WHERE clause and the columns referenced are  
either a primary key column or another indexed column.


I would expect the database to keep the indexes in memory and thus  
have virtually no need to go to disk when doing these count() queries  
and such.  Yet when I do an strace on the process, I see significant  
read() and seek() calls on the database file.  What are the  
conditions for when it needs to read from the database file?


My database file is only around 4MB and I have set the  
default_cache_size to 5.  From what I've read, that should  
translate to almost 50MB of cache size which would be more than  
enough to keep the entire database in memory, I'd think.  Yet it  
doesn't seem to actually do that since it is reading from the file so  
often.


I've seen this on OSX and Linux but this strace output is from Linux:
Snippet of strace output:

fcntl64(5, F_SETLK64, {type=F_RDLCK, whence=SEEK_SET,  
start=1073741824, len=1}, 0xbfffdb00) = 0
fcntl64(5, F_SETLK64, {type=F_RDLCK, whence=SEEK_SET,  
start=1073741826, len=510}, 0xbfffdb00) = 0
fcntl64(5, F_SETLK64, {type=F_UNLCK, whence=SEEK_SET,  
start=1073741824, len=1}, 0xbfffdb00) = 0

access("/db/data-journal", F_OK) = -1 ENOENT (No such file or directory)
fstat64(5, {st_mode=S_IFREG|0644, st_size=4402176, ...}) = 0
_llseek(5, 0, [0], SEEK_SET)= 0
read(5, "SQLite format 3\0\4\0\1\1\0@  \0\0\4\200\0\0\0\0"..., 1024)  
= 1024

_llseek(5, 3072, [3072], SEEK_SET)  = 0
read(5, "\2\2]\0\22\1\214\6\0\0\16%\2\323\2\265\2z\2\361\3\245\3"...,  
1024) = 1024

_llseek(5, 395264, [395264], SEEK_SET)  = 0
read(5, "\n\0\0\0$\0\\\0\0\\\0v\0\217\0\251\0\303\0\335\0\367\1"...,  
1024) = 1024
fcntl64(5, F_SETLK64, {type=F_UNLCK, whence=SEEK_SET, start=0,  
len=0}, 0xbfffdfd0) = 0
fcntl64(5, F_SETLK64, {type=F_RDLCK, whence=SEEK_SET,  
start=1073741824, len=1}, 0xbfffdb60) = 0
fcntl64(5, F_SETLK64, {type=F_RDLCK, whence=SEEK_SET,  
start=1073741826, len=510}, 0xbfffdb60) = 0
fcntl64(5, F_SETLK64, {type=F_UNLCK, whence=SEEK_SET,  
start=1073741824, len=1}, 0xbfffdb60) = 0

access("/db/data-journal", F_OK) = -1 ENOENT (No such file or directory)
fstat64(5, {st_mode=S_IFREG|0644, st_size=4402176, ...}) = 0
_llseek(5, 0, [0], SEEK_SET)= 0
read(5, "SQLite format 3\0\4\0\1\1\0@  \0\0\4\200\0\0\0\0"..., 1024)  
= 1024

_llseek(5, 2048, [2048], SEEK_SET)  = 0
read(5, "\5\0\0\0\2\3\364\0\0\0\n\253\3\372\3\364\0\0\0\0\0\0\0"...,  
1024) = 1024

_llseek(5, 5120, [5120], SEEK_SET)  = 0
read(5, "\2\0\0\0\6\3\277\1\0\0\r\341\3\365\3\352\3\277\3\311\3"...,  
1024) = 1024

_llseek(5, 1768448, [1768448], SEEK_SET) = 0
read(5, "\n\0\0\0q\0\370\0\0\370\0\377\1\6\1\f\1\22\1\30\1\36\1"...,  
1024) = 1024

_llseek(5, 405504, [405504], SEEK_SET)  = 0
read(5, "\5\0\0\0\207\1(\0\0\0\1\207\1(\1-\0012\0017\1<\1A\1F\1"...,  
1024) = 1024

_llseek(5, 137216, [137216], SEEK_SET)  = 0
read(5, "\r\0\0\0\1\1\35\0\1\35\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,  
1024) = 1024

_llseek(5, 293888, [293888], SEEK_SET)  = 0
read(5, "\r\0\0\0\2\0\217\0\0\217\3H\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,  
1024) = 1024

_llseek(5, 406528, [406528], SEEK_SET)  = 0
read(5, "\5\0\0\0|\1\30\0\0\0\n]\1\30\1\36\1$\1*\0010\0016\1<\1"...,  
1024) = 1024

_llseek(5, 1178624, [1178624], SEEK_SET) = 0
read(5, "\r\0\0\0\2\0\353\0\0\353\3\221\0\0\0\0\0\0\0\0\0\0\0\0"...,  
1024) = 1024

_llseek(5, 2127872, [2127872], SEEK_SET) = 0
read(5, "\r\0\0\0\3\0s\0\3\221\0s\0\342\0\0\0\0\0\0\0\0\0\0\0\0"...,  
1024) = 1024


etc

There is more or less a pattern that is very similar to the snippet  
above that repeats over and over in the trace.  It sort of seems like  
it is reading a header or something in the database file over and  
over again.


Additional notes:
I set the following pragmas in this order when I create/open the  
database file (which only happens once when the app loads):

  PRAGMA auto_vacuum = 1
  PRAGMA temp_store = MEMORY
  PRAGMA synchronous = OFF
  PRAGMA default_cache_size = 5

Thanks,
Sean