Emerson,
Is the database on the same disk as the rest of the file operations? If so, is it
possible that you are I/O bound, with the interleaved access patterns causing
excessive disk seeks?
Take a look at the test_server.c code in the sqlite/src directory. I used it
as the basis for a custom library that opens a single DB and then allows
multiple threads to access it. The nice thing about this architecture is that
every thread gets to write and there is no writer starvation. However, all
write operations are single threaded.
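The single-connection, single-writer design can be sketched with Python's standard sqlite3 module. This is an illustration under assumptions, not Ken's library or the contents of test_server.c: the class name SharedDB and its methods are hypothetical, and for simplicity every statement (reads included) takes the same lock, because a sqlite3 connection shared across threads should be serialized by the caller.

```python
import sqlite3
import threading


class SharedDB:
    """Hypothetical wrapper: one SQLite connection shared by many threads.

    All statements go through a single lock, so writes are serialized even
    though every thread is allowed to issue them.
    """

    def __init__(self, path):
        # check_same_thread=False allows other threads to use the connection;
        # the lock below provides the serialization the module leaves to us.
        self._conn = sqlite3.connect(path, check_same_thread=False)
        self._lock = threading.Lock()

    def execute_write(self, sql, params=()):
        with self._lock, self._conn:  # single writer; commit on success
            self._conn.execute(sql, params)

    def write_batch(self, sql, rows):
        with self._lock, self._conn:  # whole batch is one transaction
            self._conn.executemany(sql, rows)

    def query(self, sql, params=()):
        with self._lock:
            return self._conn.execute(sql, params).fetchall()


def worker(db, thread_id, n):
    # Each thread writes its own disjoint range of keys.
    rows = [(thread_id * n + i,) for i in range(n)]
    db.write_batch("INSERT INTO t(k) VALUES (?)", rows)


db = SharedDB(":memory:")
db.execute_write("CREATE TABLE t(k INTEGER PRIMARY KEY)")
threads = [threading.Thread(target=worker, args=(db, t, 100)) for t in range(4)]
for th in threads:
    th.start()
for th in threads:
    th.join()
count = db.query("SELECT COUNT(*) FROM t")[0][0]
print(count)
```

Every thread gets its turn at the lock, so no writer is starved, but only one batch is ever being written at a time.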
The test code I ran creates a configurable number of threads and performs the
following in each thread:
outer loop 1 - 10
    begin transaction
    loop 1 - 1000
        insert record (using modulo for the data so it is unique among threads)
    end loop
    commit
    prepare statement
    loop 1 - 1000
        select data (using modulo)
    end loop
    close statement
    begin transaction
    loop 1 - 1000
        delete data (using the same modulo)
    end loop
    commit
end outer loop
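For anyone wanting to reproduce the shape of this test, here is a rough Python analogue of the per-thread loop. It is a sketch under assumptions, not Ken's C test: it uses an in-memory database (so the timings will not be comparable to the disk-based numbers), splits the 1000 records per outer iteration evenly across threads, gives thread `tid` the keys with `k % nthreads == tid`, and holds one lock (standing in for the single writer mutex) for the duration of each transaction. Python's sqlite3 module keeps its own cache of prepared statements, so the explicit prepare/close steps are implicit here.

```python
import sqlite3
import threading
import time

OUTER_LOOPS = 10
RECORDS_PER_ITERATION = 1000  # split evenly across threads (assumption)


def run_benchmark(nthreads):
    conn = sqlite3.connect(":memory:", check_same_thread=False)
    conn.execute("CREATE TABLE t(k INTEGER PRIMARY KEY, v TEXT)")
    lock = threading.Lock()                  # the single writer mutex
    per_thread = RECORDS_PER_ITERATION // nthreads
    found = [0] * nthreads                   # rows each thread read back

    def worker(tid):
        # Modulo partitioning: only keys with k % nthreads == tid belong here.
        keys = [i * nthreads + tid for i in range(per_thread)]
        for _ in range(OUTER_LOOPS):
            with lock, conn:                 # insert transaction
                conn.executemany("INSERT INTO t(k, v) VALUES (?, ?)",
                                 [(k, "x") for k in keys])
            for k in keys:                   # one select per key
                with lock:
                    row = conn.execute("SELECT v FROM t WHERE k = ?", (k,)).fetchone()
                if row is not None:
                    found[tid] += 1
            with lock, conn:                 # delete transaction
                conn.executemany("DELETE FROM t WHERE k = ?",
                                 [(k,) for k in keys])

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(nthreads)]
    start = time.perf_counter()
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    elapsed = time.perf_counter() - start
    remaining = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
    conn.close()
    return elapsed, sum(found), remaining


for n in (1, 2, 4, 8, 16):
    secs, selected, left = run_benchmark(n)
    print(f"{n:2d} threads: {secs:.3f} s, {selected} rows selected, {left} left")
```

Note that with integer division, 16 threads handle 62 records each rather than 62.5.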
Timing (s)   Threads   Transaction size (1000 / threads)
1.665        1         1000
1.635        2         500
3.094        4         250
5.571        8         125
7.822        16        62.5
So, once the thread count goes beyond two, the overall time it takes to
insert/select/delete a fixed set of data increases with this architecture.
This is because all threads are serialized on inserts/deletes and contend for
a single writer mutex. So in this particular case, using fewer threads actually
improves performance.
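To put the timings into throughput terms, this short calculation converts each run into statements per second. It assumes, from the transaction sizes listed, that the 10 x 1000 records are split evenly across threads, so every run executes the same 30,000 statements (10,000 inserts, 10,000 selects and 10,000 deletes).

```python
# Timings reported above (seconds), keyed by thread count.
timings = {1: 1.665, 2: 1.635, 4: 3.094, 8: 5.571, 16: 7.822}

# Assumption: 10 outer iterations x 1000 records x 3 operations per record,
# with the work divided evenly among the threads.
TOTAL_OPS = 10 * 1000 * 3

throughput = {n: TOTAL_OPS / t for n, t in timings.items()}
relative = {n: timings[1] / t for n, t in timings.items()}

for n in timings:
    print(f"{n:2d} threads: {throughput[n]:8.0f} ops/s, "
          f"{relative[n]:.2f}x the single-thread rate")
```

With these numbers, two threads are marginally faster than one, while 16 threads deliver only about a fifth of the single-thread throughput.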
Hope this helps,
Ken
Emerson Clarke <[EMAIL PROTECTED]> wrote:
Roger,
Thanks for the suggestions. I think using a worker thread and a queue
would be equivalent to just running a single thread, since it
effectively makes the database operations synchronous. That said, I can
see what you're driving at regarding committing a transaction every n records.
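A worker thread fed by a queue, committing every n records, might look roughly like the following. It is a hypothetical Python sketch, not Roger's code: `BATCH_SIZE` stands in for n, the `docs` table is invented, and the producer threads stand in for the per-document file system work. Producers still run concurrently; only the database writes are funneled through one thread, and grouping them into transactions of `BATCH_SIZE` records amortizes the per-commit cost.

```python
import os
import queue
import sqlite3
import tempfile
import threading

BATCH_SIZE = 100   # hypothetical "n": commit after this many records
_STOP = object()   # sentinel telling the writer to finish


def writer(db_path, q, committed):
    # The writer thread owns the only connection, so no locking is needed.
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS docs(id INTEGER PRIMARY KEY, body TEXT)")
    pending = 0
    while True:
        item = q.get()
        if item is _STOP:
            break
        conn.execute("INSERT INTO docs(id, body) VALUES (?, ?)", item)
        pending += 1
        if pending == BATCH_SIZE:  # one transaction per BATCH_SIZE records
            conn.commit()
            committed.append(pending)
            pending = 0
    if pending:                    # flush the final partial batch
        conn.commit()
        committed.append(pending)
    conn.close()


def producer(q, tid, count):
    # Stand-in for per-document work (file system operations, etc.).
    for i in range(count):
        q.put((tid * count + i, f"document {tid}-{i}"))


path = os.path.join(tempfile.mkdtemp(), "docs.db")
q = queue.Queue(maxsize=1000)
committed = []
w = threading.Thread(target=writer, args=(path, q, committed))
w.start()
producers = [threading.Thread(target=producer, args=(q, t, 250)) for t in range(4)]
for p in producers:
    p.start()
for p in producers:
    p.join()
q.put(_STOP)
w.join()

check = sqlite3.connect(path)
rows = check.execute("SELECT COUNT(*) FROM docs").fetchone()[0]
check.close()
print(rows, committed)
```

If the database work dominates the per-document time, the single writer becomes the bottleneck, which is the trade-off being weighed here.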
The idea is that because I am accessing two databases and doing
several file system operations per document, there should be a large
gain from using many threads. There is no separate indexing process; the
whole structure is the index. If anything, the database operations
take the most time. The file system operations add very little
overhead.
I have already tried the page_size pragma, though I read that the best
value depends on the cluster size of the particular file system that
you are running on.
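For reference, the page size has to be chosen before the database file gets its first table; changing it on an existing database requires setting the pragma and then running VACUUM. A minimal Python sketch, where the value 8192 is only a placeholder (valid values are powers of two from 512 to 65536) and should be matched to your own file system:

```python
import os
import sqlite3
import tempfile

PAGE_SIZE = 8192  # placeholder; pick a value matching your file system's cluster size

path = os.path.join(tempfile.mkdtemp(), "pagesize.db")
conn = sqlite3.connect(path)
# Must run before any table is created in the new database file.
conn.execute(f"PRAGMA page_size = {PAGE_SIZE}")
conn.execute("CREATE TABLE t(x)")
conn.commit()
actual = conn.execute("PRAGMA page_size").fetchone()[0]
conn.close()
print(actual)
```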
Since I only have one connection to each database from each thread, I
don't think I would benefit from the caching. I'm not quite sure why
you would ever have more than one connection to the database from a
single thread. The API that I use more or less ensures that under
most circumstances there is only one connection.
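The "one connection per database per thread" arrangement can be sketched with thread-local storage. This is an assumption about how such an API might work, not the API actually in use: `get_conn`, `close_thread_conns` and the two database file names are hypothetical.

```python
import os
import sqlite3
import tempfile
import threading

_local = threading.local()


def get_conn(path):
    """Return the calling thread's connection to `path`, opening it once."""
    conns = getattr(_local, "conns", None)
    if conns is None:
        conns = _local.conns = {}
    if path not in conns:
        conns[path] = sqlite3.connect(path)
    return conns[path]


def close_thread_conns():
    """Close every connection the calling thread has opened."""
    for conn in getattr(_local, "conns", {}).values():
        conn.close()
    _local.conns = {}


tmp = tempfile.mkdtemp()
DB_A = os.path.join(tmp, "first.db")   # hypothetical first database
DB_B = os.path.join(tmp, "second.db")  # hypothetical second database
results = {}


def work(tid):
    a1, a2, b = get_conn(DB_A), get_conn(DB_A), get_conn(DB_B)
    # Keep the objects so identities can be compared after the thread exits.
    results[tid] = (a1, a2, b)
    close_thread_conns()


threads = [threading.Thread(target=work, args=(t,)) for t in range(4)]
for th in threads:
    th.start()
for th in threads:
    th.join()

reused = all(a1 is a2 for a1, a2, _ in results.values())
separate_dbs = all(a1 is not b for a1, _, b in results.values())
per_thread = len({id(a1) for a1, _, _ in results.values()}) == len(results)
print(reused, separate_dbs, per_thread)
```

Each thread reuses its own connection for repeated requests to the same database, so no thread ever holds two connections to one file.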
Emerson