Emerson,
 
 Is the database on the same disk as the rest of the file operations? If so, is 
it possible that you are I/O bound and that the I/O access patterns are causing 
seek issues?
 
 Take a look at the test_server.c code in the sqlite/src directory. I used that 
as a basis to build a custom library that opens a single DB and then allows 
multiple threads to access it. The nice thing about this architecture is that 
all threads get to write, with no writer starvation. However, all write 
operations are single threaded.
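
 To make the shape of that architecture concrete, here is a minimal sketch in 
Python rather than C. It is not the library described above: SharedDb, 
write_batch and run_writers are hypothetical names, and a threading.Lock 
stands in for the single writer mutex. It only shows one connection shared by 
several threads, with every write transaction serialized.

```python
import sqlite3
import threading


class SharedDb:
    """One connection shared by every thread, guarded by one writer lock.

    Hypothetical illustration only; not the library from the email.
    """

    def __init__(self, path=":memory:"):
        # check_same_thread=False lets threads other than the creator
        # use this connection; the lock below keeps that use serialized.
        self.conn = sqlite3.connect(path, check_same_thread=False)
        self.write_lock = threading.Lock()
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS t (id INTEGER PRIMARY KEY, owner INTEGER)")

    def write_batch(self, owner, ids):
        """Insert a batch in one transaction while holding the writer lock."""
        with self.write_lock:      # only one thread writes at a time
            with self.conn:        # commit on success, roll back on error
                self.conn.executemany(
                    "INSERT INTO t (id, owner) VALUES (?, ?)",
                    [(i, owner) for i in ids])

    def count(self):
        with self.write_lock:
            return self.conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]


def run_writers(db, thread_count=4, total=1000):
    """Start thread_count writers; writer k takes ids k, k+thread_count, ..."""
    threads = [threading.Thread(target=db.write_batch,
                                args=(k, range(k, total, thread_count)))
               for k in range(thread_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return db.count()
```

 Every writer eventually gets the lock, so none is starved, but adding writers 
adds no write parallelism.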
 
 The test code I ran creates any number of threads and performs the following 
in each thread:
 
 outer loop 1 - 10
      begin transaction
      loop 1 - 1000
           insert record (using modulo for data so data is unique amongst threads)
      end loop
      commit

      prepare statement
      loop 1 - 1000
           select data (using modulo)
      end loop
      close statement

      begin transaction
      loop 1 - 1000
           delete data (using the same modulo)
      end loop
 end outer loop
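
 The outline above could be rendered roughly as follows. This is a sketch 
under several assumptions, not the actual test code: the table layout, the 
worker and run_workload names, and the rule that a thread owns the keys with 
k % nthreads == tid are all made up for illustration. The delete pass is 
committed here, which the outline does not show, and selects also take the 
lock to keep the shared connection safe. It uses an in-memory database, so 
its timings are not comparable with the figures in this mail.

```python
import sqlite3
import threading
import time


def worker(conn, lock, tid, nthreads, rounds, total, found):
    # Assumption: thread `tid` owns every key k with k % nthreads == tid,
    # so no two threads ever touch the same row.
    keys = [k for k in range(total) if k % nthreads == tid]
    for _ in range(rounds):
        with lock, conn:  # begin txn ... insert ... commit
            conn.executemany("INSERT INTO data (id, val) VALUES (?, ?)",
                             [(k, f"row-{k}") for k in keys])

        with lock:        # select pass, reusing one cursor
            cur = conn.cursor()
            for k in keys:
                cur.execute("SELECT val FROM data WHERE id = ?", (k,))
                if cur.fetchone() is not None:
                    found[tid] += 1
            cur.close()

        with lock, conn:  # delete pass; the commit is an assumption
            conn.executemany("DELETE FROM data WHERE id = ?",
                             [(k,) for k in keys])


def run_workload(nthreads, rounds=10, total=1000):
    """Run the workload; return (seconds, rows left, rows found by selects)."""
    conn = sqlite3.connect(":memory:", check_same_thread=False)
    conn.execute("CREATE TABLE data (id INTEGER PRIMARY KEY, val TEXT)")
    lock = threading.Lock()       # stand-in for the single writer mutex
    found = [0] * nthreads        # one slot per thread, no sharing
    start = time.perf_counter()
    threads = [threading.Thread(target=worker,
                                args=(conn, lock, t, nthreads, rounds, total, found))
               for t in range(nthreads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start
    remaining = conn.execute("SELECT COUNT(*) FROM data").fetchone()[0]
    conn.close()
    return elapsed, remaining, sum(found)
```

 Calling run_workload(n) for n in (1, 2, 4, 8, 16) gives a comparison of the 
same kind as the timings in this mail, although the absolute numbers will 
differ.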
 
 Timing (seconds)    Thread count    Transaction size
 1.665               1               1000
 1.635               2               500
 3.094               4               250
 5.571               8               125
 7.822               16              62.5
 
 So as the number of threads increases, the overall time it takes to 
insert/select/delete a fixed set of data increases with this architecture. 
This is because all threads are serialized on inserts/deletes and are 
contending on a single writer mutex. So in this particular case, fewer threads 
actually improve performance. 
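
 For reference, the arithmetic behind the table can be checked with a few 
lines of Python. The timings are copied from the table above. The 
TOTAL_RECORDS constant and the helper names are only illustrative, and the 
division by the thread count is inferred from the transaction sizes listed. 
The numbers alone do not separate mutex contention from the cost of 
committing more, smaller transactions.

```python
# Thread count -> elapsed seconds, copied from the timing table.
timings = {1: 1.665, 2: 1.635, 4: 3.094, 8: 5.571, 16: 7.822}

# Records per transaction when one thread does all the work (first table row).
TOTAL_RECORDS = 1000


def txn_size(threads):
    """Records per transaction when the fixed data set is split evenly."""
    return TOTAL_RECORDS / threads


def slowdown(threads):
    """Elapsed time relative to the single-thread run."""
    return timings[threads] / timings[1]


for n in sorted(timings):
    print(f"{n:2d} threads: txn size {txn_size(n):6.1f}, "
          f"{slowdown(n):.2f}x the 1-thread time")
```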
 
 Hope this helps,
 Ken
 
  
         

Emerson Clarke <[EMAIL PROTECTED]> wrote:

Roger,

Thanks for the suggestions.  I think using a worker thread and a queue
would be equivalent to just running a single thread, since it
effectively makes the database operations synchronous.  Although I can
see what you're driving at regarding the transactions every n records.

The idea is that because I am accessing two databases, and doing
several file system operations per document, there should be a large
gain by using many threads.  There is no actual indexing process; the
whole structure is the index, but if anything the database operations
take the most time.  The filesystem operations have a very small
amount of overhead.

I have tried the page size pragma setting already, though I read that
it is dependent on the cluster size of the particular filesystem that
you are running on.

Since I only have one connection to each database from each thread, I
don't think I would benefit from the caching.  I'm not quite sure why
you would ever have more than one connection to the database from a
single thread?  The API that I use more or less ensures that under
most circumstances there is only one connection.

Emerson


