Emerson Clarke wrote:
The idea is that because I am accessing two databases and doing
several filesystem operations per document, there should be a large
gain from using many threads. There is no actual indexing process; the
whole structure is the index. If anything, the database operations
take the most time. The filesystem operations have a very small
amount of overhead.
None of that was clear from your original description. Aren't you trying
to "index" several million documents, and doesn't the process of indexing
consist of two parts?
1: Open the document, parse it in various ways, build index data, close it
2: Add a row to a SQLite database
My point was that #1 is far more work than #2, so you can run many
instances of #1 in multiple threads/processes and do #2 in a single
thread, using a queue/pipe object for communication.
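
As a rough illustration of that layout, here is a minimal Python sketch: several worker threads do step 1 and hand their results through a queue to a single thread that does step 2. The names parse_document and run_pipeline, the docs table, and its columns are all hypothetical stand-ins, not anything from your application.

```python
import queue
import sqlite3
import threading


def parse_document(doc_id):
    # Hypothetical stand-in for step 1: open, parse, build index data, close.
    return (doc_id, f"index data for document {doc_id}")


def run_pipeline(doc_ids, db_path=":memory:", num_workers=4):
    work = queue.Queue()     # documents waiting to be parsed
    results = queue.Queue()  # parsed rows waiting to be written
    for doc_id in doc_ids:
        work.put(doc_id)

    def worker():
        # Step 1 runs in parallel across the worker threads.
        while True:
            try:
                doc_id = work.get_nowait()
            except queue.Empty:
                break
            results.put(parse_document(doc_id))
        results.put(None)  # sentinel: this worker has finished

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()

    # Step 2 runs only in this thread, so one connection does all the writes.
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS docs (id INTEGER PRIMARY KEY, data TEXT)"
    )
    finished = 0
    with conn:  # a single transaction around all inserts, committed on exit
        while finished < num_workers:
            item = results.get()
            if item is None:
                finished += 1
                continue
            conn.execute("INSERT INTO docs (id, data) VALUES (?, ?)", item)

    for t in threads:
        t.join()
    count = conn.execute("SELECT COUNT(*) FROM docs").fetchone()[0]
    conn.close()
    return count
```

Calling run_pipeline(range(100)) parses 100 dummy documents and returns the number of rows written. In CPython, threads only help when step 1 is mostly waiting on I/O; for CPU-heavy parsing the same shape works with processes and a multiprocessing queue instead.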
On the other hand, if #1 is far less work than #2, then you will be bound
by the rate at which you commit transactions in SQLite. A 7200 rpm disk
limits you to about 60 transactions a second, because the platter makes
120 rotations a second and each commit needs roughly two of them. The
more rows per transaction, the more rows per second.
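
To make the batching point concrete, here is a small Python sketch that commits once per batch rather than once per row. The function names, the docs table, and the batch size of 1000 are assumptions chosen for illustration. At roughly 60 commits a second, 1000 rows per transaction would put the ceiling near 60,000 rows a second instead of 60, ignoring every other cost.

```python
import sqlite3


def insert_in_batches(conn, rows, batch_size=1000):
    """Insert rows with one commit per batch; return the number of commits."""
    commits = 0
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) >= batch_size:
            with conn:  # one transaction (and one commit) per full batch
                conn.executemany(
                    "INSERT INTO docs (id, data) VALUES (?, ?)", batch
                )
            commits += 1
            batch = []
    if batch:
        with conn:  # final partial batch
            conn.executemany(
                "INSERT INTO docs (id, data) VALUES (?, ?)", batch
            )
        commits += 1
    return commits


def demo(n_rows=2500, batch_size=1000):
    # Hypothetical schema, used only to exercise insert_in_batches.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, data TEXT)")
    rows = ((i, f"doc {i}") for i in range(n_rows))
    commits = insert_in_batches(conn, rows, batch_size)
    count = conn.execute("SELECT COUNT(*) FROM docs").fetchone()[0]
    conn.close()
    return commits, count
```

Here demo() writes 2500 rows in three commits (1000, 1000, and 500 rows) rather than 2500 separate commits.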
Roger