If Emerson intuitively understood the essential architecture of the PC he is using, he would not be having difficulty with his concept of how to use it. It is essentially a serial, multi-tasking device; parallelism in the form of threading and multiprocessing is a sophistication added at a high overhead.

I recollect an insightful CS professor impressing the concept on his students by explaining to them that the machines on their desks were descended from a device invented to be a gas pump controller.

A machine designed from first principles to manage parallel processing would be very different.

Michael Ruck wrote:
Hi Emerson,

I just hope you don't reinvent the wheel ;) I haven't yet had the need to
index things the way you describe it. Maybe I should take that as one of my
next pet projects to get a handle on this type of task.

The problem as I see it is basically that, however you design this, if the
storage tasks take 90% of your indexing time, then any parallelization may
be a waste of effort. Even if you use a synchronization object, you're
essentially serializing things in a (complicated) multithreaded way...

As far as static initialization: That it occurs before main() and is out of
your control was the point I was getting across. That's why I wrote that
this type of initialization should be avoided, unless there's no better
design for it.

Michael

-----Original Message-----
From: Emerson Clarke [mailto:[EMAIL PROTECTED]
Sent: Wednesday, January 3, 2007 20:31
To: sqlite-users@sqlite.org
Subject: Re: [sqlite] sqlite performance, locking & threading

Michael,

Thanks for the advice.  During the indexing process I need to select and
optionally insert records into a table, so I can't ignore the outcomes.

Basically the indexing process does compression, so for each document it
inserts words into a table and looks up keys.  Every word in the document
gets swapped with a key, and new keys are inserted as needed.
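
(A minimal sketch of the select-or-insert step described above, using the SQLite C API; the words(id, word) table and the function name are illustrative assumptions, not Emerson's actual schema or code.)

    #include <sqlite3.h>

    /* Swap a word for its key: select the existing key, or insert a new
     * row and use the generated rowid.  Error handling abbreviated. */
    static sqlite3_int64 word_to_key(sqlite3 *db, const char *word)
    {
        sqlite3_stmt *sel = 0, *ins = 0;
        sqlite3_int64 key = 0;

        sqlite3_prepare_v2(db, "SELECT id FROM words WHERE word = ?", -1, &sel, 0);
        sqlite3_bind_text(sel, 1, word, -1, SQLITE_STATIC);
        if (sqlite3_step(sel) == SQLITE_ROW) {
            key = sqlite3_column_int64(sel, 0);          /* existing key */
        } else {
            sqlite3_prepare_v2(db, "INSERT INTO words(word) VALUES (?)", -1, &ins, 0);
            sqlite3_bind_text(ins, 1, word, -1, SQLITE_STATIC);
            sqlite3_step(ins);                           /* new key as needed */
            key = sqlite3_last_insert_rowid(db);
            sqlite3_finalize(ins);
        }
        sqlite3_finalize(sel);
        return key;
    }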

There are some problems with splitting the work up in a different way as you
suggested. I would either end up with a lot of queues, or I would have to
stagger the work so that the entire data set gets processed in stages, which
doesn't scale very well and isn't particularly fault tolerant.  When building
an index, you want the structure to be built up progressively, so that you
can pause the process and resume it later on whilst still having useful
results.

I would be worried that in a queued design, the overhead and bottlenecks
caused by the buffering, message passing, and context switching would reduce
the performance to that of a single thread.
Especially since the database operations represent 90% of the work, all you
would really be doing is attempting to serialise things in a multithreaded
way.

I'm sure, having worked on multithreaded systems, you appreciate that sometimes
simple designs are better, and I think I have a pretty good handle on what
it is that I'm trying to do.

You never have control over static initialisation, it happens before main().
If I was writing very specific code to suit just this situation then maybe,
as you say, I wouldn't need to worry about it.  But I'm also writing a database
api, and that api is used for many different things.  My considerations are
not just for this one problem, but also for the best general way to code the
api so that it is safe and efficient in all circumstances.  So far the
client/server design is the only way I can achieve true thread safety.

If I could work out why sqlite3_step() causes problems across multiple
threads I could probably make things a little faster, and I could do away
with the need for a client/server design.

Emerson


On 1/3/07, Michael Ruck <[EMAIL PROTECTED]> wrote:

Emerson,

Now I understand your current implementation. You seemingly only partially split up the work in your code. I'd schedule the database operation and not wait on the outcome, but start on the next task. When the database finishes and has retrieved its result, schedule some work package on a third thread, which only processes the results, etc. Split up the work into repetitive, non-blocking tasks. Use multiple queues and dedicated threads for parts of the operation, or thread pools, which process queues in parallel if possible.
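
(For illustration, a minimal sketch of the kind of per-stage work queue Michael suggests, using a pthread mutex and condition variable; all names here are hypothetical.)

    #include <pthread.h>

    /* One stage pushes tasks; a dedicated worker thread pops and runs them. */
    typedef struct task {
        struct task *next;
        void (*run)(void *);
        void *arg;
    } task_t;

    typedef struct {
        task_t *head, *tail;
        pthread_mutex_t lock;
        pthread_cond_t  ready;
    } queue_t;

    void queue_init(queue_t *q)
    {
        q->head = q->tail = 0;
        pthread_mutex_init(&q->lock, 0);
        pthread_cond_init(&q->ready, 0);
    }

    void queue_push(queue_t *q, task_t *t)               /* producer stage */
    {
        t->next = 0;
        pthread_mutex_lock(&q->lock);
        if (q->tail) q->tail->next = t; else q->head = t;
        q->tail = t;
        pthread_cond_signal(&q->ready);                  /* wake the worker */
        pthread_mutex_unlock(&q->lock);
    }

    task_t *queue_pop(queue_t *q)                        /* worker stage, blocks */
    {
        task_t *t;
        pthread_mutex_lock(&q->lock);
        while (!q->head)
            pthread_cond_wait(&q->ready, &q->lock);
        t = q->head;
        q->head = t->next;
        if (!q->head) q->tail = 0;
        pthread_mutex_unlock(&q->lock);
        return t;
    }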

From what I can tell you're already half way there.

I still don't see your static initialization problem, but that's another story. Actually I'd avoid using static initialization or static (singleton) instances, unless the design really requires it. Someone must control startup of the entire process; have that one (probably main/WinMain) take care that the work queues are available. Afterwards the order of thread starts doesn't matter... Actually it is non-deterministic anyway (unless you serialize this yourself).

Michael

-----Original Message-----
From: Emerson Clarke [mailto:[EMAIL PROTECTED]
Sent: Wednesday, January 3, 2007 15:14
To: sqlite-users@sqlite.org
Subject: Re: [sqlite] sqlite performance, locking & threading

Michael,

I'm not sure that atomic operations would be a suitable alternative.
The reason why I'm using events/conditions is so that the client thread blocks until the server thread has processed the query and returned the result. If I did not need the result then a simple queueing system with atomic operations or critical sections would be fine, I guess.

The client thread must always block or spin until the server thread has completed the query. Critical sections can't be efficiently used to notify other threads of status change. I did try using critical sections in this way, by spinning until the server thread takes a lock, then blocking and eventually waiting for the server thread to finish. But since there is no way to block the server thread when there is no work to do, both the client and server thread must sleep, which induces context switching anyway.

If you used atomic operations, how would you get the client thread to block and the server thread to block when it is not processing?

Events/conditions seemed to be the best solution; the server thread never runs when it doesn't need to and always wakes up when there is processing to be done.
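
(A sketch of the blocking request/response pattern Emerson describes, using a pthread condition variable so the client blocks until the server thread publishes the result; the request_t type and function names are assumptions, and the queue that carries requests to the server is omitted.)

    #include <pthread.h>

    /* A request the client hands to the server thread; the client then
     * waits on the condition until the server has filled in the result. */
    typedef struct {
        const char     *sql;      /* query for the server thread to run  */
        void           *result;   /* filled in by the server thread      */
        int             done;     /* set once the result is ready        */
        pthread_mutex_t lock;
        pthread_cond_t  cond;
    } request_t;

    void request_init(request_t *r, const char *sql)
    {
        r->sql = sql; r->result = 0; r->done = 0;
        pthread_mutex_init(&r->lock, 0);
        pthread_cond_init(&r->cond, 0);
    }

    /* Client side: after enqueueing the request, block until it is done. */
    void *client_wait(request_t *r)
    {
        pthread_mutex_lock(&r->lock);
        while (!r->done)
            pthread_cond_wait(&r->cond, &r->lock);
        pthread_mutex_unlock(&r->lock);
        return r->result;
    }

    /* Server side: publish the result and wake the waiting client. */
    void server_complete(request_t *r, void *result)
    {
        pthread_mutex_lock(&r->lock);
        r->result = result;
        r->done = 1;
        pthread_cond_signal(&r->cond);
        pthread_mutex_unlock(&r->lock);
    }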

The static initialisation problem occurs because the server thread must be running before anything which needs to use it. If you have a static instance of a class which accesses a database and it is initialised before the static instance which controls the server thread, you have a problem.

It can be overcome using the initialise-on-first-use idiom, as long as you're careful to protect the initialisation with atomic operations, but it's still a bit complicated.
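
(One conventional way to make initialise-on-first-use safe without hand-rolled atomics is pthread_once, which guarantees the initialiser runs exactly once even if several threads race to the first call; a sketch with illustrative names.)

    #include <pthread.h>

    static pthread_once_t server_once = PTHREAD_ONCE_INIT;

    static void start_server_thread(void)
    {
        /* create the server thread, its queues, etc. -- runs exactly once */
    }

    /* Call this from anything that needs the server, including code that
     * runs during static initialisation; the first caller wins the race. */
    void ensure_server_running(void)
    {
        pthread_once(&server_once, start_server_thread);
    }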

Emerson


On 1/3/07, Michael Ruck <[EMAIL PROTECTED]> wrote:

Hi Emerson,

Another remark: On Windows, using Event synchronization objects involves additional kernel context switches and thus slows you down more than necessary. I'd suggest using a queue which makes use of the InterlockedXXX operations (I've implemented a number of those, including priority based ones - so this is possible without taking a single lock), or to use critical sections - those only take the kernel context switch if there really is lock contention. If you can reduce the kernel context switches, your performance will likely increase drastically.
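
(A sketch of the lock-free direction Michael mentions, using the Win32 SList primitives built on the Interlocked operations; note this gives a LIFO stack rather than a FIFO queue, and the work_item type is an assumption for illustration only.)

    #include <windows.h>
    #include <malloc.h>

    typedef struct {
        SLIST_ENTRY entry;        /* must be the first member  */
        const char *sql;          /* payload for the worker    */
    } work_item;

    static SLIST_HEADER work_list;

    void work_init(void)
    {
        InitializeSListHead(&work_list);
    }

    void work_push(const char *sql)
    {
        /* SLIST entries must be MEMORY_ALLOCATION_ALIGNMENT aligned */
        work_item *w = (work_item *)_aligned_malloc(sizeof *w,
                                                    MEMORY_ALLOCATION_ALIGNMENT);
        w->sql = sql;
        InterlockedPushEntrySList(&work_list, &w->entry);
    }

    work_item *work_pop(void)     /* returns NULL when the list is empty */
    {
        return (work_item *)InterlockedPopEntrySList(&work_list);
    }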

I also don't see the static initialization problem: The queue has to be available before any thread is started. No thread has ownership of the queue, except maybe the main thread.

Michael


-----Original Message-----
From: Emerson Clarke [mailto:[EMAIL PROTECTED]
Sent: Wednesday, January 3, 2007 00:57
To: sqlite-users@sqlite.org
Subject: Re: [sqlite] sqlite performance, locking & threading

Nico,

I have implemented all three strategies (thread-specific connections, single connection multiple threads, and single thread server with multiple client threads).

The problem with using thread-specific contexts is that you can't have a single global transaction which wraps all of those contexts. So you end up having to use fine grained transactions, which decreases performance.

The single connection multiple thread alternative apparently has problems with sqlite3_step being active on more than one thread at the same moment, so cannot easily be used in a safe way. But it is by far the fastest and simplest alternative.
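
(One common workaround, shown only as a sketch and assuming SQLite is built with thread safety enabled, is to serialise the whole prepare/step/finalize sequence behind a single mutex so that sqlite3_step is never active on two threads at once; whether that cost is acceptable is exactly what this thread is debating.)

    #include <pthread.h>
    #include <sqlite3.h>

    static pthread_mutex_t db_lock = PTHREAD_MUTEX_INITIALIZER;

    /* Run one statement on a shared connection, holding a single mutex
     * across the whole prepare/step/finalize sequence. */
    int run_serialised(sqlite3 *db, const char *sql)
    {
        sqlite3_stmt *stmt = 0;
        int rc;

        pthread_mutex_lock(&db_lock);
        rc = sqlite3_prepare_v2(db, sql, -1, &stmt, 0);
        if (rc == SQLITE_OK) {
            while ((rc = sqlite3_step(stmt)) == SQLITE_ROW)
                ;                              /* consume rows as needed */
            rc = sqlite3_finalize(stmt);
        }
        pthread_mutex_unlock(&db_lock);
        return rc;
    }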

The single thread server solution involves message passing between threads, and even when this is done optimally with condition variables (or events on Windows) and blocking, I've found that it results in a high number of context switches and decreased performance. It does however make a robust basis for a wrapper api, since it guarantees that things will always be synchronised.

But using this arrangement can also result in various static initialisation problems, since the single thread server must always be up and running before anything which needs to use it.

Emerson

On 1/2/07, Nicolas Williams <[EMAIL PROTECTED]> wrote:

On Sat, Dec 30, 2006 at 03:34:01PM +0000, Emerson Clarke wrote:

Technically sqlite is not thread safe.  [...]

Solaris man pages describe APIs with requirements like SQLite's as "MT-Safe with exceptions" and the exceptions are listed in the man page.

That's still MT-Safe, but the caller has to play by certain rules.

Anyways, this is silly. SQLite API is MT-Safe with one exception and that exception is rather ordinary, common to other APIs like it that have a context object of some sort (e.g., the MIT krb5 API), and not really a burden to the caller. In exchange for this exception you get an implementation of the API that is lighter weight and easier to maintain than it would have been without that exception; a good trade-off IMO.

Coping with this exception is easy. For example, if you have a server app with multiple worker threads, each of which needs a db context, then you could use a thread-specific key to track a per-thread db context; use pthread_key_create(3C) to create the key, pthread_setspecific(3C) once per-thread to associate a new db context with the calling thread, and pthread_getspecific(3C) to get the calling thread's db context when you need it. If you have a protocol where you have to step a statement over multiple message exchanges with a client, and you don't want to have per-client threads, then get a db context per-client/exchange and store that and a mutex in an object that represents that client/exchange.  And so on.
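
(A sketch of the thread-specific-key approach Nico describes, using pthread_key_create, pthread_setspecific, and pthread_getspecific to keep one connection per worker thread; error handling abbreviated and function names illustrative.)

    #include <pthread.h>
    #include <sqlite3.h>

    static pthread_key_t  db_key;
    static pthread_once_t db_key_once = PTHREAD_ONCE_INIT;

    static void close_db(void *p)             /* runs at thread exit */
    {
        sqlite3_close((sqlite3 *)p);
    }

    static void make_key(void)
    {
        pthread_key_create(&db_key, close_db);
    }

    /* Return the calling thread's private connection, opening it on
     * first use.  The database path is an illustrative parameter. */
    sqlite3 *thread_db(const char *path)
    {
        sqlite3 *db;
        pthread_once(&db_key_once, make_key);
        db = (sqlite3 *)pthread_getspecific(db_key);
        if (!db) {
            sqlite3_open(path, &db);          /* one connection per thread */
            pthread_setspecific(db_key, db);
        }
        return db;
    }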

Nico
--
