If Emerson intuitively understood the essential architecture of the PC he is using, he would not be having difficulty with his concept of how to use it. It is essentially a serial, multi-tasking device; parallelism in the form of threading and multiprocessing is a sophistication added at a high overhead.

I recollect an insightful CS professor impressing the concept on his students by explaining to them that the machines on their desks were descended from a device invented to be a gas pump controller.

A machine designed from first principles to manage parallel processing would be very different.

Michael Ruck wrote:
Hi Emerson,

I just hope you don't reinvent the wheel ;) I haven't yet had the need to
index things the way you describe it. Maybe I should take that as one of my
next pet projects to get a handle on this type of task.

The problem as I see it is basically that, however you design this, if the
storage tasks take 90% of your indexing time, then any parallelization may
be a waste of effort. Even if you use a synchronization object, you're
essentially serializing things in a (complicated) multithreaded way...

As far as static initialization: That it occurs before main() and is out of
your control was the point I was getting across. That's why I wrote that
this type of initialization should be avoided, unless there's no better
design for it.

Michael

-----Original Message-----
From: Emerson Clarke [mailto:[EMAIL PROTECTED]
Sent: Wednesday, January 3, 2007 20:31
To: sqlite-users@sqlite.org
Subject: Re: [sqlite] sqlite performance, locking & threading

Michael,

Thanks for the advice.  During the indexing process I need to select and
optionally insert records into a table, so I can't ignore the outcomes.

Basically the indexing process does compression, so for each document it
inserts words into a table and looks up keys.  Every word in the document
gets swapped with a key, and new keys are inserted as needed.
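
(A minimal sketch of the select-or-insert step described above, using the SQLite C API; the words(id, word) table and the function name are illustrative assumptions, not Emerson's actual schema or code.)

    #include <sqlite3.h>

    /* Swap a word for its key: select the existing key, or insert a new
     * row and use the generated rowid.  Error handling abbreviated. */
    static sqlite3_int64 word_to_key(sqlite3 *db, const char *word)
    {
        sqlite3_stmt *sel = 0, *ins = 0;
        sqlite3_int64 key = 0;

        sqlite3_prepare_v2(db, "SELECT id FROM words WHERE word = ?", -1, &sel, 0);
        sqlite3_bind_text(sel, 1, word, -1, SQLITE_STATIC);
        if (sqlite3_step(sel) == SQLITE_ROW) {
            key = sqlite3_column_int64(sel, 0);          /* existing key */
        } else {
            sqlite3_prepare_v2(db, "INSERT INTO words(word) VALUES (?)", -1, &ins, 0);
            sqlite3_bind_text(ins, 1, word, -1, SQLITE_STATIC);
            sqlite3_step(ins);                           /* new key as needed */
            key = sqlite3_last_insert_rowid(db);
            sqlite3_finalize(ins);
        }
        sqlite3_finalize(sel);
        return key;
    }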

There are some problems with splitting the work up in a different way as you
suggested. I would either end up with a lot of queues, or I would have to
stagger the work so that the entire data set gets processed in stages, which
doesn't scale very well and isn't particularly fault tolerant.  When building
an index, you want the structure to be built up progressively, so that you
can pause the process and resume it later on whilst still having useful
results.

I would be worried that in a queued design, the overhead and bottlenecks
caused by the buffering, message passing, and context switching would reduce
the performance to that of a single thread.
Especially since the database operations represent 90% of the work, all you
would really be doing is attempting to serialise things in a multithreaded
way.

I'm sure, having worked on multithreaded systems, you appreciate that sometimes
simple designs are better, and I think I have a pretty good handle on what
it is that I'm trying to do.

You never have control over static initialisation, it happens before main().
If I was writing very specific code to suit just this situation then maybe,
as you say, I wouldn't need to worry about it.  But I'm also writing a database
api, and that api is used for many different things.  My considerations are
not just for this one problem, but also for the best general way to code the
api so that it is safe and efficient in all circumstances.  So far the
client/server design is the only way I can achieve true thread safety.

If I could work out why sqlite3_step() causes problems across multiple
threads I could probably make things a little faster, and I could do away
with the need for a client/server design.

Emerson


On 1/3/07, Michael Ruck <[EMAIL PROTECTED]> wrote:

Emerson,

Now I understand your current implementation. You seemingly only partially split up the work in your code. I'd schedule the database operation and not wait on the outcome, but start on the next task. When the database finishes and has retrieved its result, schedule some work package on a third thread, which only processes the results, etc. Split up the work into repetitive, non-blocking tasks. Use multiple queues and dedicated threads for parts of the operation, or thread pools, which process queues in parallel if possible.
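
(For illustration, a minimal sketch of the kind of per-stage work queue Michael suggests, using a pthread mutex and condition variable; all names here are hypothetical.)

    #include <pthread.h>

    /* One stage pushes tasks; a dedicated worker thread pops and runs them. */
    typedef struct task {
        struct task *next;
        void (*run)(void *);
        void *arg;
    } task_t;

    typedef struct {
        task_t *head, *tail;
        pthread_mutex_t lock;
        pthread_cond_t  ready;
    } queue_t;

    void queue_init(queue_t *q)
    {
        q->head = q->tail = 0;
        pthread_mutex_init(&q->lock, 0);
        pthread_cond_init(&q->ready, 0);
    }

    void queue_push(queue_t *q, task_t *t)               /* producer stage */
    {
        t->next = 0;
        pthread_mutex_lock(&q->lock);
        if (q->tail) q->tail->next = t; else q->head = t;
        q->tail = t;
        pthread_cond_signal(&q->ready);                  /* wake the worker */
        pthread_mutex_unlock(&q->lock);
    }

    task_t *queue_pop(queue_t *q)                        /* worker stage, blocks */
    {
        task_t *t;
        pthread_mutex_lock(&q->lock);
        while (!q->head)
            pthread_cond_wait(&q->ready, &q->lock);
        t = q->head;
        q->head = t->next;
        if (!q->head) q->tail = 0;
        pthread_mutex_unlock(&q->lock);
        return t;
    }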

From what I can tell you're already half way there.

I still don't see your static initialization problem, but that's another story. Actually I'd avoid using static initialization or static (singleton) instances, unless the design really requires it. Someone must control startup of the entire process; have that one (probably main/WinMain) take care that the work queues are available. Afterwards the order of thread starts doesn't matter... Actually it is non-deterministic anyway (unless you serialize this yourself).

Michael

-----Original Message-----
From: Emerson Clarke [mailto:[EMAIL PROTECTED]
Sent: Wednesday, January 3, 2007 15:14
To: sqlite-users@sqlite.org
Subject: Re: [sqlite] sqlite performance, locking & threading

Michael,

I'm not sure that atomic operations would be a suitable alternative.
The reason why I'm using events/conditions is so that the client thread blocks until the server thread has processed the query and returned the result. If I did not need the result then a simple queueing system with atomic operations or critical sections would be fine, I guess.

The client thread must always block or spin until the server thread has completed the query. Critical sections can't be efficiently used to notify other threads of status change. I did try using critical sections in this way, by spinning until the server thread takes a lock, then blocking and eventually waiting for the server thread to finish. But since there is no way to block the server thread when there is no work to do, both the client and server thread must sleep, which induces context switching anyway.

If you used atomic operations, how would you get the client thread to block and the server thread to block when it is not processing?

Events/conditions seemed to be the best solution; the server thread never runs when it doesn't need to and always wakes up when there is processing to be done.
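
(A sketch of the blocking request/response pattern Emerson describes, using a pthread condition variable so the client blocks until the server thread publishes the result; the request_t type and function names are assumptions, and the queue that carries requests to the server is omitted.)

    #include <pthread.h>

    /* A request the client hands to the server thread; the client then
     * waits on the condition until the server has filled in the result. */
    typedef struct {
        const char     *sql;      /* query for the server thread to run  */
        void           *result;   /* filled in by the server thread      */
        int             done;     /* set once the result is ready        */
        pthread_mutex_t lock;
        pthread_cond_t  cond;
    } request_t;

    void request_init(request_t *r, const char *sql)
    {
        r->sql = sql; r->result = 0; r->done = 0;
        pthread_mutex_init(&r->lock, 0);
        pthread_cond_init(&r->cond, 0);
    }

    /* Client side: after enqueueing the request, block until it is done. */
    void *client_wait(request_t *r)
    {
        pthread_mutex_lock(&r->lock);
        while (!r->done)
            pthread_cond_wait(&r->cond, &r->lock);
        pthread_mutex_unlock(&r->lock);
        return r->result;
    }

    /* Server side: publish the result and wake the waiting client. */
    void server_complete(request_t *r, void *result)
    {
        pthread_mutex_lock(&r->lock);
        r->result = result;
        r->done = 1;
        pthread_cond_signal(&r->cond);
        pthread_mutex_unlock(&r->lock);
    }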

The static initialisation problem occurs because the server thread must be running before anything which needs to use it. If you have a static instance of a class which accesses a database and it is initialised before the static instance which controls the server thread, you have a problem.

It can be overcome using the initialise-on-first-use idiom, as long as you're careful to protect the initialisation with atomic operations, but it's still a bit complicated.
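
(One conventional way to make initialise-on-first-use safe without hand-rolled atomics is pthread_once, which guarantees the initialiser runs exactly once even if several threads race to the first call; a sketch with illustrative names.)

    #include <pthread.h>

    static pthread_once_t server_once = PTHREAD_ONCE_INIT;

    static void start_server_thread(void)
    {
        /* create the server thread, its queues, etc. -- runs exactly once */
    }

    /* Call this from anything that needs the server, including code that
     * runs during static initialisation; the first caller wins the race. */
    void ensure_server_running(void)
    {
        pthread_once(&server_once, start_server_thread);
    }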

Emerson


On 1/3/07, Michael Ruck <[EMAIL PROTECTED]> wrote:

Hi Emerson,

Another remark: On Windows, using Event synchronization objects involves additional kernel context switches and thus slows you down more than necessary. I'd suggest using a queue which makes use of the InterlockedXXX operations (I've implemented a number of those, including priority based ones - so this is possible without taking a single lock), or to use critical sections - those only take the kernel context switch if there really is lock contention. If you can reduce the kernel context switches, your performance will likely increase drastically.
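
(A sketch of the lock-free direction Michael mentions, using the Win32 SList primitives built on the Interlocked operations; note this gives a LIFO stack rather than a FIFO queue, and the work_item type is an assumption for illustration only.)

    #include <windows.h>
    #include <malloc.h>

    typedef struct {
        SLIST_ENTRY entry;        /* must be the first member  */
        const char *sql;          /* payload for the worker    */
    } work_item;

    static SLIST_HEADER work_list;

    void work_init(void)
    {
        InitializeSListHead(&work_list);
    }

    void work_push(const char *sql)
    {
        /* SLIST entries must be MEMORY_ALLOCATION_ALIGNMENT aligned */
        work_item *w = (work_item *)_aligned_malloc(sizeof *w,
                                                    MEMORY_ALLOCATION_ALIGNMENT);
        w->sql = sql;
        InterlockedPushEntrySList(&work_list, &w->entry);
    }

    work_item *work_pop(void)     /* returns NULL when the list is empty */
    {
        return (work_item *)InterlockedPopEntrySList(&work_list);
    }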

I also don't see the static initialization problem: The queue has to be available before any thread is started. No thread has ownership of the queue, except maybe the main thread.

Michael


-----Original Message-----
From: Emerson Clarke [mailto:[EMAIL PROTECTED]
Sent: Wednesday, January 3, 2007 00:57
To: sqlite-users@sqlite.org
Subject: Re: [sqlite] sqlite performance, locking & threading

Nico,

I have implemented all three strategies (thread-specific connections, single connection multiple threads, and single thread server with multiple client threads).

The problem with using thread-specific contexts is that you can't have a single global transaction which wraps all of those contexts. So you end up having to use fine grained transactions, which decreases performance.

The single connection multiple thread alternative apparently has problems with sqlite3_step being active on more than one thread at the same moment, so cannot easily be used in a safe way. But it is by far the fastest and simplest alternative.
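
(One common workaround, shown only as a sketch and assuming SQLite is built with thread safety enabled, is to serialise the whole prepare/step/finalize sequence behind a single mutex so that sqlite3_step is never active on two threads at once; whether that cost is acceptable is exactly what this thread is debating.)

    #include <pthread.h>
    #include <sqlite3.h>

    static pthread_mutex_t db_lock = PTHREAD_MUTEX_INITIALIZER;

    /* Run one statement on a shared connection, holding a single mutex
     * across the whole prepare/step/finalize sequence. */
    int run_serialised(sqlite3 *db, const char *sql)
    {
        sqlite3_stmt *stmt = 0;
        int rc;

        pthread_mutex_lock(&db_lock);
        rc = sqlite3_prepare_v2(db, sql, -1, &stmt, 0);
        if (rc == SQLITE_OK) {
            while ((rc = sqlite3_step(stmt)) == SQLITE_ROW)
                ;                              /* consume rows as needed */
            rc = sqlite3_finalize(stmt);
        }
        pthread_mutex_unlock(&db_lock);
        return rc;
    }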

The single thread server solution involves message passing between threads, and even when this is done optimally with condition variables (or events on Windows) and blocking, I've found that it results in a high number of context switches and decreased performance. It does however make a robust basis for a wrapper api, since it guarantees that things will always be synchronised.

But using this arrangement can also result in various static initialisation problems, since the single thread server must always be up and running before anything which needs to use it.

Emerson

On 1/2/07, Nicolas Williams <[EMAIL PROTECTED]> wrote:

On Sat, Dec 30, 2006 at 03:34:01PM +0000, Emerson Clarke wrote:

Technically sqlite is not thread safe.  [...]

Solaris man pages describe APIs with requirements like SQLite's as "MT-Safe with exceptions" and the exceptions are listed in the man page.

That's still MT-Safe, but the caller has to play by certain rules.

Anyways, this is silly. SQLite API is MT-Safe with one exception and that exception is rather ordinary, common to other APIs like it that have a context object of some sort (e.g., the MIT krb5 API), and not really a burden to the caller. In exchange for this exception you get an implementation of the API that is lighter weight and easier to maintain than it would have been without that exception; a good trade-off IMO.

Coping with this exception is easy. For example, if you have a server app with multiple worker threads, each of which needs a db context, then you could use a thread-specific key to track a per-thread db context; use pthread_key_create(3C) to create the key, pthread_setspecific(3C) once per-thread to associate a new db context with the calling thread, and pthread_getspecific(3C) to get the calling thread's db context when you need it. If you have a protocol where you have to step a statement over multiple message exchanges with a client, and you don't want to have per-client threads, then get a db context per-client/exchange and store that and a mutex in an object that represents that client/exchange.  And so on.
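
(A sketch of the thread-specific-key approach Nico describes, using pthread_key_create, pthread_setspecific, and pthread_getspecific to keep one connection per worker thread; error handling abbreviated and function names illustrative.)

    #include <pthread.h>
    #include <sqlite3.h>

    static pthread_key_t  db_key;
    static pthread_once_t db_key_once = PTHREAD_ONCE_INIT;

    static void close_db(void *p)             /* runs at thread exit */
    {
        sqlite3_close((sqlite3 *)p);
    }

    static void make_key(void)
    {
        pthread_key_create(&db_key, close_db);
    }

    /* Return the calling thread's private connection, opening it on
     * first use.  The database path is an illustrative parameter. */
    sqlite3 *thread_db(const char *path)
    {
        sqlite3 *db;
        pthread_once(&db_key_once, make_key);
        db = (sqlite3 *)pthread_getspecific(db_key);
        if (!db) {
            sqlite3_open(path, &db);          /* one connection per thread */
            pthread_setspecific(db_key, db);
        }
        return db;
    }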

Nico
--
