Hi Emerson,
I just hope you don't reinvent the wheel ;) I haven't yet had the need to
index things the way you describe. Maybe I should take that on as one of my
next pet projects to get a handle on this type of task.
The problem as I see it is basically this, however you design it: if the
storage tasks take 90% of your indexing time, then any parallelization may
be a waste of effort. Even if you use a synchronization object, you're
essentially serializing things in a (complicated) multithreaded way...
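A rough way to put a number on this is Amdahl's law. The sketch below is only an illustration: the 90% figure is taken from the discussion, it assumes the storage work ends up fully serialized, and the helper name amdahl_speedup is made up for this example.

```cpp
// Amdahl's law: upper bound on the speedup from running on several threads
// when serial_fraction of the work cannot be parallelized.
// Hypothetical helper, not part of any library.
double amdahl_speedup(double serial_fraction, int threads) {
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / threads);
}
```

Under that assumption, amdahl_speedup(0.9, 8) comes out at roughly 1.1, so eight threads would buy about a 10% improvement at best, before any synchronization overhead is counted.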
As for static initialization: that it occurs before main() and is out of
your control was the point I was trying to get across. That's why I wrote
that this type of initialization should be avoided unless there's no better
design for it.
Michael
-----Original Message-----
From: Emerson Clarke [mailto:[EMAIL PROTECTED]
Sent: Wednesday, January 3, 2007 20:31
To: sqlite-users@sqlite.org
Subject: Re: [sqlite] sqlite performance, locking & threading
Michael,
Thanks for the advice. During the indexing process I need to select and
optionally insert records into a table, so I can't ignore the outcomes.
Basically the indexing process does compression, so for each document it
inserts words into a table and looks up keys. Every word in the document
gets swapped with a key, and new keys are inserted as needed.
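The lookup-or-insert step described above can be sketched as follows. This is a minimal in-memory illustration, not the actual implementation: WordTable and compress are hypothetical names, and an unordered_map stands in for the SQLite table. It shows why each lookup result is needed before the next word can be handled.

```cpp
#include <cstdint>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

// In-memory stand-in for the word table (hypothetical; the real table
// described in the thread lives in SQLite).
class WordTable {
public:
    // Return the key for a word, inserting a new key if the word is unseen.
    std::uint32_t key_for(const std::string& word) {
        auto it = keys_.find(word);
        if (it != keys_.end()) return it->second;
        std::uint32_t key = static_cast<std::uint32_t>(keys_.size()) + 1;
        keys_.emplace(word, key);
        return key;
    }

private:
    std::unordered_map<std::string, std::uint32_t> keys_;
};

// Replace every whitespace-separated word in a document with its key.
std::vector<std::uint32_t> compress(WordTable& table, const std::string& document) {
    std::vector<std::uint32_t> out;
    std::istringstream words(document);
    std::string word;
    while (words >> word) out.push_back(table.key_for(word));
    return out;
}
```

For example, compress(table, "the cat saw the dog") on an empty table yields the keys 1, 2, 3, 1, 4: the second "the" reuses the key assigned to the first.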
There are some problems with splitting the work up in a different way as you
suggested. I would either end up with a lot of queues, or I would have to
stagger the work so that the entire data set gets processed in stages, which
doesn't scale very well and isn't particularly fault tolerant. When building
an index, you want the structure to be built up progressively, so that you
can pause the process and resume it later on whilst still having useful
results.
I would be worried that in a queued design, the overhead and bottlenecks
caused by the buffering, message passing, and context switching would reduce
the performance to that of a single thread.
Especially since the database operations represent 90% of the work, all you
would really be doing is attempting to serialise things in a multithreaded
way.
I'm sure, having worked on multithreaded systems, you appreciate that
sometimes simple designs are better, and I think I have a pretty good handle
on what it is that I'm trying to do.
You never have control over static initialisation; it happens before main().
If I were writing very specific code to suit just this situation then maybe,
as you say, I wouldn't need to worry about it. But I'm also writing a
database API, and that API is used for many different things. My
considerations are not just for this one problem, but also for the best
general way to code the API so that it is safe and efficient in all
circumstances. So far the client/server design is the only way I can achieve
true thread safety.
If I could work out why sqlite3_step() causes problems across multiple
threads, I could probably make things a little faster and do away with the
need for a client/server design.
Emerson
On 1/3/07, Michael Ruck <[EMAIL PROTECTED]> wrote:
Emerson,
Now I understand your current implementation. You seemingly only
partially split up the work in your code. I'd schedule the database
operation and not wait on the outcome, but start on the next task.
When the database finishes and has retrieved its result, schedule some
work package on a third thread, which only processes the results etc.
Split up the work into repetitive, non-blocking tasks. Use multiple
queues and dedicated threads for parts of the operation, or thread pools
that process queues in parallel if possible.
From what I can tell you're already halfway there.
I still don't see your static initialization problem, but that's
another story. Actually I'd avoid using static initialization or
static (singleton) instances, unless the design really requires it.
Someone must control startup of the entire process; have that one
(probably main/WinMain) make sure that the work queues are available.
Afterwards the order of thread starts doesn't matter... Actually it is
non-deterministic anyway (unless you serialize this yourself).
Michael
-----Original Message-----
From: Emerson Clarke [mailto:[EMAIL PROTECTED]
Sent: Wednesday, January 3, 2007 15:14
To: sqlite-users@sqlite.org
Subject: Re: [sqlite] sqlite performance, locking & threading
Michael,
I'm not sure that atomic operations would be a suitable alternative.
The reason why I'm using events/conditions is so that the client thread
blocks until the server thread has processed the query and returned
the result. If I did not need the result then a simple queueing
system with atomic operations or critical sections would be fine, I guess.
The client thread must always block or spin until the server thread
has completed the query. Critical sections can't be efficiently used
to notify other threads of a status change. I did try using critical
sections in this way, by spinning until the server thread takes a
lock, then blocking and eventually waiting for the server thread to
finish. But since there is no way to block the server thread when
there is no work to do, both the client and server thread must sleep,
which induces context switching anyway.
If you used atomic operations, how would you get the client thread to
block, and the server thread to block when it is not processing?
Events/conditions seemed to be the best solution: the server thread
never runs when it doesn't need to and always wakes up when there is
processing to be done.
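The blocking hand-off described above can be sketched with standard C++ condition variables. This is an illustration under assumptions, not the code discussed in the thread: QueryServer, Request and execute are hypothetical names, and a std::function returning an int stands in for a database query.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// One request: the work to run on the server thread plus its outcome.
struct Request {
    std::function<int()> work;  // stands in for a database query
    int result = 0;
    bool done = false;
};

class QueryServer {
public:
    QueryServer() : worker_([this] { run(); }) {}

    ~QueryServer() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        wake_server_.notify_one();
        worker_.join();
    }

    // Called from client threads: enqueue, then block until the server is done.
    int execute(std::function<int()> work) {
        Request req;
        req.work = std::move(work);
        std::unique_lock<std::mutex> lock(mutex_);
        pending_.push(&req);
        wake_server_.notify_one();
        wake_clients_.wait(lock, [&req] { return req.done; });
        return req.result;
    }

private:
    void run() {
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;) {
            // Sleep until there is work or shutdown has been requested.
            wake_server_.wait(lock, [this] { return stopping_ || !pending_.empty(); });
            if (pending_.empty()) return;  // only reached when stopping
            Request* req = pending_.front();
            pending_.pop();
            lock.unlock();
            int value = req->work();       // run the query without holding the lock
            lock.lock();
            req->result = value;
            req->done = true;
            wake_clients_.notify_all();
        }
    }

    std::mutex mutex_;
    std::condition_variable wake_server_;
    std::condition_variable wake_clients_;
    std::queue<Request*> pending_;
    bool stopping_ = false;
    std::thread worker_;  // declared last so the other members exist before it starts
};
```

A client would call something like server.execute([] { return 41 + 1; }) and receive the value once the server thread has run it. Every call still costs at least two thread wake-ups, which is the context-switching overhead the thread is concerned about.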
The static initialisation problem occurs because the server thread
must be running before anything which needs to use it. If you have a
static instance of a class which accesses a database and it is
initialised before the static instance which controls the server thread,
you have a problem.
It can be overcome using the initialise-on-first-use idiom, as long as
you're careful to protect the initialisation with atomic operations, but
it's still a bit complicated.
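A minimal sketch of the initialise-on-first-use idiom follows. Server, server() and Index are hypothetical names used only for illustration. Note the assumption: in C++11 and later the compiler guarantees that a function-local static is initialised exactly once even under concurrent calls; older compilers gave no such guarantee, which is why explicit protection would be needed there.

```cpp
// Hypothetical stand-in for the object that owns the server thread.
class Server {
public:
    Server() : running_(true) {}
    bool running() const { return running_; }

private:
    bool running_;
};

// Initialise on first use: the Server is constructed the first time this
// function is called, not at some unspecified point before main().
// Since C++11 the initialisation of a function-local static is thread-safe.
Server& server() {
    static Server instance;
    return instance;
}

// A static object that depends on the server. Because it reaches the server
// through server(), it does not matter which static constructor runs first.
class Index {
public:
    Index() : server_ready_(server().running()) {}
    bool server_ready() const { return server_ready_; }

private:
    bool server_ready_;
};

static Index global_index;
```

The trade-off is that destruction order at program exit is still determined by construction order, so the idiom solves startup ordering but not necessarily shutdown ordering.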
Emerson
On 1/3/07, Michael Ruck <[EMAIL PROTECTED]> wrote:
Hi Emerson,
Another remark: On Windows, using Event synchronization objects
involves additional kernel context switches and thus slows you down
more than necessary. I'd suggest using a queue that makes use of
the InterlockedXXX operations (I've implemented a number of those,
including priority-based ones, so this is possible without taking a
single lock) or using critical sections, which only incur the
kernel context switch if there really is lock contention. If you can
reduce the kernel context switches, your performance will likely
increase drastically.
I also don't see the static initialization problem: the queue has to
be available before any thread is started. No thread has ownership
of the queue, except maybe the main thread.
Michael
-----Original Message-----
From: Emerson Clarke [mailto:[EMAIL PROTECTED]
Sent: Wednesday, January 3, 2007 00:57
To: sqlite-users@sqlite.org
Subject: Re: [sqlite] sqlite performance, locking & threading
Nico,
I have implemented all three strategies (thread-specific connections,
single connection with multiple threads, and single-thread server with
multiple client threads).
The problem with using thread-specific contexts is that you can't
have a single global transaction which wraps all of those contexts.
So you end up having to use fine-grained transactions, which
decreases performance.
The single-connection, multiple-thread alternative apparently has
problems with sqlite3_step being active on more than one thread at
the same moment, so it cannot easily be used in a safe way. But it is
by far the fastest and simplest alternative.
The single-thread server solution involves message passing between
threads, and even when this is done optimally with condition
variables (or events on Windows) and blocking, I've found that it
results in a high number of context switches and decreased
performance. It does however make a robust basis for a wrapper API,
since it guarantees that things will always be synchronised.
But using this arrangement can also result in various static
initialisation problems, since the single-thread server must always
be up and running before anything which needs to use it.
Emerson
On 1/2/07, Nicolas Williams <[EMAIL PROTECTED]> wrote:
On Sat, Dec 30, 2006 at 03:34:01PM +0000, Emerson Clarke wrote:
Technically sqlite is not thread safe. [...]
Solaris man pages describe APIs with requirements like SQLite's as
"MT-Safe with exceptions" and the exceptions are listed in the man
page.
That's still MT-Safe, but the caller has to play by certain rules.
Anyways, this is silly. SQLite API is MT-Safe with one exception
and that exception is rather ordinary, common to other APIs like
it that have a context object of some sort (e.g., the MIT krb5
API), and not really a burden to the caller. In exchange for this
exception you get an implementation of the API that is lighter
weight and easier to maintain than it would have been without that
exception; a good trade-off IMO.
Coping with this exception is easy. For example, if you have a
server app with multiple worker threads each of which needs a db
context then you could use a thread-specific key to track a
per-thread db context; use pthread_key_create(3C) to create the key,
pthread_setspecific(3C) once per thread to associate a new db
context with the calling thread, and pthread_getspecific(3C) to
get the calling thread's db context when you need it. If you have
a protocol where you have to step a statement over multiple
message exchanges with a client, and you don't want to have
per-client threads, then get a db context per client/exchange and
store that and a mutex in an object that represents that
client/exchange. And so on.
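The per-thread context approach described above can be sketched with the three pthread calls named. This is an illustration under assumptions: DbContext and thread_db are hypothetical names, and the struct stands in for a real per-thread database handle (in real code it would wrap a connection opened by that thread).

```cpp
#include <pthread.h>

// Hypothetical stand-in for a per-thread database context.
struct DbContext {
    int statements_run = 0;
};

static pthread_key_t db_key;
static pthread_once_t db_key_once = PTHREAD_ONCE_INIT;

// Destructor called by pthreads when a thread exits with a non-null value.
extern "C" void destroy_context(void* p) { delete static_cast<DbContext*>(p); }

// Create the key exactly once, however many threads race to get here.
extern "C" void create_key() { pthread_key_create(&db_key, destroy_context); }

// Return the calling thread's context, creating it on first use.
DbContext* thread_db() {
    pthread_once(&db_key_once, create_key);
    DbContext* ctx = static_cast<DbContext*>(pthread_getspecific(db_key));
    if (ctx == nullptr) {
        ctx = new DbContext();
        pthread_setspecific(db_key, ctx);
    }
    return ctx;
}
```

Each worker thread simply calls thread_db() wherever it needs its context; no thread ever sees another thread's context, so no locking is needed around its use.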
Nico
-----------------------------------------------------------------------------
To unsubscribe, send email to [EMAIL PROTECTED]
-----------------------------------------------------------------------------