Re: [sqlite] Multi-threading.

Andrew Piskorski Wed, 27 Jul 2005 10:31:58 -0700

On Tue, Jul 26, 2005 at 10:21:22PM -0400, Mrs. Brisby wrote:

> That's incorrect. Threading increases development time and produces less
> stable applications. In fairness: it's the skill level of the engineer


Mrs. Brisby, that is probably quite correct in at least one particular
sense, but since you're so eager to bring science into the discussion,
you should define your terms more carefully.

When people - even smart and knowledgeable people - say "thread", they
typically mean one of two related but quite distinct concepts, and
they often don't say precisely which they mean:

They may mean "thread of control", or they may mean, "threads as
implemented by system X", where X usually means either POSIX or Win32
threads.  It would be nice if there were clearly distinct names for
those two things, but I don't know of any.

A "thread of control" is a conceptual entity, and is usually
contrasted to event-driven state machines, both of which are used for
concurrent programming.  Using 5 POSIX threads and using 5 Unix
processes are BOTH examples of using 5 threads of control.  In the
conceptual sense they are BOTH using "threads" in contrast to
"events".

I see that sense of "thread of control" in the academic computer
science literature a lot.  They mostly seem convinced that multiple
threads of control are usually much easier to program than state
machines.  Thus they like to write papers on different ways of
minimizing the overhead of threads vs. state machines on a single CPU,
how to best take advantage of multiple CPUs, implementing lightweight
threads on top of an event driven core, etc. etc.  Often, whether
memory is default-shared or default-private seems to be an
implementation detail of little direct interest to those researchers.
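
To make the "threads of control vs. state machine" contrast concrete,
here is a minimal Python sketch.  All names in it (handle_client,
ClientMachine, the three step labels) are hypothetical, made up purely
for illustration.  The same three-step job is written once as
straight-line code run in a thread, and once as an explicit state
machine stepped by a simple loop:

```python
import threading

STEPS = ("connect", "request", "close")  # hypothetical steps of one job

def handle_client(name, log):
    # Thread-of-control style: the steps read top to bottom, and the
    # "state" is simply wherever execution currently is.
    for step in STEPS:
        log.append(f"{name}: {step}")

class ClientMachine:
    # Event-driven style: the position in the sequence has to be stored
    # explicitly and advanced one event at a time.
    def __init__(self, name, log):
        self.name = name
        self.log = log
        self.state = 0

    def done(self):
        return self.state == len(STEPS)

    def on_event(self):
        self.log.append(f"{self.name}: {STEPS[self.state]}")
        self.state += 1

def run_threads(names):
    log = []
    workers = [threading.Thread(target=handle_client, args=(n, log))
               for n in names]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return log

def run_event_loop(names):
    log = []
    machines = [ClientMachine(n, log) for n in names]
    # One loop interleaves all machines, one step each per pass.
    while not all(m.done() for m in machines):
        for m in machines:
            if not m.done():
                m.on_event()
    return log
```

Both versions do the same work.  The difference is only in who keeps
track of "where am I": the runtime (threads) or the programmer (state
machine).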

Working programmers who want to write applications, not their own
operating system, compiler, or related libraries, don't much care
about that.  Their OS and other tools typically support and encourage
a TINY handful of tools for concurrent programming - many of which
often suck.  The danger is that plenty of those working programmers
know nothing about the other 95% of the possibilities their particular
OS does NOT give them, so they make lots of overly broad
generalizations.  And a serious computing SCIENTIST certainly should
not allow himself (or herself) to make that error.

Now here's the interesting part: When it comes to different styles of
implementing "threads of control" (processes vs. POSIX threads, etc.),
like lots of things with computers, it's often not that hard to build
one style on top of an implementation originally intended for another
style - sometimes even with decent efficiency.

For example, it is quite possible - and sometimes useful - to
implement a "private-mostly, optionally shared" process-like memory
model on top of shared-everything POSIX or Win32 threads.  E.g.,
AOLserver and Tcl do that.  Erlang does not, but could.  (And they
each have their reasons for that.)

I doubt that it would ever be useful to implement a shared-mostly
model on top of multiple Unix processes and System V shared memory,
but you could if you really wanted to.

"thread vs. processes" just isn't a terribly well posed argument.  It
is an ugly smooshed-together mess of several different orthogonal
concepts:

1. Multiple threads of control vs. an event-driven state machine.
2. Default-shared vs. default-private memory.
3. Shared memory vs. message passing.
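
To illustrate dimension 3 on its own, here is a small Python sketch
that computes the same total twice with the same threads: once by
updating a shared variable under a lock, once by having workers send
their results as messages.  The function names are hypothetical.  In
Python the threads share memory either way, so the second version
shows the message-passing discipline rather than real isolation:

```python
import queue
import threading

def sum_shared(chunks):
    total = [0]                 # mutable cell visible to every worker
    lock = threading.Lock()

    def work(chunk):
        partial = sum(chunk)
        with lock:              # shared memory: writers must coordinate
            total[0] += partial

    threads = [threading.Thread(target=work, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0]

def sum_messages(chunks):
    inbox = queue.Queue()       # the only channel between the threads

    def work(chunk):
        inbox.put(sum(chunk))   # message passing: send the result, touch nothing else

    threads = [threading.Thread(target=work, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    # Exactly one message arrives per chunk.
    total = sum(inbox.get() for _ in chunks)
    for t in threads:
        t.join()
    return total
```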

And those are still just a few dimensions in the space of techniques
for concurrent programming.  Here's at least ONE other important
dimension that everyone on the SQLite list is probably familiar with:
"transactions vs. explicit locking".

And, aha, if you had good support for TRANSACTIONS in-memory in your
programming language environment (aka, transactional memory), how
much would it then matter to you whether your concurrent transactions
were implemented on default-shared or default-private memory underneath?
Not much!
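
For a feel of what "transactions in memory" might look like to the
programmer, here is a toy Python sketch of optimistic transactional
variables.  Everything in it (TVar, atomically, transfer) is a
hypothetical name, and it is nowhere near a real transactional memory
implementation: it just buffers writes, remembers the version of each
variable it read, and retries the whole transaction if any of those
versions changed before commit.

```python
import threading

class TVar:
    """A transactional variable: a value plus a version counter."""
    def __init__(self, value):
        self.value = value
        self.version = 0

_commit_lock = threading.Lock()  # serializes snapshots and commits only

def atomically(transaction):
    """Run transaction(read, write) optimistically, retrying on conflict."""
    while True:
        reads = {}    # TVar -> version first seen by this attempt
        writes = {}   # TVar -> value to install at commit

        def read(var):
            if var in writes:          # see our own buffered write
                return writes[var]
            with _commit_lock:         # consistent (value, version) pair
                value, version = var.value, var.version
            reads.setdefault(var, version)
            return value

        def write(var, value):
            writes[var] = value        # buffered, invisible to others

        result = transaction(read, write)

        with _commit_lock:
            # Commit only if nothing we read has changed since we read it.
            if all(var.version == seen for var, seen in reads.items()):
                for var, value in writes.items():
                    var.value = value
                    var.version += 1
                return result
        # Conflict detected: discard the buffers and try again.

def transfer(src, dst, amount):
    def tx(read, write):
        write(src, read(src) - amount)
        write(dst, read(dst) + amount)
    atomically(tx)
```

Note that transfer() contains no locks at all.  Whether the variables
underneath live in default-shared or default-private memory is
invisible at this level, which is the point.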

Think:  Do I as a user care that Oracle is implemented multi-process w/
System V shared memory while SQL Server uses Win32 threads?  Only
slightly, if at all.  Why?  Because transactions are a higher level
abstraction, and as abstractions go it's pretty watertight, not very
leaky at all.  (I am MUCH more interested in the fact that Oracle uses
MVCC while SQL Server uses pessimistic locking - different flavors of
transactions.)

My personal suspicion is that there are probably many more such
dimensions to concurrent programming technique, mostly poorly
investigated or poorly known (or just plain not invented yet at all),
some of which are hiding vastly larger productivity gains than any
further belaboring of "threads vs. processes".

Related to that, although I haven't read it yet, this book was
recently highly recommended to me:

   Concepts, Techniques, and Models of Computer Programming
   by Peter Van Roy, Seif Haridi
   http://www.amazon.com/exec/obidos/tg/detail/-/0262220695/104-1093447-6927163

A reasonable analogy to "threads vs. processes" (at least for Linux
folk) may be, "Gnome vs. KDE".  Now, "Gnome vs. KDE" is not a useless
question.  Although I personally hardly care, there must be real
differences between Gnome and KDE, and there are probably sound
engineering reasons for debating the relative merits of the two
designs, sometimes.

But "Gnome vs. KDE" really boils down to a bunch of more specific
questions about things each does, AND even the sum total of all
those things is probably pretty trivial compared to, "Good GUI user
interface design and programs and toolkits to support it" - which is
the REAL problem that both Gnome and KDE are supposed to be solving.

-- 
Andrew Piskorski <[EMAIL PROTECTED]>
http://www.piskorski.com/
