Hmmm. I sense the question is perhaps a little less general. I think Nergal is asking about the trade-offs of running multiple instances of an application (i.e. the same code) as multiple processes or as multiple threads.

For instance, I remember we saw something like a 5-10% speedup with a major RDBMS vendor when we converted their multi-process, shared-memory architecture to multiple threads. In this case, most of the gain was due to reduced iTLB misses (from sharing what was a large quantity of text) and more efficient thread-local storage.

However, a counterexample would be a major CMS vendor who does pretty much the reverse (i.e. for large configurations they actually split one large multithreaded process into a number of multithreaded processes joined by shared memory). The reason is that their code introduces too much contention.

It is no coincidence that the RDBMS vendor codes in C and the CMS vendor in C++. A large part of the latter's scalability bottleneck is in heap allocation. In a single-threaded application there is no contention for the heap. So it's not just primary algorithm scalability you have to worry about, but also all that secondary stuff that you take for granted.

In Solaris 10 we rolled libthread into libc. This means that some single-threaded applications pay a small tax to use thread-safe code. However, we did tweak the mutex code to make it cheaper for processes with only one thread. Again, this means that, all other things being equal (e.g. no contention for the heap or TLBs etc), a single-threaded process may still be faster than its multithreaded equivalent.

But as my two examples above demonstrate, your mileage may vary. So, I'm in violent agreement with Frank: "it depends"!

Phil


Frank Hofmann wrote:
On Fri, 1 Sep 2006, Nergal Dimitri wrote:

How much performance do I "miss" when running several instances of a single threaded application on a multi-core or a system with several CPUs compared to running a multithreaded application on the same system (same application but with several threads)? Are there any big differences in Solaris 10?

Could the performance increase when using zones for the multiple instances of the same application? (one instance in each zone) If so, could that match the performance of the multi-threaded application?

Cheers,
Nergal

Hi Nergal,

These questions are a bit too generic to answer, so it's "it depends". Whether a multithreaded application is faster than multiple instances of a singlethreaded application depends on the amount of data sharing done, and/or on the way parallelism is achieved. I.e. what is your unit of work? Some simple generalizations can be made:

    - both multithreading and running multiple instances profit from
      having more than one CPU (core), as that turns timesliced
      multitasking into true, concurrent execution.

    - for work that does not share data, parallelism amounts to job
      scheduling, i.e. several processing jobs, each with its own
      associated data, are started at the same time. In that case,
      you can achieve the same with multithreading as with multiple
      instances; an example here would be the way the Apache
      webserver handles requests - one instance per client served.

    - for work that does share data, the overhead of doing so is less
      for multithreading than for running multiple instances (and
      using IPC mechanisms for data sharing / synchronization).

    - there are some types of work that don't lend themselves well to
      parallelization. Take for example compression - it's CPU bound,
      and the time it takes to compress N bytes of data is fixed. A
      multi-CPU/multi-core machine may allow you to compress MxN bytes
      of data in the same time, if multiple threads request the same
      operation - but a single thread running the compression
      algorithm will not become faster. Again, "unit of work".

How much you'll gain by going multithreaded compared to staying singlethreaded and running multiple instances depends a lot on the amount of data sharing you have. See the last item; if all you do is e.g. compress files, then it doesn't matter whether you spawn N singlethreaded processes that each compress one file at a time, or whether you create N threads that each compress one file at a time. But if you e.g. want to accelerate a Gaussian blur on an image taken from your nice new shiny DSLR camera, creating N threads in Photoshop where each processes 1/Nth of the lines will make your operation run much faster, and it doesn't help that you could have started N Photoshop instances operating on N different pictures - you only want one...
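
To make the "each thread processes 1/Nth of the lines" idea concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: it uses a 3x3 box blur instead of a real Gaussian kernel, the image is a plain list of rows, and the names blur_rows and blur_parallel are hypothetical. All threads read the same source image (the shared data) and each writes a disjoint band of output rows, so no locking is needed. Note that in standard CPython the global interpreter lock serialises pure-Python loops, so this shows the partitioning rather than the speedup; the same split done with native threads is what gives the gain described above.

```python
from concurrent.futures import ThreadPoolExecutor


def blur_rows(src, dst, row_start, row_end):
    """3x3 box blur of rows [row_start, row_end); edge pixels are clamped."""
    height, width = len(src), len(src[0])
    for y in range(row_start, row_end):
        for x in range(width):
            total = 0
            for dy in (-1, 0, 1):
                yy = min(max(y + dy, 0), height - 1)
                for dx in (-1, 0, 1):
                    xx = min(max(x + dx, 0), width - 1)
                    total += src[yy][xx]
            dst[y][x] = total / 9.0


def blur_parallel(src, n_threads):
    """Blur one image by giving each thread a contiguous band of rows."""
    height = len(src)
    dst = [[0.0] * len(src[0]) for _ in range(height)]
    band = (height + n_threads - 1) // n_threads  # rows per thread, rounded up
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        futures = [
            pool.submit(blur_rows, src, dst, start, min(start + band, height))
            for start in range(0, height, band)
        ]
        for f in futures:
            f.result()  # re-raise any exception from a worker
    return dst
```

The point of the sketch is that the unit of work (one image) is split into subtasks that share their input, which is the case where threads are the natural fit.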

In that sense, identify your unit of work. Then analyze whether multiple such units of work share data. If so, you might be able to cut down on the time per unit of work by splitting this up into parallel subtasks, and you should attempt to go for multithreading. But if there's no/little data sharing, you'll benefit from multi-CPU only if you fan out horizontally - execute multiple work/data "bundles", and that can easily be done by just spawning off another "job", which might well be singlethreaded.
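
The "fan out horizontally" case can be sketched the same way. This is a hedged illustration, not a benchmark: compress_all is a hypothetical helper, and zlib stands in for whatever the real per-job work is. Each job carries its own data and shares nothing with the others, so the code does not care whether the workers are threads or separate processes.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def compress_all(blobs, executor_cls=ThreadPoolExecutor, workers=4):
    """Compress each blob as an independent job; jobs share no data."""
    with executor_cls(max_workers=workers) as pool:
        # map() hands one blob to each worker and keeps the input order
        return list(pool.map(zlib.compress, blobs))
```

Passing concurrent.futures.ProcessPoolExecutor as executor_cls gives the "multiple singlethreaded instances" variant with the same call (on platforms that spawn rather than fork, call it from under an `if __name__ == "__main__":` guard). Because the work/data bundles are independent, the choice is mostly one of convenience.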

Hope that's some food for thought,
FrankH.

--------------------------------------------------------------------------
If anything's worth doing, do it with all your heart.    The Buddha
Wheresoever you go, go with all your heart.        Confucius
Whatever you do, do it with all your heart.        St.Paul, Col 3:23
--------------------------------------------------------------------------
_______________________________________________
opensolaris-discuss mailing list
[email protected]
