For instance, I remember we saw something like a 5-10% speedup with a major RDBMS vendor when we translated their multi-process, shared-memory architecture into multiple threads. In this case, most of the gain was due to reduced iTLB misses (from sharing what was a large quantity of text) and more efficient thread-local storage.
However, a counterexample would be a major CMS vendor who does pretty much the reverse (i.e. for large configurations they actually split one single large multithreaded process into a number of multithreaded processes joined by shared memory). The reason is that their code introduces too much contention.
It is no coincidence that the RDBMS vendor codes in C and the CMS vendor in C++. A large part of the latter's scalability bottleneck is in heap allocation. In a single-threaded application there is no contention for the heap. So it's not just primary algorithm scalability you have to worry about, but also all that secondary stuff that you take for granted.
In Solaris 10 we rolled libthread into libc. This means that some single-threaded applications pay a small tax to use thread-safe code. However, we did tweak the mutex code to make it cheaper for processes with only one thread. Again, this means that, all other things being equal (e.g. no contention for the heap or TLBs etc), a single-threaded process may still be faster than its multithreaded equivalent.
But as my two examples above demonstrate, your mileage may vary. So, I'm in violent agreement with Frank: "it depends"!
Phil

Frank Hofmann wrote:
On Fri, 1 Sep 2006, Nergal Dimitri wrote:

How much performance do I "miss" when running several instances of a single threaded application on a multi-core or a system with several CPUs compared to running a multithreaded application on the same system (same application but with several threads)? Are there any big differences in Solaris 10?

Could the performance increase when using zones for the multiple instances of the same application? (one instance in each zone) If so, could that match the performance of the multi-threaded application?

Cheers,
Nergal

Hi Nergal,

These questions are a bit too generic to answer, so it's "it depends". Whether a multithreaded application is faster than multiple instances of a singlethreaded application depends on the amount of data sharing done, and/or on the way parallelism is achieved. I.e. what is your unit of work?

Some simple generalizations can be made:

- both multithreading and running multiple instances profit from having more than one CPU (core), as that turns timesliced multitasking into true, concurrent execution.
- for work that does not share data, parallelism amounts to job scheduling, i.e. one processing job with its associated data is started at the same time. In that case, you can achieve the same with multithreading as with multiple instances; an example here would be the way the Apache webserver handles requests - one instance per client served.
- for work that does share data, the overhead of doing so is less for multithreading than for running multiple instances (and using IPC mechanisms for data sharing / synchronization).
- there are some types of work that don't lend themselves well to parallelization. Take for example compression - it's CPU bound, and the time it takes to compress N bytes of data is fixed. A multi-CPU/multi-core machine may allow you to compress MxN bytes of data in the same time, if multiple threads request the same operation - but a single thread running the compression algorithm will not become faster. Again, "unit of work".

How much you'll gain by going multithreaded compared to staying with singlethreadedness and running multiple instances depends a lot on the amount of data sharing you have. See the last item; if all you do is e.g. compress files, then it doesn't matter whether you spawn N singlethreaded processes that each compresses one file at a time, or whether you create N threads that each compresses one file at a time. But if you e.g. want to accelerate a Gaussian blur on an image taken from your nice new shiny DSLR camera, creating N threads in Photoshop where each processes 1/Nth of the lines will make your operation run much faster, and it doesn't help you that you could've started N Photoshop instances operating on N different pictures - you only want one...

In that sense, identify your unit of work. Then analyze whether multiple such units of work share data. If so, you might be able to cut down on the time per unit of work by splitting this up into parallel subtasks, and you should attempt to go for multithreading. But if there's no/little data sharing, you'll benefit from multi-CPU only if you fan out horizontally - execute multiple work/data "bundles", and that can easily be done by just spawning off another "job", which might well be singlethreaded.

Hope that's some food for thought,
FrankH.

--------------------------------------------------------------------------
If anything's worth doing, do it with all your heart. The Buddha
Wheresoever you go, go with all your heart. Confucius
Whatever you do, do it with all your heart. St.Paul, Col 3:23
--------------------------------------------------------------------------
_______________________________________________
opensolaris-discuss mailing list
[email protected]
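
The blur example from the message above (N threads, each handling 1/Nth of the lines of one image) can be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the thread: the names blur_rows and blur are hypothetical, a 3x3 box blur stands in for a real Gaussian kernel, and the image is a plain list of rows. All workers read the same input image and write disjoint rows of the same output, which is the kind of data sharing that threads get without copying or IPC. Note that in CPython the global interpreter lock keeps this pure-Python loop from running truly in parallel, so the sketch shows how one unit of work is partitioned, not a measured speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def blur_rows(src, dst, row_start, row_end):
    """3x3 box blur of rows [row_start, row_end), reading shared src, writing shared dst."""
    height, width = len(src), len(src[0])
    for y in range(row_start, row_end):
        for x in range(width):
            total = 0
            count = 0
            # Average the pixel with its neighbours, clipping at the image border.
            # Neighbouring rows may belong to another worker's band, but src is
            # read-only, so no locking is needed.
            for ny in range(max(0, y - 1), min(height, y + 2)):
                for nx in range(max(0, x - 1), min(width, x + 2)):
                    total += src[ny][nx]
                    count += 1
            dst[y][x] = total / count


def blur(src, n_workers):
    """Blur one image by splitting its rows into n_workers contiguous bands."""
    height = len(src)
    dst = [[0.0] * len(src[0]) for _ in range(height)]
    # Band i covers rows bounds[i] .. bounds[i + 1] - 1 (possibly empty).
    bounds = [height * i // n_workers for i in range(n_workers + 1)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [
            pool.submit(blur_rows, src, dst, bounds[i], bounds[i + 1])
            for i in range(n_workers)
        ]
        for future in futures:
            future.result()  # re-raise any exception from a worker
    return dst
```

Calling blur(image, 4) gives the same result as blur(image, 1); only the way the single job is divided changes. For the no-sharing case (e.g. compressing N separate files), the same fan-out could just as well be N independent single-threaded processes, which is the point made in the message.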
