On Fri, 14 May 2010 17:35:20 +0100, B. Estrade - estr...@gmail.com
<+nntp+browseruk+c4c81fb0fa.estrabd#gmail....@spamgourmet.com> wrote:
The future is indeed multicore - or, rather, *many*-core. What this
means is that however the hardware jockeys have to strap them together
on a single node, we'll be looking at the ability to invoke hundreds
(or thousands) of threads on a single SMP machine.
There are very few algorithms that actually benefit from using even low
hundreds of threads, let alone thousands. The ability of Erlang (and Go,
Io, and many others) to spawn 100,000 threads makes an impressive demo for
the uninitiated, but finding practical uses of such abilities is very hard.
One example cited is that of gaming software that runs each sprite in a
separate "thread". The claim is that this simplifies code because each
sprite only has to respond to situations directly applicable to it, rather
than some common sprite handler having to select which sprite to operate
upon. But all it does is move the goalposts: you either have to select
which sprite to send a message to, or send the message to the sprite
handler and have it select the sprite to operate upon.
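The two designs can be put side by side in a few lines of Go (names and the sprite list are mine, for illustration only). Note where the index `1` appears in each case: the selection step has not disappeared, it has merely moved from the handler to the sender.

```go
package main

import "fmt"

// Design A: a central handler selects which sprite to operate upon.
func centralHandler(sprites []string, id int, msg string) string {
	return sprites[id] + " handles " + msg
}

// Design B: one goroutine per sprite. The selection has not vanished:
// the sender now has to pick which channel to write to.
func spawnSprites(names []string, results chan<- string) []chan string {
	chans := make([]chan string, len(names))
	for i := range names {
		in := make(chan string)
		chans[i] = in
		go func(name string, in <-chan string) {
			for msg := range in {
				results <- name + " handles " + msg
			}
		}(names[i], in)
	}
	return chans
}

func main() {
	sprites := []string{"ship", "alien", "missile"}

	// Design A: the handler does the selecting.
	fmt.Println(centralHandler(sprites, 1, "explode"))

	// Design B: the sender does the selecting -- same decision, relocated.
	results := make(chan string, 1)
	chans := spawnSprites(sprites, results)
	chans[1] <- "explode"
	fmt.Println(<-results)
}
```

Both print "alien handles explode"; the per-sprite version has simply paid a goroutine and two channel operations for the privilege.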
A third technique is to send the message to all the sprites and have them
decide whether it is applicable to them. But that still requires a loop, and
you then incur the communications overhead × 100,000 plus the context-switch
costs × 100,000. The numbers do not add up.
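To make the hidden loop visible, here is the broadcast variant sketched in Go (again, names are mine; the message-relevance test is a stand-in for whatever a real sprite would check). The sender's loop of N sends is still there, and now every one of the N receivers wakes up just to discover the message wasn't for it.

```go
package main

import "fmt"

// broadcast sends msg to every sprite -- an O(N) loop of channel sends,
// each of which also costs a wakeup on the receiving side.
func broadcast(chans []chan int, msg int) {
	for _, ch := range chans {
		ch <- msg
	}
}

func main() {
	const n = 1000 // scale this to 100,000 and the per-message costs dominate
	relevant := make(chan int, n)
	chans := make([]chan int, n)
	for i := 0; i < n; i++ {
		ch := make(chan int, 1)
		chans[i] = ch
		id := i
		go func() {
			msg := <-ch
			if msg == id { // each sprite decides if the message applies to it
				relevant <- id
			}
			// the other n-1 sprites woke up for nothing
		}()
	}
	broadcast(chans, 42)
	fmt.Println("sprite that acted:", <-relevant)
}
```

One sprite acts; the other 999 paid a send, a receive, and a scheduling slot each to do nothing.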
Then, inevitably,
someone will want to strap these together into a cluster, thus making
message passing an attractive way to glue related threads together
over a network. Getting back to the availability of many threads on a
single SMP box, issues of data locality and affinity and thread
binding will become of critical importance.
Perhaps surprisingly, these are not the issues they once were. Whilst
cache misses are horribly expensive, the multi-layered caching in modern
CPUs combines with deep pipelines, branch prediction, register renaming
and other features in ways that are beyond the ability of the human mind
to reason about.
For a whirlwind introduction to the complexities, see the short video here:
http://www.infoq.com/presentations/click-crash-course-modern-hardware
The only way to test the effects is to profile, and most of the research
into the effects of cache locality tends to be done in isolation from
real-world application mixes. Very few machines, even servers of various
types, run a single application these days. This is even truer as server
virtualisation becomes ubiquitous. Mix in a soupçon of virtual server
load-balancing and trying to code for cache locality becomes almost
impossible.
These issues are closely
related to the operating system's capabilities and paging policies, but
eventually (hopefully) current, provably beneficial strategies will be
available on most platforms.
Brett