On Fri, 14 May 2010 17:35:20 +0100, B. Estrade - estr...@gmail.com <+nntp+browseruk+c4c81fb0fa.estrabd#gmail....@spamgourmet.com> wrote:

> The future is indeed multicore - or, rather, *many*-core. What this
> means is that however the hardware jockeys have to strap them together
> on a single node, we'll be looking at the ability to invoke hundreds
> (or thousands) of threads on a single SMP machine.

There are very few algorithms that actually benefit from using even low hundreds of threads, let alone thousands. The ability of Erlang (and Go, and Io, and many others) to spawn 100,000 threads makes an impressive demo for the uninitiated, but finding practical uses for that ability is very hard.
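
To see why the demo is so easy to stage, here is a minimal Go sketch (my own illustration, not taken from any of the systems named above) that spawns 100,000 goroutines doing trivial work. Spawning is the cheap part; finding 100,000 pieces of useful, independent work is the hard part:

    // spawn.go - a minimal sketch of the "100,000 threads" demo.
    package main

    import (
        "fmt"
        "sync"
    )

    func main() {
        const n = 100000
        var wg sync.WaitGroup
        results := make(chan int, n)

        for i := 0; i < n; i++ {
            wg.Add(1)
            go func(id int) { // one goroutine per "task"
                defer wg.Done()
                results <- id * id // trivial work: the demo impresses, the payoff doesn't
            }(i)
        }

        wg.Wait()
        close(results)

        sum := 0
        for r := range results {
            sum += r
        }
        fmt.Println("goroutines run:", n, "checksum:", sum)
    }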

One example cited is that of gaming software that runs each sprite in a separate "thread". The claim is that this simplifies the code, because each sprite only has to respond to situations directly applicable to it, rather than some common sprite handler having to select which sprite to operate upon. But all it does is move the goal posts: you either have to select which sprite to send a message to, or send a message to the sprite handler and have it select the sprite to operate upon.

A third technique is to send the message to all the sprites and have them decide whether it is applicable to them. But that still requires a loop, and you then pay the communications overhead × 100,000 plus the context switch costs × 100,000. The numbers do not add up.
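
Here is a rough Go sketch of the point (the Sprite/Event names are mine and purely illustrative): whether the sender picks the target channel directly, or every sprite filters a broadcast, the selection work does not disappear - it just moves:

    // dispatch.go - sketch: direct-addressed send vs broadcast-and-filter.
    package main

    import "fmt"

    type Event struct {
        Target int // -1 means "broadcast to everyone"
        Msg    string
    }

    func sprite(id int, in <-chan Event, done chan<- bool) {
        for ev := range in {
            // Broadcast style: every sprite is woken and must filter for itself.
            if ev.Target == id || ev.Target == -1 {
                _ = fmt.Sprintf("sprite %d handles %q", id, ev.Msg)
            }
        }
        done <- true
    }

    func main() {
        const n = 1000
        chans := make([]chan Event, n)
        done := make(chan bool)
        for i := range chans {
            chans[i] = make(chan Event, 1)
            go sprite(i, chans[i], done)
        }

        // Direct style: the sender selects the sprite - the selection
        // hasn't disappeared, it has just moved to the sender.
        chans[42] <- Event{Target: 42, Msg: "collide"}

        // Broadcast style: one event, n sends, n wake-ups, n filter checks.
        for _, c := range chans {
            c <- Event{Target: -1, Msg: "tick"}
        }

        for _, c := range chans {
            close(c)
        }
        for i := 0; i < n; i++ {
            <-done
        }
    }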

> Then, inevitably,
> someone will want to strap these together into a cluster, thus making
> message passing an attractive way to glue related threads together
> over a network.  Getting back to the availability of many threads on a
> single SMP box, issues of data locality and affinity and thread
> binding will become of critical importance.

Perhaps surprisingly, these are not the issues they once were. Whilst cache misses are horribly expensive, the multi-layered caching in modern CPUs combines with deep pipelines, branch prediction, register renaming and other features in ways that are beyond the ability of the human mind to reason about.

For a whirlwind introduction to the complexities, see the short video here:

http://www.infoq.com/presentations/click-crash-course-modern-hardware

The only way to test the effects is to profile, and most of the research into the effects of cache locality tends to be done in isolation from real-world application mixes. Very few machines, even servers of various types, run a single application these days. This is even truer as server virtualisation becomes ubiquitous. Mix in a soupçon of virtual-server load-balancing, and trying to code for cache locality becomes almost impossible.
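
By way of illustrating "profile, don't guess", here is a small Go micro-benchmark (again my own sketch) that sums the same 128 MiB array with different access strides. Same arithmetic, same element count; only the access pattern - and hence the cache behaviour - differs, and the only way to know what that costs on *your* box is to run it there:

    // cache_bench.go - sketch: measure, rather than reason about, cache behaviour.
    package main

    import (
        "fmt"
        "time"
    )

    func sum(data []int64, stride int) int64 {
        var s int64
        for start := 0; start < stride; start++ {
            for i := start; i < len(data); i += stride {
                s += data[i]
            }
        }
        return s
    }

    func main() {
        data := make([]int64, 1<<24) // 128 MiB, well beyond L3 on most machines
        for i := range data {
            data[i] = int64(i)
        }

        for _, stride := range []int{1, 16, 4096} {
            t0 := time.Now()
            s := sum(data, stride)
            // Identical work per run; only the memory access pattern changes.
            fmt.Printf("stride %5d: %v (checksum %d)\n", stride, time.Since(t0), s)
        }
    }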

> These issues are closely
> related to the operating system's capabilities and paging policies, but
> eventually (hopefully) current, provably beneficial strategies will be
> available on most platforms.
>
> Brett
