There are very few algorithms that actually benefit from using even a few hundred threads, let alone thousands. The ability of Erlang (and Go and many others) to spawn 100,000 threads makes an impressive demo for the uninitiated, but finding practical uses for that ability is very hard.

It may be true that only a small number of basic algorithms benefit from massive parallelization. But the important thing is not the number of algorithms: it's the number of programs and workloads. Even if there were only one parallel algorithm, if that algorithm were needed for the majority of parallel workloads then it would be significant.

In fact, though utilizing thousands of threads may be hard, once you get to millions of threads things become interesting again. Physical simulation, image processing, search, finance, etc. are all fields with workloads amenable to large-scale parallelization. Pure SIMD (vectorization) is insufficient for many of these workloads: programmers really do need to think in terms of threads (most likely mapped to OpenCL or CUDA under the hood).

To use millions of threads you don't focus on what the algorithm is doing: you focus on where the data is going. If you move data unnecessarily (or fail to move it when it is necessary) then you burn power and lose performance.
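To make the "millions of threads" point concrete, here is a minimal CUDA sketch of the common one-thread-per-element style (the kernel name, sizes, and scaling operation are invented for illustration, not taken from the original post):

```cuda
#include <cstdio>

// Hypothetical example: launch one GPU thread per array element.
// With n = 4M elements, this launches roughly four million threads.
__global__ void scale_kernel(const float *in, float *out, int n, float k) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)               // guard: the grid may overshoot n
        out[i] = in[i] * k;  // thread i touches element i, so adjacent
}                            // threads access adjacent memory (coalesced)

int main() {
    const int n = 1 << 22;   // ~4 million elements
    float *in, *out;
    cudaMallocManaged(&in,  n * sizeof(float));
    cudaMallocManaged(&out, n * sizeof(float));
    for (int i = 0; i < n; ++i) in[i] = 1.0f;

    int block = 256;
    int grid  = (n + block - 1) / block;  // enough blocks to cover n
    scale_kernel<<<grid, block>>>(in, out, n, 2.0f);
    cudaDeviceSynchronize();

    printf("out[0] = %f\n", out[0]);
    cudaFree(in);
    cudaFree(out);
    return 0;
}
```

This also illustrates the data-movement point: because thread i reads element i, neighboring threads hit neighboring addresses and the hardware coalesces those accesses into few memory transactions; scramble that mapping and the same arithmetic moves far more data for the same result.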
