There are very few algorithms that actually benefit from using even low
hundreds of threads, let alone thousands. The ability of Erlang (and Go,
Io, and many others) to spawn 100,000 threads makes an impressive demo
for the uninitiated, but finding practical uses of such abilities is
difficult.
It may be true that there are only a small number of basic algorithms
that benefit from massive parallelization. The important thing is not
the number of algorithms: it's the number of programs and workloads. Even
if there were only one parallel algorithm, if that algorithm were needed
for the majority of parallel workloads then it would be significant.
In fact, though utilizing thousands of threads may be hard, once you get
to millions of threads things become interesting again. Physical
simulations, image processing, search, finance, etc. are all fields
with workloads amenable to large-scale parallelization. Pure
SIMD (vectorization) is insufficient for many of these workloads:
programmers really do need to think in terms of threads (most likely
mapped to OpenCL or CUDA under the hood).
To use millions of threads you don't focus on what the algorithm is
doing: you focus on where the data is going. If you move data
unnecessarily (or fail to move it when it was necessary) then you'll
burn power and lose performance.