Mark Mielke wrote: > I am curious - what algorithms exist to efficiently do a parallel sort? > Do you mean if sorting 1 million items, it is possible to separate this > into 2 sets of 500 thousand each, execute them in separate threads > (with task administration and synchronization overhead) , combine the > results, and complete the task in significantly less time than doing it > in one thread? I am skeptical that this is possible...
The link in the beginning of the thread points to articles that seem to describe one such algorithm; along with benchmarks. (http://tinyurl.com/3bvu4u, http://tinyurl.com/32wg2m) The improvements were pretty consistent from set sizes ranging from very small sets (hundreds) to quite large ones (hundreds of K). Interestingly, even multi-threading helped a lot. "Our tests correlate well with previous research that showed Intel’s implementation of SMT (Hyper-Threading) to be adept at hiding this latency [6, 20, 12].Table 4 shows that by having two threads access memory at the same time, performance improved over 80% when compared to the singlethreaded version. It uses both quicksort phases and merge phases; for the merge phase using 2CPUs (no hyperthreading) apparently gave more than 2X speed improvement; apparently because it could parallelize memory access with CPU more. > Or do you mean being able to perform parts of the query plan fully in > parallel? If this, then one would need a lot more than ParallelSort... I wouldn't recommend that - it seems like a Hard Problem. My guess is that the best way to use multiple threads in one backend would be to find specific algorithms like sorting that would be easier to isolate. ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq