Michał Zaborowski wrote:
Ok - we want to sort a table with quicksort, and we want to do it on N threads.
Every thread gets begin and end indices into the table. The first step starts
at 0 and ends at count - 1. Single step: find the pivot value and move
lower elements before it and bigger ones after. In the normal case we use a
recursive call, so the same procedure is called for those two parts. With
threads we can instead put the index pairs on a side list, and use a queue of
threads to pick up work from the list. We can use a common table; access to
the side list of indices has to be serialized.
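The scheme above - an explicit worklist of index ranges serviced by a pool of threads, with serialized access to the list - might be sketched roughly like this. This is a toy Python sketch, not PostgreSQL code: the names and thread count are illustrative, the locked `queue.Queue` plays the role of the serialized side list, and Python's GIL means it demonstrates the structure rather than a real CPU speedup.

```python
import threading
from queue import Queue

def partition(a, lo, hi):
    # Lomuto partition: move elements smaller than the pivot to the left,
    # larger to the right; return the pivot's final index.
    pivot = a[hi]
    i = lo
    for j in range(lo, hi):
        if a[j] < pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi] = a[hi], a[i]
    return i

def parallel_quicksort(a, n_threads=4):
    # The "side list" of index pairs; Queue serializes access internally.
    work = Queue()
    work.put((0, len(a) - 1))

    def worker():
        while True:
            lo, hi = work.get()
            if lo < hi:
                p = partition(a, lo, hi)
                # Instead of recursing, push both halves back on the worklist.
                work.put((lo, p - 1))
                work.put((p + 1, hi))
            work.task_done()

    for _ in range(n_threads):
        threading.Thread(target=worker, daemon=True).start()
    work.join()  # blocks until every queued range has been processed
    return a
```

Note that the worker threads only ever touch disjoint ranges of the shared table, so the table itself needs no lock - only the worklist does, which is exactly the serialization point mentioned above.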
Stupid question #2: Is it well recognized that the CPU is the bottleneck in the PostgreSQL sorting mechanism? Or might it be memory bandwidth and I/O?

It would seem to me that any sort worth parallelizing (given the administrative and synchronization overhead) must have data larger than the L2 cache. If larger than the L2 cache, access runs at real memory speed. At real memory speed, wouldn't one CPU, without any hardware synchronization, be able to fill the memory read/write pipe? And if we 'divide and conquer' to parallelize, wouldn't the values written by one thread often (1 / N of the time) need to be read by another thread, requiring hardware cache synchronization?

I see the wikipedia.org page describes how easy it is to parallelize quick sort, and scale performance linearly with the number of processors, but I don't see references to back this claim. At least some of these steps seem difficult or impractical to parallelize. For example, the initial partition reorder that moves items lower than the pivot to the left, and items higher than the pivot to the right, would not be easy to parallelize using an in-place reorder. It needs to complete one whole partition pass before it can 'divide and conquer'. They say no synchronization is required, but I think they are missing the hardware synchronization required (especially in the innermost loops, where the thread's task becomes shorter and starts to fit in L1/L2). They say linear, but then talk about a 'new thread being created'. New thread creation has a cost, and if reduced to using a thread pool, then synchronization *is* required.
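That serial dependency in the first pass is visible in the partition step itself - a minimal worked example (the array values are made up):

```python
def partition(a, lo, hi):
    """One Lomuto partition pass: a single serial scan from lo to hi.
    Where the two halves begin and end is only known once the scan
    finishes, so the first pass over the whole array cannot itself be
    split among threads when done in place."""
    pivot = a[hi]
    i = lo
    for j in range(lo, hi):
        if a[j] < pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi] = a[hi], a[i]
    return i

data = [5, 2, 8, 1, 9, 3]
p = partition(data, 0, len(data) - 1)
# everything left of index p is below the pivot, everything right is not;
# only now can the two halves go to separate threads
```

Only after `partition` returns can the two halves be handed to separate workers, which is the 'move one partition down first' constraint above.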

It sounds like a 'nice in theory' idea. :-) Which doesn't mean it is wrong...

I am curious enough to write a test...
Or do you mean being able to perform parts of the query plan fully in
parallel? If so, one would need a lot more than ParallelSort...
Nice to have, but mostly for data warehouses. In other cases, e.g. a backend
for the Internet, there are many concurrent requests and every processor /
core is already kept busy.
I'm a fan of the 'each plan item is a task, that is assigned to the pool, with each CPU grabbing tasks from the pool'. Another 'nice in theory' idea (used by DB2?). As it is, though, I think PostgreSQL planning is heavily designed to maximize performance on a single CPU, and single queries would not easily scale to multiple CPUs. (Perhaps hashing could be done on another CPU, or as you describe above, sorting)
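The 'plan items as pooled tasks' idea might look something like the following hypothetical sketch. To be clear, nothing here is a PostgreSQL API - the plan-fragment functions and the data are invented - it only shows two independent plan items being grabbed from a shared pool and run on separate workers.

```python
from concurrent.futures import ThreadPoolExecutor

# Invented stand-ins for independent plan fragments.
def build_hash_table(rows):
    # e.g. the build side of a hash join, keyed on the first column
    return {r[0]: r for r in rows}

def sort_rows(rows):
    # e.g. a sort node, ordering on the second column
    return sorted(rows, key=lambda r: r[1])

rows_a = [(1, 'x'), (2, 'y')]
rows_b = [(3, 'c'), (4, 'a')]

with ThreadPoolExecutor(max_workers=2) as pool:
    # Two plan items with no data dependency become two pooled tasks,
    # so idle CPUs can pick them up concurrently.
    hash_future = pool.submit(build_hash_table, rows_a)
    sort_future = pool.submit(sort_rows, rows_b)
    ht = hash_future.result()
    sorted_b = sort_future.result()
```

The catch, as noted, is that a plan tuned for one CPU has dependencies between most nodes, so finding fragments that are genuinely independent is the hard part.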

Cheers,
mark

--
Mark Mielke <[EMAIL PROTECTED]>
