Michał Zaborowski wrote:
Ok - we want to sort a table with quicksort, and we want to do it on N threads.
Every thread gets begin and end indices into the table. The first step starts
at 0 and ends at count - 1. Single step: find the pivot value and move
lower elements before it and bigger ones after. In the normal case we use a
recursive call, so the same procedure is called for those two parts. With
threads we can instead put the index pairs on a side list, and use a queue of
threads to pick up work from the list. We can use a common table; access to
the side list of indices has to be serialized.
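The scheme above - an explicit worklist of index ranges serviced by a pool of threads, with serialized access to the list - might be sketched roughly like this. This is a toy Python sketch, not PostgreSQL code: the names and thread count are illustrative, the locked `queue.Queue` plays the role of the serialized side list, and Python's GIL means it demonstrates the structure rather than a real CPU speedup.

```python
import threading
from queue import Queue

def partition(a, lo, hi):
    # Lomuto partition: move elements smaller than the pivot to the left,
    # larger to the right; return the pivot's final index.
    pivot = a[hi]
    i = lo
    for j in range(lo, hi):
        if a[j] < pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi] = a[hi], a[i]
    return i

def parallel_quicksort(a, n_threads=4):
    # The "side list" of index pairs; Queue serializes access internally.
    work = Queue()
    work.put((0, len(a) - 1))

    def worker():
        while True:
            lo, hi = work.get()
            if lo < hi:
                p = partition(a, lo, hi)
                # Instead of recursing, push both halves back on the worklist.
                work.put((lo, p - 1))
                work.put((p + 1, hi))
            work.task_done()

    for _ in range(n_threads):
        threading.Thread(target=worker, daemon=True).start()
    work.join()  # blocks until every queued range has been processed
    return a
```

Note that the worker threads only ever touch disjoint ranges of the shared table, so the table itself needs no lock - only the worklist does, which is exactly the serialization point mentioned above.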
Stupid question #2: Is it well recognized that the CPU is the bottleneck in the PostgreSQL sorting mechanism? Or might it be memory bandwidth and I/O?

It would seem to me that any sort worth parallelizing (given the administrative and synchronization overhead) must have data larger than the L2 cache. If larger than the L2 cache, access runs at real memory speed. At real memory speed, wouldn't one CPU, without any hardware synchronization, be able to fill the memory read/write pipe? And if we 'divide and conquer' to parallelize, wouldn't the values written by one thread often (1 / N of the time) need to be read by another thread, requiring hardware cache synchronization?

I see the wikipedia.org page describes how easy it is to parallelize quick sort, and scale performance linearly with the number of processors, but I don't see references to back this claim. At least some of these steps seem difficult or impractical to parallelize. For example, the initial partition reorder that moves items lower than the pivot to the left, and items higher than the pivot to the right, would not be easy to parallelize using an in-place reorder. It needs to complete one whole partition pass before it can 'divide and conquer'. They say no synchronization is required, but I think they are missing the hardware synchronization required (especially in the innermost loops, where the thread's task becomes shorter and starts to fit in L1/L2). They say linear, but then talk about a 'new thread being created'. New thread creation has a cost, and if reduced to using a thread pool, then synchronization *is* required.
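That serial dependency in the first pass is visible in the partition step itself - a minimal worked example (the array values are made up):

```python
def partition(a, lo, hi):
    """One Lomuto partition pass: a single serial scan from lo to hi.
    Where the two halves begin and end is only known once the scan
    finishes, so the first pass over the whole array cannot itself be
    split among threads when done in place."""
    pivot = a[hi]
    i = lo
    for j in range(lo, hi):
        if a[j] < pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi] = a[hi], a[i]
    return i

data = [5, 2, 8, 1, 9, 3]
p = partition(data, 0, len(data) - 1)
# everything left of index p is below the pivot, everything right is not;
# only now can the two halves go to separate threads
```

Only after `partition` returns can the two halves be handed to separate workers, which is the 'move one partition down first' constraint above.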

It sounds like a 'nice in theory' idea. :-) Which doesn't mean it is wrong...

I am curious enough to write a test...
Or do you mean being able to perform parts of the query plan fully in
parallel? If so, one would need a lot more than ParallelSort...
Nice to have, but mostly for data warehouses. In other cases, e.g. a backend
for the Internet, there are many concurrent requests and every processor /
core is already kept busy.
I'm a fan of the 'each plan item is a task, that is assigned to the pool, with each CPU grabbing tasks from the pool'. Another 'nice in theory' idea (used by DB2?). As it is, though, I think PostgreSQL planning is heavily designed to maximize performance on a single CPU, and single queries would not easily scale to multiple CPUs. (Perhaps hashing could be done on another CPU, or as you describe above, sorting)
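The 'plan items as pooled tasks' idea might look something like the following hypothetical sketch. To be clear, nothing here is a PostgreSQL API - the plan-fragment functions and the data are invented - it only shows two independent plan items being grabbed from a shared pool and run on separate workers.

```python
from concurrent.futures import ThreadPoolExecutor

# Invented stand-ins for independent plan fragments.
def build_hash_table(rows):
    # e.g. the build side of a hash join, keyed on the first column
    return {r[0]: r for r in rows}

def sort_rows(rows):
    # e.g. a sort node, ordering on the second column
    return sorted(rows, key=lambda r: r[1])

rows_a = [(1, 'x'), (2, 'y')]
rows_b = [(3, 'c'), (4, 'a')]

with ThreadPoolExecutor(max_workers=2) as pool:
    # Two plan items with no data dependency become two pooled tasks,
    # so idle CPUs can pick them up concurrently.
    hash_future = pool.submit(build_hash_table, rows_a)
    sort_future = pool.submit(sort_rows, rows_b)
    ht = hash_future.result()
    sorted_b = sort_future.result()
```

The catch, as noted, is that a plan tuned for one CPU has dependencies between most nodes, so finding fragments that are genuinely independent is the hard part.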

Cheers,
mark

--
Mark Mielke <[EMAIL PROTECTED]>
