Re: [HACKERS] Implementing Sorting Refinements

mac_man2005 Mon, 07 Jan 2008 16:08:11 -0800

Well, sorry for hijacking... ummm how did I do that?

Anyway I'll thank you for giving a "sign of life" when I was almost loosingmy hopes to get any kind of answer from "-hackers".

I suppose the lack of answers was due to the way I wrote my mail. At thatmoment I supposed that at least someone reminded the 2WRS technique andpossible benefits described into previous posts.I think I was wrong, so I'll write it once again hoping meanwhile to getsome suggestions on: HOWTO write a mail to which "-hackers" will give ananswer :) hehehehe


Thanks for your attention.
Manolo.

--------------------------------------------------
From: "Decibel!" <[EMAIL PROTECTED]>
Sent: Tuesday, January 08, 2008 12:34 AM
To: "Manolo _" <[EMAIL PROTECTED]>
Cc: <[email protected]>
Subject: Re: [HACKERS] Implementing Sorting Refinements

You'll get better response if you don't hijack threads. :) Also, there'snothing in here that describes what the benefits of this change are.
On Jan 1, 2008, at 2:09 PM, Manolo _ wrote:
Hi to all.
This mail is aimed at asking some suggestion to face PostgreSQL (PG)development to implement a refinement to PG source code. I'll brieflysummarize the idea of the "2-Way Replacement Selection" (2WRS)refinement, just in case. If you already remember what 2WRS is, you canplease jump directly to the IMPLEMENTATION ISSUES part of this mail.
2WRS.
Refinement of the Replacement Selection (RS) technique currently used byPG in src/backend/utils/sort/tuplesort.c .The 2WRS uses two heaps instead of just one in order to create thecurrent (logical) run. Here there are some fundamental points of the2WRS technique:
- 'qsort' the initial unsorted 'memtuples' array
- divide the 'memtuples' array into two parts and each of those will bemanaged as a heap- one of the heaps will arrange its elements in ascending order, whilethe other heap in descending order- each heap will spill its current root in its corresponding run (i.e.:we have a run per each of those two heaps), so we are actually creating2 physical current runs- those 2 current physical runs could "theoretically" be merged into thesame logical run, actually we can make 'mergesort' think they do belongto the same physical run. That reduces the number of comparisons'mergesort' has to do at each merge step (that means less seek delaytime on mass storage). We can also think the average length of logicalruns produced by 2WRS will probably be greater than the average lengthproduced by RS (again less seek delay time: the longer each run the lessnumber of final runs produced, that means the less number of comparisonsat each 'mergesort' step).
IMPLEMENTATION ISSUES.
Where to place those heaps?
1) I think that both heaps could be arranged on the same 'memtuples'array. That allows easily subsequent resizing those heaps according totheir effective use or according to some heuristic, without reallocatingmemory.
How to arrange those heaps?
2a) It's convenient to arrange those heaps "root to root". That isarranging those heaps with their roots toward the center of 'memtuples'(in a way we can say they lay "face to face", or "root to root" as saidbefore) while their leaves lay towards the extreme indexes of the'memtuples' array (that is the last leaf of one heap will lay at index0, the last leaf of the other heap laying at index memtupsize-1. Thisarrangement prevents overlapping elements between those physical runsassociated to the same current logical run.PRO: once we qsort memtuples and divide it into 2 parts we already getthose two heaps, no need to build them.
CONTRA: ???
2b) As in 2a) but arranging heaps "leaf to leaf", that is their rootswill lay at the extreme indexes of 'memtuples' while their leavestowards the center of the 'memtuples' array.Or even start building heaps as soon as we get initial elements, insteadof qsort the whole 'memtuples' array.
Any PRO/CONTRA compared to 2a)???

Current run numbers
I think I should duplicate the 'int currentRun' variable in theTuplesortstate struct. I'll replace it with a 'int currentRunUP' and'int currentRunDOWN' variables in order to distinguish those twophysical runs associated to those 2 heaps. In this case I will give arun number (max{currentRunUP,currentRunDOWN} + 1) to elements notbelonging to the current logical run. I suppose no need to touch 'longavailMem' nor 'long allowedMem' variables nor any others.
Heap functions
I will duplicate all the heap management functions in order to adaptthem to the kind of heap they should be applied to (for example, thetuplesort_heap_siftup function should be replaced withtuplesort_heap_siftupUP and tuplesort_heap_siftupDOWN functions).
Merge Plan
This technique would use a sort of "merge plan" to instruct mergesort onhow to use those physical runs. Actually mergesort should consider atfirst "odd runs" before "pair runs". That is, for example, mergesortshould start merging runs with run number 1,3,5,7,... and when runnumber X terminates start considering run number X+1. Obviously thatdoesn't need any merge plan, but I remember someone else (as SimonRiggs) was interested in sorting improvements so it could be a goodthing to know if I should consider any conventions or paramethers inorder to possibly create that merge plan.
DEVELOPMENT CONTEXT
I preferred to use the "last stable release" at the moment, that is  8.2.


Any comment/suggestion/advice ?

Thanks for your attention and for your time.
Regards, Manolo.
_________________________________________________________________
Express yourself instantly with MSN Messenger! Download today it's  FREE!
http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/
---------------------------(end of  broadcast)---------------------------
TIP 4: Have you searched our list archives?

               http://archives.postgresql.org
--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828


---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] Implementing Sorting Refinements

Reply via email to