Re: [HACKERS] Implementing Sorting Refinements
You'll get better response if you don't hijack threads. :) Also, there's nothing in here that describes what the benefits of this change are. On Jan 1, 2008, at 2:09 PM, Manolo _ wrote: Hi to all. This mail is aimed at asking some suggestion to face PostgreSQL (PG) development to implement a refinement to PG source code. I'll briefly summarize the idea of the 2-Way Replacement Selection (2WRS) refinement, just in case. If you already remember what 2WRS is, you can please jump directly to the IMPLEMENTATION ISSUES part of this mail. 2WRS. Refinement of the Replacement Selection (RS) technique currently used by PG in src/backend/utils/sort/tuplesort.c . The 2WRS uses two heaps instead of just one in order to create the current (logical) run. Here there are some fundamental points of the 2WRS technique: - 'qsort' the initial unsorted 'memtuples' array - divide the 'memtuples' array into two parts and each of those will be managed as a heap - one of the heaps will arrange its elements in ascending order, while the other heap in descending order - each heap will spill its current root in its corresponding run (i.e.: we have a run per each of those two heaps), so we are actually creating 2 physical current runs - those 2 current physical runs could theoretically be merged into the same logical run, actually we can make 'mergesort' think they do belong to the same physical run. That reduces the number of comparisons 'mergesort' has to do at each merge step (that means less seek delay time on mass storage). We can also think the average length of logical runs produced by 2WRS will probably be greater than the average length produced by RS (again less seek delay time: the longer each run the less number of final runs produced, that means the less number of comparisons at each 'mergesort' step). IMPLEMENTATION ISSUES. Where to place those heaps? 1) I think that both heaps could be arranged on the same 'memtuples' array. That allows easily subsequent resizing those heaps according to their effective use or according to some heuristic, without reallocating memory. How to arrange those heaps? 2a) It's convenient to arrange those heaps root to root. That is arranging those heaps with their roots toward the center of 'memtuples' (in a way we can say they lay face to face, or root to root as said before) while their leaves lay towards the extreme indexes of the 'memtuples' array (that is the last leaf of one heap will lay at index 0, the last leaf of the other heap laying at index memtupsize-1. This arrangement prevents overlapping elements between those physical runs associated to the same current logical run. PRO: once we qsort memtuples and divide it into 2 parts we already get those two heaps, no need to build them. CONTRA: ??? 2b) As in 2a) but arranging heaps leaf to leaf, that is their roots will lay at the extreme indexes of 'memtuples' while their leaves towards the center of the 'memtuples' array. Or even start building heaps as soon as we get initial elements, instead of qsort the whole 'memtuples' array. Any PRO/CONTRA compared to 2a)??? Current run numbers I think I should duplicate the 'int currentRun' variable in the Tuplesortstate struct. I'll replace it with a 'int currentRunUP' and 'int currentRunDOWN' variables in order to distinguish those two physical runs associated to those 2 heaps. In this case I will give a run number (max{currentRunUP,currentRunDOWN} + 1) to elements not belonging to the current logical run. I suppose no need to touch 'long availMem' nor 'long allowedMem' variables nor any others. Heap functions I will duplicate all the heap management functions in order to adapt them to the kind of heap they should be applied to (for example, the tuplesort_heap_siftup function should be replaced with tuplesort_heap_siftupUP and tuplesort_heap_siftupDOWN functions). Merge Plan This technique would use a sort of merge plan to instruct mergesort on how to use those physical runs. Actually mergesort should consider at first odd runs before pair runs. That is, for example, mergesort should start merging runs with run number 1,3,5,7,... and when run number X terminates start considering run number X+1. Obviously that doesn't need any merge plan, but I remember someone else (as Simon Riggs) was interested in sorting improvements so it could be a good thing to know if I should consider any conventions or paramethers in order to possibly create that merge plan. DEVELOPMENT CONTEXT I preferred to use the last stable release at the moment, that is 8.2. Any comment/suggestion/advice ? Thanks for your attention and for your time. Regards, Manolo. _ Express yourself instantly with MSN Messenger! Download today it's FREE!
Re: [HACKERS] Implementing Sorting Refinements
Well, sorry for hijacking... ummm how did I do that? Anyway I'll thank you for giving a sign of life when I was almost loosing my hopes to get any kind of answer from -hackers. I suppose the lack of answers was due to the way I wrote my mail. At that moment I supposed that at least someone reminded the 2WRS technique and possible benefits described into previous posts. I think I was wrong, so I'll write it once again hoping meanwhile to get some suggestions on: HOWTO write a mail to which -hackers will give an answer :) hehehehe Thanks for your attention. Manolo. -- From: Decibel! [EMAIL PROTECTED] Sent: Tuesday, January 08, 2008 12:34 AM To: Manolo _ [EMAIL PROTECTED] Cc: pgsql-hackers@postgresql.org Subject: Re: [HACKERS] Implementing Sorting Refinements You'll get better response if you don't hijack threads. :) Also, there's nothing in here that describes what the benefits of this change are. On Jan 1, 2008, at 2:09 PM, Manolo _ wrote: Hi to all. This mail is aimed at asking some suggestion to face PostgreSQL (PG) development to implement a refinement to PG source code. I'll briefly summarize the idea of the 2-Way Replacement Selection (2WRS) refinement, just in case. If you already remember what 2WRS is, you can please jump directly to the IMPLEMENTATION ISSUES part of this mail. 2WRS. Refinement of the Replacement Selection (RS) technique currently used by PG in src/backend/utils/sort/tuplesort.c . The 2WRS uses two heaps instead of just one in order to create the current (logical) run. Here there are some fundamental points of the 2WRS technique: - 'qsort' the initial unsorted 'memtuples' array - divide the 'memtuples' array into two parts and each of those will be managed as a heap - one of the heaps will arrange its elements in ascending order, while the other heap in descending order - each heap will spill its current root in its corresponding run (i.e.: we have a run per each of those two heaps), so we are actually creating 2 physical current runs - those 2 current physical runs could theoretically be merged into the same logical run, actually we can make 'mergesort' think they do belong to the same physical run. That reduces the number of comparisons 'mergesort' has to do at each merge step (that means less seek delay time on mass storage). We can also think the average length of logical runs produced by 2WRS will probably be greater than the average length produced by RS (again less seek delay time: the longer each run the less number of final runs produced, that means the less number of comparisons at each 'mergesort' step). IMPLEMENTATION ISSUES. Where to place those heaps? 1) I think that both heaps could be arranged on the same 'memtuples' array. That allows easily subsequent resizing those heaps according to their effective use or according to some heuristic, without reallocating memory. How to arrange those heaps? 2a) It's convenient to arrange those heaps root to root. That is arranging those heaps with their roots toward the center of 'memtuples' (in a way we can say they lay face to face, or root to root as said before) while their leaves lay towards the extreme indexes of the 'memtuples' array (that is the last leaf of one heap will lay at index 0, the last leaf of the other heap laying at index memtupsize-1. This arrangement prevents overlapping elements between those physical runs associated to the same current logical run. PRO: once we qsort memtuples and divide it into 2 parts we already get those two heaps, no need to build them. CONTRA: ??? 2b) As in 2a) but arranging heaps leaf to leaf, that is their roots will lay at the extreme indexes of 'memtuples' while their leaves towards the center of the 'memtuples' array. Or even start building heaps as soon as we get initial elements, instead of qsort the whole 'memtuples' array. Any PRO/CONTRA compared to 2a)??? Current run numbers I think I should duplicate the 'int currentRun' variable in the Tuplesortstate struct. I'll replace it with a 'int currentRunUP' and 'int currentRunDOWN' variables in order to distinguish those two physical runs associated to those 2 heaps. In this case I will give a run number (max{currentRunUP,currentRunDOWN} + 1) to elements not belonging to the current logical run. I suppose no need to touch 'long availMem' nor 'long allowedMem' variables nor any others. Heap functions I will duplicate all the heap management functions in order to adapt them to the kind of heap they should be applied to (for example, the tuplesort_heap_siftup function should be replaced with tuplesort_heap_siftupUP and tuplesort_heap_siftupDOWN functions). Merge Plan This technique would use a sort of merge plan to instruct mergesort on how to use those physical runs. Actually mergesort should consider at first odd runs before pair runs. That is, for example
Re: [HACKERS] Implementing Sorting Refinements
On Jan 8, 2008 1:04 AM, [EMAIL PROTECTED] wrote: Well, sorry for hijacking... ummm how did I do that? Anyway I'll thank you for giving a sign of life when I was almost loosing my hopes to get any kind of answer from -hackers. Don't forget that we're just a few days/weeks of 8.3 release so the attention is a bit focused on it at the moment (and I'm not speaking of the security releases of today). Don't feel disappointed about the lack of attention you're suffering at the moment, just post your proposal again after 8.3 release, explain what you plan to do and why and perhaps you'll have the time to write a mock-up and get some numbers to prove your point before that. That could help too. Keep up the good work. Regards, -- Guillaume ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Implementing Sorting Refinements
On Tue, 08 Jan 2008, [EMAIL PROTECTED] wrote: Well, sorry for hijacking... ummm how did I do that? You replied to a post instead of creating a new, unrelated e-mail. It is different. Just try to use threaded mode of your e-mail client and you'll get the idea. Regards Tometzky -- ...although Eating Honey was a very good thing to do, there was a moment just before you began to eat it which was better than when you were... Winnie the Pooh ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
[HACKERS] Implementing Sorting Refinements
Hi to all. This mail is aimed at asking some suggestion to face PostgreSQL (PG) development to implement a refinement to PG source code. I'll briefly summarize the idea of the 2-Way Replacement Selection (2WRS) refinement, just in case. If you already remember what 2WRS is, you can please jump directly to the IMPLEMENTATION ISSUES part of this mail. 2WRS. Refinement of the Replacement Selection (RS) technique currently used by PG in src/backend/utils/sort/tuplesort.c . The 2WRS uses two heaps instead of just one in order to create the current (logical) run. Here there are some fundamental points of the 2WRS technique: - 'qsort' the initial unsorted 'memtuples' array - divide the 'memtuples' array into two parts and each of those will be managed as a heap - one of the heaps will arrange its elements in ascending order, while the other heap in descending order - each heap will spill its current root in its corresponding run (i.e.: we have a run per each of those two heaps), so we are actually creating 2 physical current runs - those 2 current physical runs could theoretically be merged into the same logical run, actually we can make 'mergesort' think they do belong to the same physical run. That reduces the number of comparisons 'mergesort' has to do at each merge step (that means less seek delay time on mass storage). We can also think the average length of logical runs produced by 2WRS will probably be greater than the average length produced by RS (again less seek delay time: the longer each run the less number of final runs produced, that means the less number of comparisons at each 'mergesort' step). IMPLEMENTATION ISSUES. Where to place those heaps? 1) I think that both heaps could be arranged on the same 'memtuples' array. That allows easily subsequent resizing those heaps according to their effective use or according to some heuristic, without reallocating memory. How to arrange those heaps? 2a) It's convenient to arrange those heaps root to root. That is arranging those heaps with their roots toward the center of 'memtuples' (in a way we can say they lay face to face, or root to root as said before) while their leaves lay towards the extreme indexes of the 'memtuples' array (that is the last leaf of one heap will lay at index 0, the last leaf of the other heap laying at index memtupsize-1. This arrangement prevents overlapping elements between those physical runs associated to the same current logical run. PRO: once we qsort memtuples and divide it into 2 parts we already get those two heaps, no need to build them. CONTRA: ??? 2b) As in 2a) but arranging heaps leaf to leaf, that is their roots will lay at the extreme indexes of 'memtuples' while their leaves towards the center of the 'memtuples' array. Or even start building heaps as soon as we get initial elements, instead of qsort the whole 'memtuples' array. Any PRO/CONTRA compared to 2a)??? Current run numbers I think I should duplicate the 'int currentRun' variable in the Tuplesortstate struct. I'll replace it with a 'int currentRunUP' and 'int currentRunDOWN' variables in order to distinguish those two physical runs associated to those 2 heaps. In this case I will give a run number (max{currentRunUP,currentRunDOWN} + 1) to elements not belonging to the current logical run. I suppose no need to touch 'long availMem' nor 'long allowedMem' variables nor any others. Heap functions I will duplicate all the heap management functions in order to adapt them to the kind of heap they should be applied to (for example, the tuplesort_heap_siftup function should be replaced with tuplesort_heap_siftupUP and tuplesort_heap_siftupDOWN functions). Merge Plan This technique would use a sort of merge plan to instruct mergesort on how to use those physical runs. Actually mergesort should consider at first odd runs before pair runs. That is, for example, mergesort should start merging runs with run number 1,3,5,7,... and when run number X terminates start considering run number X+1. Obviously that doesn't need any merge plan, but I remember someone else (as Simon Riggs) was interested in sorting improvements so it could be a good thing to know if I should consider any conventions or paramethers in order to possibly create that merge plan. DEVELOPMENT CONTEXT I preferred to use the last stable release at the moment, that is 8.2. Any comment/suggestion/advice ? Thanks for your attention and for your time. Regards, Manolo. _ Express yourself instantly with MSN Messenger! Download today it's FREE! http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/ ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org