Re: [HACKERS] Implementing Sorting Refinements

2008-01-07 Thread Decibel!
You'll get better response if you don't hijack threads. :) Also,  
there's nothing in here that describes what the benefits of this  
change are.


On Jan 1, 2008, at 2:09 PM, Manolo _ wrote:



Hi to all.

This mail is aimed at asking some suggestion to face PostgreSQL  
(PG) development to implement a refinement to PG source code. I'll  
briefly summarize the idea of the 2-Way Replacement  
Selection (2WRS) refinement, just in case. If you already remember  
what 2WRS is, you can please jump directly to the IMPLEMENTATION  
ISSUES part of this mail.



2WRS.
Refinement of the Replacement Selection (RS) technique currently  
used by PG in src/backend/utils/sort/tuplesort.c .
The 2WRS uses two heaps instead of just one in order to create the  
current (logical) run. Here there are some fundamental points of  
the 2WRS technique:

- 'qsort' the initial unsorted 'memtuples' array
- divide the 'memtuples' array into two parts and each of those  
will be managed as a heap
- one of the heaps will arrange its elements in ascending order,  
while the other heap in descending order
- each heap will spill its current root in its corresponding run  
(i.e.: we have a run per each of those two heaps), so we are  
actually creating 2 physical current runs
- those 2 current physical runs could theoretically be merged  
into the same logical run, actually we can make 'mergesort' think  
they do belong to the same physical run. That reduces the number of  
comparisons 'mergesort' has to do at each merge step (that means  
less seek delay time on mass storage). We can also think the  
average length of logical runs produced by 2WRS will probably be  
greater than the average length produced by RS (again less seek  
delay time: the longer each run the less number of final runs  
produced, that means the less number of comparisons at each  
'mergesort' step).



IMPLEMENTATION ISSUES.
Where to place those heaps?
1) I think that both heaps could be arranged on the same  
'memtuples' array. That allows easily subsequent resizing those  
heaps according to their effective use or according to some  
heuristic, without reallocating memory.


How to arrange those heaps?
2a) It's convenient to arrange those heaps root to root. That is  
arranging those heaps with their roots toward the center of  
'memtuples' (in a way we can say they lay face to face, or root  
to root as said before) while their leaves lay towards the extreme  
indexes of the 'memtuples' array (that is the last leaf of one heap  
will lay at index 0, the last leaf of the other heap laying at  
index memtupsize-1. This arrangement prevents overlapping elements  
between those physical runs associated to the same current logical  
run.
PRO: once we qsort memtuples and divide it into 2 parts we already  
get those two heaps, no need to build them.

CONTRA: ???

2b) As in 2a) but arranging heaps leaf to leaf, that is their  
roots will lay at the extreme indexes of 'memtuples' while their  
leaves towards the center of the 'memtuples' array.
Or even start building heaps as soon as we get initial elements,  
instead of qsort the whole 'memtuples' array.

Any PRO/CONTRA compared to 2a)???

Current run numbers
I think I should duplicate the 'int currentRun' variable in the  
Tuplesortstate struct. I'll replace it with a 'int currentRunUP'  
and 'int currentRunDOWN' variables in order to distinguish those  
two physical runs associated to those 2 heaps. In this case I will  
give a run number (max{currentRunUP,currentRunDOWN} + 1) to  
elements not belonging to the current logical run. I suppose no  
need to touch 'long availMem' nor 'long allowedMem' variables nor  
any others.


Heap functions
I will duplicate all the heap management functions in order to  
adapt them to the kind of heap they should be applied to (for  
example, the tuplesort_heap_siftup function should be replaced with  
tuplesort_heap_siftupUP and tuplesort_heap_siftupDOWN functions).


Merge Plan
This technique would use a sort of merge plan	to instruct  
mergesort on how to use those physical runs. Actually mergesort  
should consider at first odd runs before pair runs. That is,  
for example, mergesort should start merging runs with run number  
1,3,5,7,... and when run number X terminates start considering run  
number X+1. Obviously that doesn't need any merge plan, but I  
remember someone else (as Simon Riggs) was interested in sorting  
improvements so it could be a good thing to know if I should  
consider any conventions or paramethers in order to possibly create  
that merge plan.



DEVELOPMENT CONTEXT
I preferred to use the last stable release at the moment, that is  
8.2.



Any comment/suggestion/advice ?

Thanks for your attention and for your time.
Regards, Manolo.
_
Express yourself instantly with MSN Messenger! Download today it's  
FREE!


Re: [HACKERS] Implementing Sorting Refinements

2008-01-07 Thread mac_man2005

Well, sorry for hijacking... ummm how did I do that?

Anyway I'll thank you for giving a sign of life when I was almost loosing 
my hopes to get any kind of answer from -hackers.


I suppose the lack of answers was due to the way I wrote my mail. At that 
moment I supposed that at least someone reminded the 2WRS technique and 
possible benefits described into previous posts.
I think I was wrong, so I'll write it once again hoping meanwhile to get 
some suggestions on: HOWTO write a mail to which -hackers will give an 
answer :) hehehehe


Thanks for your attention.
Manolo.

--
From: Decibel! [EMAIL PROTECTED]
Sent: Tuesday, January 08, 2008 12:34 AM
To: Manolo _ [EMAIL PROTECTED]
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Implementing Sorting Refinements

You'll get better response if you don't hijack threads. :) Also,  there's 
nothing in here that describes what the benefits of this  change are.


On Jan 1, 2008, at 2:09 PM, Manolo _ wrote:



Hi to all.

This mail is aimed at asking some suggestion to face PostgreSQL  (PG) 
development to implement a refinement to PG source code. I'll  briefly 
summarize the idea of the 2-Way Replacement  Selection (2WRS) 
refinement, just in case. If you already remember  what 2WRS is, you can 
please jump directly to the IMPLEMENTATION  ISSUES part of this mail.



2WRS.
Refinement of the Replacement Selection (RS) technique currently  used by 
PG in src/backend/utils/sort/tuplesort.c .
The 2WRS uses two heaps instead of just one in order to create the 
current (logical) run. Here there are some fundamental points of  the 
2WRS technique:

- 'qsort' the initial unsorted 'memtuples' array
- divide the 'memtuples' array into two parts and each of those  will be 
managed as a heap
- one of the heaps will arrange its elements in ascending order,  while 
the other heap in descending order
- each heap will spill its current root in its corresponding run  (i.e.: 
we have a run per each of those two heaps), so we are  actually creating 
2 physical current runs
- those 2 current physical runs could theoretically be merged  into the 
same logical run, actually we can make 'mergesort' think  they do belong 
to the same physical run. That reduces the number of  comparisons 
'mergesort' has to do at each merge step (that means  less seek delay 
time on mass storage). We can also think the  average length of logical 
runs produced by 2WRS will probably be  greater than the average length 
produced by RS (again less seek  delay time: the longer each run the less 
number of final runs  produced, that means the less number of comparisons 
at each  'mergesort' step).



IMPLEMENTATION ISSUES.
Where to place those heaps?
1) I think that both heaps could be arranged on the same  'memtuples' 
array. That allows easily subsequent resizing those  heaps according to 
their effective use or according to some  heuristic, without reallocating 
memory.


How to arrange those heaps?
2a) It's convenient to arrange those heaps root to root. That is 
arranging those heaps with their roots toward the center of  'memtuples' 
(in a way we can say they lay face to face, or root  to root as said 
before) while their leaves lay towards the extreme  indexes of the 
'memtuples' array (that is the last leaf of one heap  will lay at index 
0, the last leaf of the other heap laying at  index memtupsize-1. This 
arrangement prevents overlapping elements  between those physical runs 
associated to the same current logical  run.
PRO: once we qsort memtuples and divide it into 2 parts we already  get 
those two heaps, no need to build them.

CONTRA: ???

2b) As in 2a) but arranging heaps leaf to leaf, that is their  roots 
will lay at the extreme indexes of 'memtuples' while their  leaves 
towards the center of the 'memtuples' array.
Or even start building heaps as soon as we get initial elements,  instead 
of qsort the whole 'memtuples' array.

Any PRO/CONTRA compared to 2a)???

Current run numbers
I think I should duplicate the 'int currentRun' variable in the 
Tuplesortstate struct. I'll replace it with a 'int currentRunUP'  and 
'int currentRunDOWN' variables in order to distinguish those  two 
physical runs associated to those 2 heaps. In this case I will  give a 
run number (max{currentRunUP,currentRunDOWN} + 1) to  elements not 
belonging to the current logical run. I suppose no  need to touch 'long 
availMem' nor 'long allowedMem' variables nor  any others.


Heap functions
I will duplicate all the heap management functions in order to  adapt 
them to the kind of heap they should be applied to (for  example, the 
tuplesort_heap_siftup function should be replaced with 
tuplesort_heap_siftupUP and tuplesort_heap_siftupDOWN functions).


Merge Plan
This technique would use a sort of merge plan to instruct  mergesort on 
how to use those physical runs. Actually mergesort  should consider at 
first odd runs before pair runs. That is,  for example

Re: [HACKERS] Implementing Sorting Refinements

2008-01-07 Thread Guillaume Smet
On Jan 8, 2008 1:04 AM,  [EMAIL PROTECTED] wrote:
 Well, sorry for hijacking... ummm how did I do that?

 Anyway I'll thank you for giving a sign of life when I was almost loosing
 my hopes to get any kind of answer from -hackers.

Don't forget that we're just a few days/weeks of 8.3 release so the
attention is a bit focused on it at the moment (and I'm not speaking
of the security releases of today).

Don't feel disappointed about the lack of attention you're suffering
at the moment, just post your proposal again after 8.3 release,
explain what you plan to do and why and perhaps you'll have the time
to write a mock-up and get some numbers to prove your point before
that. That could help too.

Keep up the good work.

Regards,

--
Guillaume

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Implementing Sorting Refinements

2008-01-07 Thread Tomasz Ostrowski
On Tue, 08 Jan 2008, [EMAIL PROTECTED] wrote:

 Well, sorry for hijacking... ummm how did I do that?

You replied to a post instead of creating a new, unrelated e-mail. It
is different.

Just try to use threaded mode of your e-mail client and you'll get
the idea.

Regards
Tometzky
-- 
...although Eating Honey was a very good thing to do, there was a
moment just before you began to eat it which was better than when you
were...
  Winnie the Pooh

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


[HACKERS] Implementing Sorting Refinements

2008-01-01 Thread Manolo _

Hi to all.

This mail is aimed at asking some suggestion to face PostgreSQL (PG) 
development to implement a refinement to PG source code. I'll briefly summarize 
the idea of the 2-Way Replacement Selection (2WRS) refinement, just in case. 
If you already remember what 2WRS is, you can please jump directly to the 
IMPLEMENTATION ISSUES part of this mail.


2WRS.
Refinement of the Replacement Selection (RS) technique currently used by PG in 
src/backend/utils/sort/tuplesort.c .
The 2WRS uses two heaps instead of just one in order to create the current 
(logical) run. Here there are some fundamental points of the 2WRS technique:
- 'qsort' the initial unsorted 'memtuples' array
- divide the 'memtuples' array into two parts and each of those will be managed 
as a heap
- one of the heaps will arrange its elements in ascending order, while the 
other heap in descending order
- each heap will spill its current root in its corresponding run (i.e.: we have 
a run per each of those two heaps), so we are actually creating 2 physical 
current runs
- those 2 current physical runs could theoretically be merged into the same 
logical run, actually we can make 'mergesort' think they do belong to the same 
physical run. That reduces the number of comparisons 'mergesort' has to do at 
each merge step (that means less seek delay time on mass storage). We can also 
think the average length of logical runs produced by 2WRS will probably be 
greater than the average length produced by RS (again less seek delay time: the 
longer each run the less number of final runs produced, that means the less 
number of comparisons at each 'mergesort' step).


IMPLEMENTATION ISSUES.
Where to place those heaps?
1) I think that both heaps could be arranged on the same 'memtuples' array. 
That allows easily subsequent resizing those heaps according to their effective 
use or according to some heuristic, without reallocating memory.

How to arrange those heaps?
2a) It's convenient to arrange those heaps root to root. That is arranging 
those heaps with their roots toward the center of 'memtuples' (in a way we can 
say they lay face to face, or root to root as said before) while their 
leaves lay towards the extreme indexes of the 'memtuples' array (that is the 
last leaf of one heap will lay at index 0, the last leaf of the other heap 
laying at index memtupsize-1. This arrangement prevents overlapping elements 
between those physical runs associated to the same current logical run.
PRO: once we qsort memtuples and divide it into 2 parts we already get those 
two heaps, no need to build them.
CONTRA: ???

2b) As in 2a) but arranging heaps leaf to leaf, that is their roots will lay 
at the extreme indexes of 'memtuples' while their leaves towards the center of 
the 'memtuples' array.
Or even start building heaps as soon as we get initial elements, instead of 
qsort the whole 'memtuples' array.
Any PRO/CONTRA compared to 2a)???

Current run numbers
I think I should duplicate the 'int currentRun' variable in the Tuplesortstate 
struct. I'll replace it with a 'int currentRunUP' and 'int currentRunDOWN' 
variables in order to distinguish those two physical runs associated to those 2 
heaps. In this case I will give a run number (max{currentRunUP,currentRunDOWN} 
+ 1) to elements not belonging to the current logical run. I suppose no need to 
touch 'long availMem' nor 'long allowedMem' variables nor any others.

Heap functions
I will duplicate all the heap management functions in order to adapt them to 
the kind of heap they should be applied to (for example, the 
tuplesort_heap_siftup function should be replaced with tuplesort_heap_siftupUP 
and tuplesort_heap_siftupDOWN functions).

Merge Plan
This technique would use a sort of merge plan to instruct mergesort on how to 
use those physical runs. Actually mergesort should consider at first odd runs 
before pair runs. That is, for example, mergesort should start merging runs 
with run number 1,3,5,7,... and when run number X terminates start considering 
run number X+1. Obviously that doesn't need any merge plan, but I remember 
someone else (as Simon Riggs) was interested in sorting improvements so it 
could be a good thing to know if I should consider any conventions or 
paramethers in order to possibly create that merge plan.


DEVELOPMENT CONTEXT
I preferred to use the last stable release at the moment, that is 8.2.


Any comment/suggestion/advice ?

Thanks for your attention and for your time.
Regards, Manolo.
_
Express yourself instantly with MSN Messenger! Download today it's FREE!
http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/
---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org