Re: [HACKERS] [PATCHES] 2WRS [WIP]

2008-02-26 Thread mac_man2005

For the joy of all of you: that's the correct WIP patch.
At the moment it only tries to create runs using two heaps. Hope you can 
help me with writing those runs onto tapes.


I'd be very pleased to give you more details.

Thanks for your time.
Regards, Manolo.

--
From: Jaime Casanova [EMAIL PROTECTED]
Sent: Friday, February 22, 2008 5:30 AM
To: [EMAIL PROTECTED]
Cc: Decibel! [EMAIL PROTECTED]; Manolo _ [EMAIL PROTECTED]; 
David Fetter [EMAIL PROTECTED]; [EMAIL PROTECTED]; 
pgsql-hackers@postgresql.org

Subject: Re: [HACKERS] [PATCHES] 2WRS [WIP]


On Thu, Feb 21, 2008 at 6:44 AM,  [EMAIL PROTECTED] wrote:

Hi.

That's the latest release and it refers to 8.3.0 and not to 8.2.5 as before. 
Hope you can tell me whether I created it correctly, please.



no, it doesn't...


! /* GUC variables */
  #ifdef TRACE_SORT
  bool trace_sort = false;
  #endif
- #ifdef DEBUG_BOUNDED_SORT
- bool optimize_bounded_sort = true;
- #endif


it seems you're removing something that was added in 8.3

--
regards,
Jaime Casanova

Programming today is a race between software engineers striving to
build bigger and better idiot-proof programs and the universe trying
to produce bigger and better idiots.
So far, the universe is winning.
Richard Cook



tuplesort.patch
Description: Binary data



[HACKERS] Run positions on Tape

2008-02-02 Thread mac_man2005
Hi gurus.

I'm working on run formation [ tuplesort.c / tuplestore.c ]
Is there a way to know and store the address of the first and the last position 
of a run on a tape?

I would store the location of the first tuple while arranging the current run 
on the current destination tape.
On the other hand I would store the location of the last tuple of the same run 
just before writing the first tuple of the possible following run during run 
formation.
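
Concretely, the bookkeeping I have in mind would look roughly like this 
(all of these names are hypothetical, nothing of the sort exists in 
tuplesort.c or logtape.c; I'm just imagining a position as a block/offset 
pair):

typedef struct RunLocation
{
    int     tapenum;        /* which logical tape holds the run */
    long    startBlock;     /* block holding the first tuple of the run */
    int     startOffset;    /* offset of the first tuple within that block */
    long    endBlock;       /* block holding the last tuple of the run */
    int     endOffset;      /* offset of the last tuple within that block */
} RunLocation;

/* to be called just before writing the first tuple of a new run */
static void
run_location_start(RunLocation *loc, int tapenum, long block, int offset)
{
    loc->tapenum = tapenum;
    loc->startBlock = block;
    loc->startOffset = offset;
}

/* to be called after writing the last tuple of the run, i.e. right before
 * the first tuple of the following run goes out */
static void
run_location_end(RunLocation *loc, long block, int offset)
{
    loc->endBlock = block;
    loc->endOffset = offset;
}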

I tried to follow all of the write function calls starting from 
LogicalTapeWrite() [logtape.c]. They pass through a lot of buffers before 
possibly writing to disk.

Should I follow those buffers, or is that information already stored 
somewhere else, or can it be retrieved in a simpler way?

Thanks for your attention.

Regards, Manolo.

[HACKERS] Backward reading

2008-02-01 Thread mac_man2005
PostgreSQL allows reading tuples backward by writing the tuple's length both 
before and after the tuple proper, in case 'randomAccess' is requested.
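
Just to illustrate the framing, here is a minimal stdio sketch of 
length-before-and-after records and of stepping backward over them; this is 
only the general idea, not the actual logtape.c/tuplesort.c routines, and 
error checking is omitted:

#include <stdio.h>
#include <stdlib.h>

/* Each record is written as [len][data][len], so a backward reader can
 * seek back over the trailing length word, then over the payload. */
static void
write_record(FILE *f, const void *data, unsigned int len)
{
    fwrite(&len, sizeof(len), 1, f);    /* leading length   */
    fwrite(data, len, 1, f);            /* the tuple proper */
    fwrite(&len, sizeof(len), 1, f);    /* trailing length  */
}

/* Read the record that ends at the current position, leaving the file
 * position at the start of that record (ready for the next backward read).
 * Returns a malloc'd copy of the payload. */
static void *
read_record_backward(FILE *f, unsigned int *len)
{
    void   *buf;

    fseek(f, -(long) sizeof(*len), SEEK_CUR);           /* back over trailing len */
    fread(len, sizeof(*len), 1, f);                     /* now just past the record again */
    fseek(f, -(long) (sizeof(*len) + *len), SEEK_CUR);  /* to start of payload */
    buf = malloc(*len);
    fread(buf, *len, 1, f);                             /* now at the trailing len */
    fseek(f, -(long) (*len + sizeof(*len)), SEEK_CUR);  /* to start of record */
    return buf;
}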

Is there any example of reading tuples backward in the PostgreSQL code?

Thanks.

Re: [HACKERS] Polyphase Merge

2008-01-22 Thread mac_man2005



--
From: Tom Lane [EMAIL PROTECTED]
Sent: Monday, January 21, 2008 10:13 PM
To: Sam Mason [EMAIL PROTECTED]
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Polyphase Merge



I agree --- having to read the run back from external storage, only to
write it out again with no further useful work done on it, sounds like
a guaranteed loser.  To make this work you'll need some kind of ju-jitsu
rearrangement that logically puts the run where it needs to go without
physically moving any data.


I'm not going to write it back with no useful work done on it. I should just 
write them in reverse order during run formation (ju-jitsu couldn't help me 
in this case) or read them in reverse order while merging (ju-jitsu may 
help... the point is that I'm not so good at ju-jitsu).


One idea could be to manage a list of pointers to the runs contained on the 
tapes. Any comments?
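
To make that concrete, a rough sketch of one entry of such a list (the 
names are hypothetical; nothing like this exists in tuplesort.c today):

#include <stdbool.h>

/* One physical run on a logical tape, plus a flag telling the merge step
 * to read it backward; a logical run would then simply be a chain of
 * these entries.  Illustrative only. */
typedef struct RunPointer
{
    int     tapenum;            /* logical tape holding the run           */
    long    firstBlock;         /* where the run starts on that tape      */
    long    lastBlock;          /* where the run ends on that tape        */
    bool    readBackward;       /* read this run in reverse while merging */
    struct RunPointer *next;    /* next physical run of the same logical run */
} RunPointer;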




regards, tom lane






[HACKERS] Polyphase Merge

2008-01-21 Thread mac_man2005
I'm trying to refine the sorting module of tuplesort.c

During run creation I use two heaps instead of just one (yeah, it's still 
me... the two-heaps guy, still trying to get some answer/help from 
-hackers).

Those two runs are built in such a way that if we concatenate one of them with 
the other read upside down, the result still forms a run (recall that, 
following Knuth, a run is a sorted sequence of data). With this refinement 
there is a good chance that logical runs will be longer than the runs built by 
ordinary replacement selection. Note that we still build physical runs: the 
logical run is just a concept used to understand why we build runs that way.

ISSUES
a) How do we distribute logical runs (that is, both of their physical runs) 
onto tapes?
b) One of the 2 physical runs of a logical run has to be read upside down 
while merging: how can that be done efficiently?

Well, that's all for now.

Hope you can give me a few seconds of your precious time; that would allow me 
to go on developing my refinement. Or at least tell me 'don't bother until the 
next PostgreSQL release is out' (when will it be released?) or 'don't bother 
anymore, since nobody will ever answer you'.

Thanks for your attention.
Manolo.

[HACKERS] Using tapes on tuplesort.c

2008-01-14 Thread mac_man2005
Hi to all.

It seems that the current PostgreSQL implementation of the Replacement 
Selection (RS) algorithm [Knuth] switches to a new logical tape for each run 
built.
I'm trying to implement a refinement of RS that uses 2 heaps instead of just 
one (2-Way RS). Recall that each heap builds its own physical run, and the two 
heaps cooperate so that their physical runs are associated with the same 
logical run.

2-Way RS stops building the current logical run only once both physical runs 
associated with that logical run have been finished.

My question: should I use/switch tapes for each physical run or for each 
logical run?

I know you'll probably be busy with issues around the new PostgreSQL release, 
so I'll thank you twice for your reply.

Regards, Manolo.

Re: [HACKERS] Implementing Sorting Refinements

2008-01-07 Thread mac_man2005

Well, sorry for hijacking... ummm how did I do that?

Anyway, I'll thank you for giving a sign of life when I was almost losing 
my hope of getting any kind of answer from -hackers.


I suppose the lack of answers was due to the way I wrote my mail. At the 
time I assumed that at least someone remembered the 2WRS technique and the 
possible benefits described in previous posts.
I think I was wrong, so I'll write it once again, hoping meanwhile to get 
some suggestions on: HOWTO write a mail to which -hackers will give an 
answer :) hehehehe


Thanks for your attention.
Manolo.

--
From: Decibel! [EMAIL PROTECTED]
Sent: Tuesday, January 08, 2008 12:34 AM
To: Manolo _ [EMAIL PROTECTED]
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Implementing Sorting Refinements

You'll get a better response if you don't hijack threads. :) Also, there's 
nothing in here that describes what the benefits of this change are.


On Jan 1, 2008, at 2:09 PM, Manolo _ wrote:



Hi to all.

This mail is aimed at asking for some suggestions on how to approach 
PostgreSQL (PG) development in order to implement a refinement to the PG 
source code. I'll briefly summarize the idea of the 2-Way Replacement 
Selection (2WRS) refinement, just in case. If you already remember what 
2WRS is, please jump directly to the IMPLEMENTATION ISSUES part of this mail.



2WRS.
Refinement of the Replacement Selection (RS) technique currently used by 
PG in src/backend/utils/sort/tuplesort.c .
The 2WRS uses two heaps instead of just one in order to create the 
current (logical) run. Here are some fundamental points of the 
2WRS technique:

- 'qsort' the initial unsorted 'memtuples' array
- divide the 'memtuples' array into two parts, each of which will be 
managed as a heap
- one of the heaps will arrange its elements in ascending order, while 
the other heap arranges its elements in descending order
- each heap will spill its current root into its corresponding run (i.e. 
we have a run for each of those two heaps), so we are actually creating 
2 physical current runs
- those 2 current physical runs can logically be merged into the 
same logical run; in fact we can make 'mergesort' think they belong 
to the same physical run. That reduces the number of comparisons 
'mergesort' has to do at each merge step (which means less seek delay 
time on mass storage). We can also expect the average length of logical 
runs produced by 2WRS to be greater than the average length 
produced by RS (again less seek delay time: the longer each run, the 
smaller the number of final runs produced, which means fewer comparisons 
at each 'mergesort' step). A toy example of the idea is sketched below.
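
A self-contained toy version of those points on plain ints (no SortTuples, 
no tapes), just to show the qsort-and-divide step and the two physical runs 
that come out of it:

#include <stdio.h>
#include <stdlib.h>

static int
cmp_int(const void *a, const void *b)
{
    int x = *(const int *) a;
    int y = *(const int *) b;

    return (x > y) - (x < y);
}

int
main(void)
{
    int memtuples[] = {8, 3, 10, 7, 9, 5};
    int n = 6;
    int mid = n / 2;
    int i;

    qsort(memtuples, n, sizeof(int), cmp_int);  /* memtuples = 3 5 7 8 9 10 */

    printf("ascending physical run : ");
    for (i = 0; i < mid; i++)                   /* lower half, smallest first */
        printf("%d ", memtuples[i]);            /* 3 5 7 */

    printf("\ndescending physical run: ");
    for (i = n - 1; i >= mid; i--)              /* upper half, largest first */
        printf("%d ", memtuples[i]);            /* 10 9 8 */
    printf("\n");

    /* Reading the descending run upside down (8 9 10) and appending it to
     * the ascending run (3 5 7) yields one sorted logical run. */
    return 0;
}

In the real thing each half keeps being fed new tuples from the input and 
behaves as a heap, but the logical-run idea is already visible here.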



IMPLEMENTATION ISSUES.
Where to place those heaps?
1) I think that both heaps could be arranged on the same 'memtuples' 
array. That easily allows the heaps to be resized later according to 
their effective use or according to some heuristic, without reallocating 
memory.


How to arrange those heaps?
2a) It's convenient to arrange those heaps root to root. That is, 
arrange the heaps with their roots toward the center of 'memtuples' 
(so we can say they lie face to face, or root to root as said 
before) while their leaves lie toward the extreme indexes of the 
'memtuples' array (that is, the last leaf of one heap lies at index 
0 and the last leaf of the other heap at index memtupsize-1). This 
arrangement prevents overlapping elements between the physical runs 
associated with the same current logical run. (A small index-arithmetic 
sketch of this layout is given below.)
PRO: once we qsort memtuples and divide it into 2 parts we already get 
those two heaps, no need to build them.

CONTRA: ???

2b) As in 2a) but arranging the heaps leaf to leaf, that is, their roots 
lie at the extreme indexes of 'memtuples' while their leaves point 
toward the center of the 'memtuples' array.
Or even start building the heaps as soon as we get the initial elements, 
instead of qsorting the whole 'memtuples' array.

Any PRO/CONTRA compared to 2a)???
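
For illustration only, the index arithmetic for the root-to-root layout of 
option 2a), on plain array indexes (the helper names are mine, nothing from 
tuplesort.c):

/* Heap DOWN occupies memtuples[0 .. mid-1] with its root at mid-1 and its
 * leaves toward index 0; heap UP occupies memtuples[mid .. n-1] with its
 * root at mid and its leaves toward n-1.  Each heap is addressed through a
 * logical index k (k = 0 is the root), so the usual parent/child formulas
 * can be shared by both heaps. */

/* physical position in memtuples of logical node k of the UP heap */
static int
up_pos(int mid, int k)
{
    return mid + k;             /* root at mid, growing toward n-1 */
}

/* physical position in memtuples of logical node k of the DOWN heap */
static int
down_pos(int mid, int k)
{
    return (mid - 1) - k;       /* root at mid-1, growing toward 0 */
}

/* parent/child arithmetic on logical indexes, shared by both heaps */
static int heap_parent(int k) { return (k - 1) / 2; }
static int heap_left(int k)   { return 2 * k + 1; }
static int heap_right(int k)  { return 2 * k + 2; }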

Current run numbers
I think I should duplicate the 'int currentRun' variable in the 
Tuplesortstate struct. I'll replace it with 'int currentRunUP' and 
'int currentRunDOWN' variables in order to distinguish the two 
physical runs associated with the 2 heaps. In this case I will give a 
run number of (max{currentRunUP, currentRunDOWN} + 1) to elements not 
belonging to the current logical run. I suppose there is no need to touch 
the 'long availMem' or 'long allowedMem' variables, nor any others.
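
A minimal sketch of just those two counters and of the numbering rule (this 
is not the actual Tuplesortstate, only an illustration):

/* Hypothetical fragment standing in for the real Tuplesortstate fields. */
typedef struct TwoWayRunState
{
    int     currentRunUP;       /* run number of the ascending physical run  */
    int     currentRunDOWN;     /* run number of the descending physical run */
} TwoWayRunState;

/* Tuples that do not fit the current logical run are tagged with
 * max(currentRunUP, currentRunDOWN) + 1, as described above. */
static int
next_run_number(const TwoWayRunState *state)
{
    return (state->currentRunUP > state->currentRunDOWN ?
            state->currentRunUP : state->currentRunDOWN) + 1;
}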


Heap functions
I will duplicate all the heap management functions in order to adapt 
them to the kind of heap they are applied to (for example, the 
tuplesort_heap_siftup function would be replaced with 
tuplesort_heap_siftupUP and tuplesort_heap_siftupDOWN functions).


Merge Plan
This technique would use a sort of merge plan to instruct mergesort on 
how to use those physical runs. Actually mergesort should consider 
odd-numbered runs before even-numbered ones. That is, for example, 

Re: [HACKERS] compiling postgres in winxp

2007-12-01 Thread mac_man2005

Any Code::Blocks users?  http://www.codeblocks.org/
I cannot compile PG-8.2.5 on WinXP SP2 using Code::Blocks.

I created a new project including ALL the files decompressed from 
postgresql-8.2.5.tar.gz and then just clicked on build. What's wrong?

Are there alternative ways to compile it with another IDE, or a precise 
command line?

The error messages when building it are:

Project : Console application
Compiler : GNU GCC Compiler (called directly)
Directory : C:\Documents and Settings\manolo\Desktop\postgresql-8.2.5\

Switching to target: default
Compiling: contrib\adminpack\adminpack.c
contrib\adminpack\adminpack.c:15:22: postgres.h: No such file or directory
contrib\adminpack\adminpack.c:21:29: catalog/pg_type.h: No such file or 
directory

contrib\adminpack\adminpack.c:22:21: funcapi.h: No such file or directory
contrib\adminpack\adminpack.c:23:23: miscadmin.h: No such file or directory
contrib\adminpack\adminpack.c:24:34: postmaster/syslogger.h: No such file or 
directory

contrib\adminpack\adminpack.c:25:24: storage/fd.h: No such file or directory
contrib\adminpack\adminpack.c:26:28: utils/datetime.h: No such file or 
directory
contrib\adminpack\adminpack.c:40: warning: data definition has no type or 
storage class

contrib\adminpack\adminpack.c:42: error: syntax error before pg_file_write
contrib\adminpack\adminpack.c:42: warning: parameter names (without types) 
in function declaration
contrib\adminpack\adminpack.c:42: warning: data definition has no type or 
storage class
contrib\adminpack\adminpack.c:43: error: syntax error before 
pg_file_rename
contrib\adminpack\adminpack.c:43: warning: parameter names (without types) 
in function declaration
contrib\adminpack\adminpack.c:43: warning: data definition has no type or 
storage class
contrib\adminpack\adminpack.c:44: error: syntax error before 
pg_file_unlink
contrib\adminpack\adminpack.c:44: warning: parameter names (without types) 
in function declaration
contrib\adminpack\adminpack.c:44: warning: data definition has no type or 
storage class

contrib\adminpack\adminpack.c:45: error: syntax error before pg_logdir_ls
contrib\adminpack\adminpack.c:45: warning: parameter names (without types) 
in function declaration
contrib\adminpack\adminpack.c:45: warning: data definition has no type or 
storage class
contrib\adminpack\adminpack.c:47: warning: parameter names (without types) 
in function declaration
contrib\adminpack\adminpack.c:47: warning: data definition has no type or 
storage class
contrib\adminpack\adminpack.c:48: warning: parameter names (without types) 
in function declaration
contrib\adminpack\adminpack.c:48: warning: data definition has no type or 
storage class
contrib\adminpack\adminpack.c:49: warning: parameter names (without types) 
in function declaration
contrib\adminpack\adminpack.c:49: warning: data definition has no type or 
storage class
contrib\adminpack\adminpack.c:50: warning: parameter names (without types) 
in function declaration
contrib\adminpack\adminpack.c:50: warning: data definition has no type or 
storage class

contrib\adminpack\adminpack.c:55: error: syntax error before DIR
contrib\adminpack\adminpack.c:55: warning: no semicolon at end of struct or 
union
contrib\adminpack\adminpack.c:56: warning: data definition has no type or 
storage class

contrib\adminpack\adminpack.c:69: error: syntax error before '*' token
contrib\adminpack\adminpack.c: In function `convert_and_check_filename':
contrib\adminpack\adminpack.c:71: error: `arg' undeclared (first use in this 
function)
contrib\adminpack\adminpack.c:71: error: (Each undeclared identifier is 
reported only once

contrib\adminpack\adminpack.c:71: error: for each function it appears in.)
contrib\adminpack\adminpack.c:71: error: `VARHDRSZ' undeclared (first use in 
this function)
contrib\adminpack\adminpack.c:72: warning: initialization makes pointer from 
integer without a cast
contrib\adminpack\adminpack.c:74: warning: passing arg 2 of `memcpy' makes 
pointer from integer without a cast
contrib\adminpack\adminpack.c:81: error: `ERROR' undeclared (first use in 
this function)
contrib\adminpack\adminpack.c:82: error: `ERRCODE_INSUFFICIENT_PRIVILEGE' 
undeclared (first use in this function)
contrib\adminpack\adminpack.c:88: error: `DataDir' undeclared (first use in 
this function)
contrib\adminpack\adminpack.c:91: error: `logAllowed' undeclared (first use 
in this function)
contrib\adminpack\adminpack.c:92: error: `Log_directory' undeclared (first 
use in this function)
contrib\adminpack\adminpack.c:99: error: `NULL' undeclared (first use in 
this function)

contrib\adminpack\adminpack.c: In function `requireSuperuser':
contrib\adminpack\adminpack.c:115: error: `ERROR' undeclared (first use in 
this function)
contrib\adminpack\adminpack.c:116: error: `ERRCODE_INSUFFICIENT_PRIVILEGE' 
undeclared (first use in this function)

contrib\adminpack\adminpack.c: At top level:

Re: [HACKERS] Replacement Selection

2007-12-01 Thread mac_man2005



in puttuple_common(), the transition from an internal to external sort is
performed at the bottom of the TSS_INITIAL case in the main switch 
statement.


The transition? Do we do the internal sort somewhere else and then the 
external sort here in tuplesort.c?


The function dumptuples() heapifies the in-core tuples (divides the 
in-core tuples into initial runs and then advances the state to 
TSS_BUILDRUNS).


I cannot see where dumptuples() advances the state to TSS_BUILDRUNS.
I expected something like
   state->status = TSS_BUILDRUNS;
to be executed somewhere in dumptuples().





I recommend you run the code in the debugger on an external-sorting query: 
watch two or three tuples go into the heap and you'll get the idea.


The top of the heap is at state->memtuples[0]; the heap goes down from 
there. New tuples are added there and the heap is adjusted (using the 
tuplesort_heap_siftup() function).


-Tim





Re: [HACKERS] Replacement Selection

2007-11-27 Thread mac_man2005

Hi to all.

It seems a previous mail of mine with the following body wasn't sent.
Sorry if you end up getting it twice.

Actually I have now modified that body, so it's worth reading it once again.

Thanks for your attention.
Regards.


PREVIOUS MAIL--
Well, the refinements are the following:

Using 2 heaps instead of just one:
one heap creating a descending run and the
other one creating an ascending run.
Both associated with the same logical run.

Suppose we want the input elements to be finally sorted in ascending
order. To do this we could quicksort the first M initialization elements 
in RAM and then divide them into 2 parts.
Suppose the first heap creates the following run:
10
9
8

And suppose the second heap creates the following run:
3
5
7

Those two runs can be seen as just one by mergesort... since they could be
physically merged into one single run: first we could write the elements
3,5,7 and then the elements of the other run, read upside down.

Possible advantages:
Having two heaps of that kind lets RS adapt better to local variations in 
the input trend.

This technique can be called Two-Way Replacement Selection (2WRS) precisely
because of those 2 heaps.
As an extreme example, an input that is already sorted in reverse order
no longer leads us to the worst case: with 2WRS it doesn't matter whether the 
input is already sorted in ascending or descending order... in either case 
we'll produce just one run instead of producing the maximum number of runs, 
as in the RS worst case (input in reverse order).
Moreover it lets us grow the current run in 2 directions: just imagine we 
output runs into a regular file. With 2WRS this can be seen as starting to 
output elements from the middle of such a file, the descending heap writing 
elements from the middle upwards while the ascending one writes from the 
middle downwards. This could mean getting a smaller number of dead records 
(as I said in previous mails, a dead record is an element that won't form 
part of the current run) and so having longer runs.

Other optimizations can be done, for example, with the 'virtual 
concatenation' technique: storing a cache of pairs 
(first_element, last_element) for each created run. This could be useful 
whenever we can find 2 pairs (first_element_1, last_element_1) and 
(first_element_2, last_element_2) with last_element_1 <= first_element_2.
In this case those runs, too, can be seen as belonging to the same logical 
run (actually they are 2 different RS physical runs, or even 4 in 2WRS, 
but they can be seen as just one by mergesort). Of course, once those 2 
(or 4) runs are logically merged into that single one, this last one can 
in turn be merged with other runs.
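
A minimal sketch of that cache and of the test, on plain ints with made-up 
names, just to pin the idea down:

/* One (first, last) pair cached per created run; hypothetical names. */
typedef struct RunBounds
{
    int     first;      /* first (smallest) element written to the run */
    int     last;       /* last (largest) element written to the run   */
} RunBounds;

/* Run a followed by run b still forms a sorted sequence whenever the last
 * element of a does not exceed the first element of b, so the two can be
 * treated as one logical run by mergesort. */
static int
can_virtually_concatenate(const RunBounds *a, const RunBounds *b)
{
    return a->last <= b->first;
}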


What does all that imply? Mergesort would actually consider a smaller number 
of runs (since it would just work on logical runs). This means fewer jumps 
between runs on disk.


Now... to test those refinements I should integrate my code into
PostgreSQL... but it's not that easy for me...

Thanks for your attention.
PREVIOUS MAIL--  





Re: [HACKERS] Replacement Selection

2007-11-27 Thread mac_man2005
Any comments about Two-Way Replacement Selection (two heaps instead of just 
one)?



--
From: Simon Riggs [EMAIL PROTECTED]
Sent: Tuesday, November 27, 2007 1:03 PM
To: [EMAIL PROTECTED]
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Replacement Selection


On Tue, 2007-11-27 at 09:25 +0100, [EMAIL PROTECTED] wrote:


Other optimizations can be done, for example, with the 'virtual
concatenation' technique: storing a cache of pairs
(first_element, last_element) for each created run. This could be useful
whenever we can find 2 pairs (first_element_1, last_element_1) and
(first_element_2, last_element_2) with last_element_1 <= first_element_2.
In this case those runs, too, can be seen as belonging to the same logical
run (actually they are 2 different RS physical runs, or even 4 in 2WRS,
but they can be seen as just one by mergesort). Of course, once those 2
(or 4) runs are logically merged into that single one, this last one can
in turn be merged with other runs.

What does all that imply? Mergesort would actually consider a smaller
number of runs (since it would just work on logical runs). This means
fewer jumps between runs on disk.


That's actually a refinement of an idea I've been working on for
optimizing sort. I'll post those separately.

--
 Simon Riggs
 2ndQuadrant  http://www.2ndQuadrant.com







[HACKERS] Replacement Selection

2007-11-26 Thread mac_man2005

Hi to all.

I'm new. I'd like to integrate my code into PostgreSQL. It's an 
implementation of some refinements of the Replacement Selection algorithm 
used for external sorting.
I have some issues, and preferably I'd like to be supported by some 
developers who have something to do with that code.


Who can I talk to?

Thanks for your attention.
Good Luck!

Manolo. 





Re: [HACKERS] Replacement Selection

2007-11-26 Thread mac_man2005

Thanks for your support.

I downloaded the source code of the latest stable version of PostgreSQL. 
Where can I find the part related to the external sorting algorithm 
(supposedly Replacement Selection)?

I mean, which file should be studied and/or modified and/or replaced?

Thanks for your attention.

--
From: Heikki Linnakangas [EMAIL PROTECTED]
Sent: Monday, November 26, 2007 1:35 PM
To: [EMAIL PROTECTED]
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Replacement Selection


[EMAIL PROTECTED] wrote:
I'm new. I'd like to integrate my code into PostgreSQL. It's an 
implementation of some refinements of the Replacement Selection algorithm 
used for external sorting.
I have some issues, and preferably I'd like to be supported by some 
developers who have something to do with that code.


Who can I talk to?


This mailing list is the right place to discuss that.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com






Re: [HACKERS] Replacement Selection

2007-11-26 Thread mac_man2005

Ok guys!
Thanks for your help.

Unfortunately I'm lost in the code... is there any good soul who could help 
me understand which precise part should be modified?


Thanks for your time!

--
From: Heikki Linnakangas [EMAIL PROTECTED]
Sent: Monday, November 26, 2007 2:34 PM
To: [EMAIL PROTECTED]
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Replacement Selection


[EMAIL PROTECTED] wrote:
I downloaded the source code of the latest stable version of PostgreSQL. 
Where can I find the part related to the external sorting algorithm 
(supposedly Replacement Selection)?
I mean, which file should be studied and/or modified and/or replaced?


In src/backend/utils/sort/tuplesort.c. The comments at the top of that 
file are a good place to start.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com






Fw: [HACKERS] Replacement Selection

2007-11-26 Thread mac_man2005

Thanks for your advice.

The development of this integration is part of my final project. And of 
course my initial bibliography includes the Knuth reference, as you can see:

1. Vladimir Estivill-Castro and Derick Wood. A survey of adaptive sorting
   algorithms. ACM Computing Surveys, 24(4):441-476, 1992.

2. Donald E. Knuth. The Art of Computer Programming, volume 3: Sorting and
   Searching. Addison-Wesley, Reading, 2nd edition, 1998.

3. P. Larson and G. Graefe. Memory management during run generation in
   external sorting. In SIGMOD '98, pages 472-483, 1998.

4. Per-Ake Larson. External sorting: Run formation revisited. IEEE
   Transactions on Knowledge and Data Engineering, 15(4):961-972, 2003.

5. Jeffrey Scott Vitter and David A. Hutchinson. Distribution sort with
   randomized cycling. Pages 77-86.


--
From: Tom Lane [EMAIL PROTECTED]
Sent: Monday, November 26, 2007 6:00 PM
To: Alvaro Herrera [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]; pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Replacement Selection


Alvaro Herrera [EMAIL PROTECTED] writes:

[EMAIL PROTECTED] wrote:

Unfortunately I'm lost in the code... is there any good soul who could help
me understand which precise part should be modified?



I think you should print the file and read it several times until you
understand what's going on.  Then you can start thinking where and how
to modify it.


Also, go find a copy of Knuth volume 3, because a whole lot of the
comments assume you've read Knuth's discussion of external sorting.

regards, tom lane





Re: [HACKERS] Replacement Selection

2007-11-26 Thread mac_man2005

Sorry.

I'm trying to integrate my code into PostgreSQL. At the moment I have my own 
working code, with my own main() etc. etc.
The code is supposed to perform run generation during external sorting. 
That's all; my code won't do any mergesort, just run generation.

I'm studying the code and I don't know where to put my code. Which parts do 
I need to substitute, and which others are absolutely untouchable?
I admit I'm not an excellent programmer. I've always written my own code, 
simple code. Now I have some ideas that could possibly help PostgreSQL get 
better, and for the first time I'm integrating code into someone else's 
code. I say this just to apologize in case some things that might be 
obvious to someone else are not obvious to me.


Anyway... back to work.
My code has the following structure.

1) Generate a random input stream to sort.
For this part I just generate an integer input stream, not a stream of 
db records. I talk about a stream because I'm considering the general case 
in which the input source can be unknown and we cannot even know how many 
elements there are to sort.


2) Fill the available memory with the first M elements from the stream. They 
will be arranged into a heap structure.


3) Start run generation. For this phase, I see the PostgreSQL code (like 
Knuth's algorithm) marks elements in order to know which run they belong to 
and to know when the current heap has finished building the current run. I 
don't memorize this kind of info. I just output from the heap to the run all 
of the elements going into the current run. The elements supposed to go into 
the next run (I call them dead records) are still stored in main memory, but 
as leaves of the heap. This implies reducing the heap size and so heapifying 
a smaller number of elements each time I get a dead record (it's not 
necessary to sort dead records). When the heap size reaches zero, a new run 
is created by heapifying all the dead records currently present in main 
memory.


I haven't seen anything similar in tuplesort.c; apparently no heapify is 
called, no new run is created, and so on.
Do you see any parallels between the PostgreSQL code and what I said in the 
previous points?


Thanks for your attention.

--
From: Heikki Linnakangas [EMAIL PROTECTED]
Sent: Monday, November 26, 2007 5:42 PM
To: [EMAIL PROTECTED]
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Replacement Selection


[EMAIL PROTECTED] wrote:
Unfortunately I'm lost in the code... is there any good soul who could help 
me understand which precise part should be modified?


You haven't given any details on what you're trying to do. What are you 
trying to do?


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com





Re: [HACKERS] Replacement Selection

2007-11-26 Thread mac_man2005
I must clarify that this is not the improvement itself. The refinements 
correspond to other, more complex algorithms, but at the moment I just want 
to know which part of the PostgreSQL code does what. I also implemented 
Replacement Selection (RS), so if I'm able to integrate my RS I hope I'll be 
able to integrate the others too.


Anyway, even in my RS implementation a longer run is created. The first M 
initialization elements will surely form part of the current run. M is the 
memory size, so a run of size at least M will be created. After 
initialization, the elements are not all output at once; an element from the 
heap is output into the run as soon as I get an element from the stream. In 
other words, for each element from the stream, the root element of the heap 
is output and the input element takes the root's place in the heap. If that 
element is a 'good' record I just heapify (since the element is placed at 
the now-free root position). If that input element is a dead record I swap 
it with the last leaf and reduce the heap size.
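
For what it's worth, a toy sketch of that loop on plain ints (the names and 
structure are mine, not tuplesort.c's; 'smaller than the last output' is the 
dead-record test for an ascending run):

#include <stdio.h>

/* Restore the min-heap property in heap[0 .. heapsize-1] starting at i. */
static void
sift_down(int *heap, int heapsize, int i)
{
    for (;;)
    {
        int     smallest = i;
        int     l = 2 * i + 1;
        int     r = 2 * i + 2;
        int     tmp;

        if (l < heapsize && heap[l] < heap[smallest])
            smallest = l;
        if (r < heapsize && heap[r] < heap[smallest])
            smallest = r;
        if (smallest == i)
            break;
        tmp = heap[i];
        heap[i] = heap[smallest];
        heap[smallest] = tmp;
        i = smallest;
    }
}

/* Output the root to the current run, put 'next' (the new element from the
 * stream) in its place, and either re-heapify (good record) or park it as
 * a dead record just past the shrinking heap.  Returns the new heap size;
 * when it reaches 0, the dead records now filling the array are heapified
 * again to start the next run. */
static int
replace_root(int *heap, int heapsize, int next)
{
    int     out = heap[0];

    printf("%d ", out);                 /* goes to the current run */

    if (next >= out)
    {
        /* good record: it may still belong to the current run */
        heap[0] = next;
        sift_down(heap, heapsize, 0);
    }
    else
    {
        /* dead record: it belongs to the next run */
        heap[0] = heap[heapsize - 1];   /* pull the last leaf up to the root */
        heap[heapsize - 1] = next;      /* park the dead record in its place */
        heapsize--;                     /* and shrink the heap               */
        sift_down(heap, heapsize, 0);
    }
    return heapsize;
}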




--
From: Tom Lane [EMAIL PROTECTED]
Sent: Monday, November 26, 2007 7:31 PM
To: [EMAIL PROTECTED]
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Replacement Selection


[EMAIL PROTECTED] writes:
3) Start run generation. For this phase, I see the PostgreSQL code (like 
Knuth's algorithm) marks elements in order to know which run they belong 
to and to know when the current heap has finished building the current 
run. I don't memorize this kind of info. I just output from the heap to 
the run all of the elements going into the current run. The elements 
supposed to go into the next run (I call them dead records) are still 
stored in main memory, but as leaves of the heap. This implies reducing 
the heap size and so heapifying a smaller number of elements each time I 
get a dead record (it's not necessary to sort dead records). When the 
heap size reaches zero, a new run is created by heapifying all the dead 
records currently present in main memory.


Why would this be an improvement over Knuth?  AFAICS you can't generate
longer runs this way, and it's not saving any time --- in fact it's
costing time, because re-heapifying adds a lot of new comparisons.

regards, tom lane


